Text File Parsing with Python

Question

I am trying to parse a series of text files and save them as CSV files using Python (2.7.3). All text files have a 4-line header which needs to be stripped out. The data lines have various delimiters including " (quote), - (dash), : (colon), and blank space. I found it a pain to code this in C++ with all these different delimiters, so I decided to try it in Python, hearing it is relatively easier than C/C++.

I wrote a piece of code to test it on a single line of data and it works; however, I could not make it work for the actual file. For parsing a single line I was using a string and its replace method. My current implementation reads the text file as a list, and a list has no replace method.

Being a novice in Python, I got stuck at this point. Any input would be appreciated. Thanks!

    # function for parsing the data
    def data_parser(text, dic):
        for i, j in dic.iteritems():
            text = text.replace(i, j)
        return text

    # open input/output files
    inputfile = open('test.dat')
    outputfile = open('test.csv', 'w')

    my_text = inputfile.readlines()[4:]  # reads the whole text file, skipping first 4 lines

    # sample text string, just for demonstration to let you know how the data looks
    # my_text = '"2012-06-23 03:09:13.23",4323584,-1.911224,-0.4657288,-0.1166382,-0.24823,0.256485,"NAN",-0.3489428,-0.130449,-0.2440527,-0.2942413,0.04944348,0.4337797,-1.105218,-1.201882,-0.5962594,-0.586636'

    # dictionary definition: 0-, 1- etc. are there to parse the date block delimited
    # with dashes, and make sure the negative numbers are not affected
    reps = {'"NAN"': 'NAN', '"': '', '0-': '0,', '1-': '1,', '2-': '2,', '3-': '3,', '4-': '4,', '5-': '5,', '6-': '6,', '7-': '7,', '8-': '8,', '9-': '9,', ' ': ',', ':': ','}

    txt = data_parser(my_text, reps)
    outputfile.writelines(txt)

    inputfile.close()
    outputfile.close()

User · Accepted Answer

I would use a for loop to iterate over the lines in the text file:

    for line in my_text:
        outputfile.writelines(data_parser(line, reps))

If you want to read the file line by line instead of loading the whole thing at the start of the script, you could do something like this:

    # data_parser is the function defined in the question
    inputfile = open('test.dat')
    outputfile = open('test.csv', 'w')

    # sample text string, just for demonstration to let you know how the data looks
    # my_text = '"2012-06-23 03:09:13.23",4323584,-1.911224,-0.4657288,-0.1166382,-0.24823,0.256485,"NAN",-0.3489428,-0.130449,-0.2440527,-0.2942413,0.04944348,0.4337797,-1.105218,-1.201882,-0.5962594,-0.586636'

    # dictionary definition: 0-, 1- etc. are there to parse the date block delimited
    # with dashes, and make sure the negative numbers are not affected
    reps = {'"NAN"': 'NAN', '"': '', '0-': '0,', '1-': '1,', '2-': '2,', '3-': '3,', '4-': '4,', '5-': '5,', '6-': '6,', '7-': '7,', '8-': '8,', '9-': '9,', ' ': ',', ':': ','}

    for i in range(4):
        next(inputfile)  # skip first four lines
    for line in inputfile:
        outputfile.writelines(data_parser(line, reps))

    inputfile.close()
    outputfile.close()
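The line-by-line idea can also be sketched as a small generator, which makes it easy to try on an in-memory list before touching real files. This is only a sketch under a few assumptions: it is written for Python 3 (the question uses 2.7.3, where dict.items() would be dict.iteritems()), the replacement table is rebuilt from the question's description, and convert_lines and header_lines are hypothetical names, not something from the original post.

```python
from itertools import islice

# Replacement table rebuilt from the question (an assumption, not verified
# against the asker's real data): strip quotes, turn blanks and colons into
# commas, and turn "digit-dash" into "digit-comma" so that only the dashes
# inside the date are replaced and negative numbers keep their sign.
REPS = {'"NAN"': 'NAN', '"': '', ' ': ',', ':': ','}
REPS.update({'%d-' % d: '%d,' % d for d in range(10)})


def data_parser(text, dic):
    """Apply every old -> new replacement in dic to one string."""
    for old, new in dic.items():
        text = text.replace(old, new)
    return text


def convert_lines(lines, dic, header_lines=4):
    """Yield parsed lines, skipping the first header_lines lines.

    lines can be any iterable of strings, including an open file object.
    """
    for line in islice(lines, header_lines, None):
        yield data_parser(line, dic)


sample = [
    'header 1\n', 'header 2\n', 'header 3\n', 'header 4\n',
    '"2012-06-23 03:09:13.23",4323584,-1.911224,"NAN",-0.3489428\n',
]
converted = list(convert_lines(sample, REPS))
```

With real files, something like outputfile.writelines(convert_lines(inputfile, REPS)) inside a with block should behave the same way, since a file object iterates over its lines.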

User · Answer

From the accepted answer, it looks like your desired behaviour is to turn

    skip 0
    skip 1
    skip 2
    skip 3
    "2012-06-23 03:09:13.23",4323584,-1.911224,-0.4657288,-0.1166382,-0.24823,0.256485,"NAN",-0.3489428,-0.130449,-0.2440527,-0.2942413,0.04944348,0.4337797,-1.105218,-1.201882,-0.5962594,-0.586636

into

    2012,06,23,03,09,13.23,4323584,-1.911224,-0.4657288,-0.1166382,-0.24823,0.256485,NAN,-0.3489428,-0.130449,-0.2440527,-0.2942413,0.04944348,0.4337797,-1.105218,-1.201882,-0.5962594,-0.586636

If that's right, then I think something like

    import csv

    with open("test.dat", "rb") as infile, open("test.csv", "wb") as outfile:
        reader = csv.reader(infile)
        writer = csv.writer(outfile, quoting=False)  # False == 0 == csv.QUOTE_MINIMAL
        for i, line in enumerate(reader):
            if i < 4:
                continue  # skip the four header lines
            date = line[0].split()     # ['2012-06-23', '03:09:13.23']
            day = date[0].split('-')   # ['2012', '06', '23']
            time = date[1].split(':')  # ['03', '09', '13.23']
            newline = day + time + line[1:]
            writer.writerow(newline)

would be a little simpler than the reps stuff. The csv reader already handles the commas and strips the quotes, so only the first field (the timestamp) needs to be split by hand.
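For readers on Python 3, the same csv-based idea might look like the sketch below. The function name convert and its header_lines parameter are hypothetical, and the example runs on io.StringIO objects instead of test.dat and test.csv so that it is self-contained. It assumes every data row starts with a "date time" field, as in the sample line.

```python
import csv
import io


def convert(infile, outfile, header_lines=4):
    """Copy CSV rows from infile to outfile, splitting the timestamp field.

    Assumes the first field of each data row looks like
    'YYYY-MM-DD HH:MM:SS.ss', as in the question's sample line.
    """
    reader = csv.reader(infile)
    writer = csv.writer(outfile, lineterminator='\n')
    for i, row in enumerate(reader):
        if i < header_lines or not row:
            continue  # skip header lines and blank lines
        day, time = row[0].split()
        writer.writerow(day.split('-') + time.split(':') + row[1:])


source = io.StringIO(
    'skip 0\nskip 1\nskip 2\nskip 3\n'
    '"2012-06-23 03:09:13.23",4323584,-1.911224,"NAN",-0.5\n'
)
target = io.StringIO()
convert(source, target)
result = target.getvalue()
```

When working with real files on Python 3, the csv documentation recommends opening them in text mode with newline='' instead of the "rb" and "wb" modes used on Python 2.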

User · Answer

There are a few ways to go about this.

One option would be to use inputfile.read() instead of inputfile.readlines(). You'd need to write separate code to strip the first four lines, but if you want the final output as a single string anyway, this might make the most sense.

A second, simpler option would be to rejoin the strings after stripping the first four lines, with my_text = ''.join(my_text). This is a little inefficient, but if speed isn't a major concern, the code will be simplest.

Finally, if you actually want the output as a list of strings instead of a single string, you can just modify your data parser to iterate over the list. That might look something like this:

    def data_parser(lines, dic):
        for i, j in dic.iteritems():
            for (k, line) in enumerate(lines):
                lines[k] = line.replace(i, j)  # updates the list in place
        return lines
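To compare the join option with the list option, here is a minimal Python 3 sketch (on Python 2.7, dict.items() would be dict.iteritems()). The helper names parse_joined and parse_each are hypothetical, and the replacement table is rebuilt from the question, so treat it as an assumption. Unlike the version above, parse_each builds a new list and leaves its input untouched.

```python
# Replacement table rebuilt from the question (an assumption).
REPS = {'"NAN"': 'NAN', '"': '', ' ': ',', ':': ','}
REPS.update({'%d-' % d: '%d,' % d for d in range(10)})


def data_parser(text, dic):
    """Apply every old -> new replacement in dic to one string."""
    for old, new in dic.items():
        text = text.replace(old, new)
    return text


def parse_joined(lines, dic):
    """Second option: join the lines into one string, then parse it once."""
    return data_parser(''.join(lines), dic)


def parse_each(lines, dic):
    """Third option: parse each line separately and return a new list."""
    return [data_parser(line, dic) for line in lines]


# Data lines only; the four header lines are assumed to be stripped already.
data_lines = [
    '"2012-06-23 03:09:13.23",1,-0.5\n',
    '"2012-06-23 03:09:14.23",2,"NAN"\n',
]
as_string = parse_joined(data_lines, REPS)
as_list = parse_each(data_lines, REPS)
```

Both variants should give the same text here because none of the replacement keys contains a newline, so no match can span two lines. Either result can be passed to outputfile.writelines(), which accepts a single string as well as a list of strings.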
