[python] Text File Parsing with Python

I am trying to parse a series of text files and save them as CSV files using Python (2.7.3). All text files have a 4 line long header which needs to be stripped out. The data lines have various delimiters including " (quote), - (dash), : column, and blank space. I found it a pain to code it in C++ with all these different delimiters, so I decided to try it in Python hearing it is relatively easier to do compared to C/C++.

I wrote a piece of code to test it for a single line of data and it works, however, I could not manage to make it work for the actual file. For parsing a single line I was using the text object and "replace" method. It looks like my current implementation reads the text file as a list, and there is no replace method for the list object.

Being a novice in Python, I got stuck at this point. Any input would be appreciated!

Thanks!

# function for parsing the data
def data_parser(text, dic):
for i, j in dic.iteritems():
    text = text.replace(i,j)
return text

# open input/output files

inputfile = open('test.dat')
outputfile = open('test.csv', 'w')

my_text = inputfile.readlines()[4:] #reads to whole text file, skipping first 4 lines


# sample text string, just for demonstration to let you know how the data looks like
# my_text = '"2012-06-23 03:09:13.23",4323584,-1.911224,-0.4657288,-0.1166382,-0.24823,0.256485,"NAN",-0.3489428,-0.130449,-0.2440527,-0.2942413,0.04944348,0.4337797,-1.105218,-1.201882,-0.5962594,-0.586636'

# dictionary definition 0-, 1- etc. are there to parse the date block delimited with dashes, and make sure the negative numbers are not effected
reps = {'"NAN"':'NAN', '"':'', '0-':'0,','1-':'1,','2-':'2,','3-':'3,','4-':'4,','5-':'5,','6-':'6,','7-':'7,','8-':'8,','9-':'9,', ' ':',', ':':',' }

txt = data_parser(my_text, reps)
outputfile.writelines(txt)

inputfile.close()
outputfile.close()

This question is related to python parsing text file-io python-2.7

The answer is


I would use a for loop to iterate over the lines in the text file:

for line in my_text:
    outputfile.writelines(data_parser(line, reps))

If you want to read the file line-by-line instead of loading the whole thing at the start of the script you could do something like this:

inputfile = open('test.dat')
outputfile = open('test.csv', 'w')

# sample text string, just for demonstration to let you know how the data looks like
# my_text = '"2012-06-23 03:09:13.23",4323584,-1.911224,-0.4657288,-0.1166382,-0.24823,0.256485,"NAN",-0.3489428,-0.130449,-0.2440527,-0.2942413,0.04944348,0.4337797,-1.105218,-1.201882,-0.5962594,-0.586636'

# dictionary definition 0-, 1- etc. are there to parse the date block delimited with dashes, and make sure the negative numbers are not effected
reps = {'"NAN"':'NAN', '"':'', '0-':'0,','1-':'1,','2-':'2,','3-':'3,','4-':'4,','5-':'5,','6-':'6,','7-':'7,','8-':'8,','9-':'9,', ' ':',', ':':',' }

for i in range(4): inputfile.next() # skip first four lines
for line in inputfile:
    outputfile.writelines(data_parser(line, reps))

inputfile.close()
outputfile.close()

From the accepted answer, it looks like your desired behaviour is to turn

skip 0
skip 1
skip 2
skip 3
"2012-06-23 03:09:13.23",4323584,-1.911224,-0.4657288,-0.1166382,-0.24823,0.256485,"NAN",-0.3489428,-0.130449,-0.2440527,-0.2942413,0.04944348,0.4337797,-1.105218,-1.201882,-0.5962594,-0.586636

into

2012,06,23,03,09,13.23,4323584,-1.911224,-0.4657288,-0.1166382,-0.24823,0.256485,NAN,-0.3489428,-0.130449,-0.2440527,-0.2942413,0.04944348,0.4337797,-1.105218,-1.201882,-0.5962594,-0.586636

If that's right, then I think something like

import csv

with open("test.dat", "rb") as infile, open("test.csv", "wb") as outfile:
    reader = csv.reader(infile)
    writer = csv.writer(outfile, quoting=False)
    for i, line in enumerate(reader):
        if i < 4: continue
        date = line[0].split()
        day = date[0].split('-')
        time = date[1].split(':')
        newline = day + time + line[1:]
        writer.writerow(newline)

would be a little simpler than the reps stuff.


There are a few ways to go about this. One option would be to use inputfile.read() instead of inputfile.readlines() - you'd need to write separate code to strip the first four lines, but if you want the final output as a single string anyway, this might make the most sense.

A second, simpler option would be to rejoin the strings after striping the first four lines with my_text = ''.join(my_text). This is a little inefficient, but if speed isn't a major concern, the code will be simplest.

Finally, if you actually want the output as a list of strings instead of a single string, you can just modify your data parser to iterate over the list. That might looks something like this:

def data_parser(lines, dic):
    for i, j in dic.iteritems():
        for (k, line) in enumerate(lines):
            lines[k] = line.replace(i, j)
    return lines

Examples related to python

programming a servo thru a barometer Is there a way to view two blocks of code from the same file simultaneously in Sublime Text? python variable NameError Why my regexp for hyphenated words doesn't work? Comparing a variable with a string python not working when redirecting from bash script is it possible to add colors to python output? Get Public URL for File - Google Cloud Storage - App Engine (Python) Real time face detection OpenCV, Python xlrd.biffh.XLRDError: Excel xlsx file; not supported Could not load dynamic library 'cudart64_101.dll' on tensorflow CPU-only installation

Examples related to parsing

Got a NumberFormatException while trying to parse a text file for objects Uncaught SyntaxError: Unexpected end of JSON input at JSON.parse (<anonymous>) Python/Json:Expecting property name enclosed in double quotes Correctly Parsing JSON in Swift 3 How to get response as String using retrofit without using GSON or any other library in android UIButton action in table view cell "Expected BEGIN_OBJECT but was STRING at line 1 column 1" How to convert an XML file to nice pandas dataframe? How to extract multiple JSON objects from one file? How to sum digits of an integer in java?

Examples related to text

Difference between opening a file in binary vs text How do I center text vertically and horizontally in Flutter? How to `wget` a list of URLs in a text file? Convert txt to csv python script Reading local text file into a JavaScript array Python: How to increase/reduce the fontsize of x and y tick labels? How can I insert a line break into a <Text> component in React Native? How to split large text file in windows? Copy text from nano editor to shell Atom menu is missing. How do I re-enable

Examples related to file-io

Python, Pandas : write content of DataFrame into text File Saving response from Requests to file How to while loop until the end of a file in Python without checking for empty line? Getting "java.nio.file.AccessDeniedException" when trying to write to a folder How do I add a resources folder to my Java project in Eclipse Read and write a String from text file Python Pandas: How to read only first n rows of CSV files in? Open files in 'rt' and 'wt' modes How to write to a file without overwriting current contents? Write objects into file with Node.js

Examples related to python-2.7

Numpy, multiply array with scalar Not able to install Python packages [SSL: TLSV1_ALERT_PROTOCOL_VERSION] How to create a new text file using Python Could not find a version that satisfies the requirement tensorflow Python: Pandas pd.read_excel giving ImportError: Install xlrd >= 0.9.0 for Excel support Display/Print one column from a DataFrame of Series in Pandas How to calculate 1st and 3rd quartiles? How can I read pdf in python? How to completely uninstall python 2.7.13 on Ubuntu 16.04 Check key exist in python dict