[python] Parsing CSV / tab-delimited txt file with Python

I currently have a CSV file which, when opened in Excel, has a total of 5 columns. Only columns A and C are of any significance to me and the data in the remaining columns is irrelevant.

Starting on line 8 and then working in multiples of 7 (ie. lines 8, 15, 22, 29, 36 etc...), I am looking to create a dictionary with Python 2.7 with the information from these fields. The data in column A will be the key (a 6-digit integer) and the data in column C being the respective value for the key. I've tried to highlight this below but the formatting isn't the best:-

    A        B      C          D
1                           CDCDCDCD  
2                           VDDBDDB
3
4
5
6
7  DDEFEEF                   FEFEFEFE
8  123456         JONES
9
10
11
12
13
14
15 293849         SMITH

As per the above, I am looking to extract the value from A7 (DDEFEEF) as a key in my dictionary and "FEFEFEFE" being the respective data and then add another entry to my dictionary, jumping to line 15 with "2938495" being my key and "Smith" being the respective value.

Any suggestions? The source file is a .txt file with entries being tab-delimited. Thanks

Clarification:

Just to clarify, so far, I have tried the below:-

import csv

mydict = {:}
f = open("myfile", 'rt')
reader = csv.reader(f)
    for row in reader:
        print row

The above simply prints out all content though a row at a time. I did try "for row(7) in reader" but this returned an error. I then researched it and had a go at the below but it didn't work neither:

import csv
from itertools import islice

entries = csv.reader(open("myfile", 'rb'))
mydict = {'key' : 'value'}

for i in xrange(6):
    mydict['i(0)] = 'I(2)    # integers representing columns
    range = islice(entries,6)
    for entry in range:
        mydict[entries(0) = entries(2)] # integers representing columns

This question is related to python parsing csv dictionary

The answer is


Although there is nothing wrong with the other solutions presented, you could simplify and greatly escalate your solutions by using python's excellent library pandas.

Pandas is a library for handling data in Python, preferred by many Data Scientists.

Pandas has a simplified CSV interface to read and parse files, that can be used to return a list of dictionaries, each containing a single line of the file. The keys will be the column names, and the values will be the ones in each cell.

In your case:

    import pandas

    def create_dictionary(filename):
        my_data = pandas.DataFrame.from_csv(filename, sep='\t', index_col=False)
        # Here you can delete the dataframe columns you don't want!
        del my_data['B']
        del my_data['D']
        # ...
        # Now you transform the DataFrame to a list of dictionaries
        list_of_dicts = [item for item in my_data.T.to_dict().values()]
        return list_of_dicts

# Usage:
x = create_dictionary("myfile.csv")

If the file is large, you may not want to load it entirely into memory at once. This approach avoids that. (Of course, making a dict out of it could still take up some RAM, but it's guaranteed to be smaller than the original file.)

my_dict = {}
for i, line in enumerate(file):
    if (i - 8) % 7:
        continue
    k, v = line.split("\t")[:3:2]
    my_dict[k] = v

Edit: Not sure where I got extend from before. I meant update


Examples related to python

programming a servo thru a barometer Is there a way to view two blocks of code from the same file simultaneously in Sublime Text? python variable NameError Why my regexp for hyphenated words doesn't work? Comparing a variable with a string python not working when redirecting from bash script is it possible to add colors to python output? Get Public URL for File - Google Cloud Storage - App Engine (Python) Real time face detection OpenCV, Python xlrd.biffh.XLRDError: Excel xlsx file; not supported Could not load dynamic library 'cudart64_101.dll' on tensorflow CPU-only installation

Examples related to parsing

Got a NumberFormatException while trying to parse a text file for objects Uncaught SyntaxError: Unexpected end of JSON input at JSON.parse (<anonymous>) Python/Json:Expecting property name enclosed in double quotes Correctly Parsing JSON in Swift 3 How to get response as String using retrofit without using GSON or any other library in android UIButton action in table view cell "Expected BEGIN_OBJECT but was STRING at line 1 column 1" How to convert an XML file to nice pandas dataframe? How to extract multiple JSON objects from one file? How to sum digits of an integer in java?

Examples related to csv

Pandas: ValueError: cannot convert float NaN to integer Export result set on Dbeaver to CSV Convert txt to csv python script How to import an Excel file into SQL Server? "CSV file does not exist" for a filename with embedded quotes Save Dataframe to csv directly to s3 Python Data-frame Object has no Attribute (unicode error) 'unicodeescape' codec can't decode bytes in position 2-3: truncated \UXXXXXXXX escape How to write to a CSV line by line? How to check encoding of a CSV file

Examples related to dictionary

JS map return object python JSON object must be str, bytes or bytearray, not 'dict Python update a key in dict if it doesn't exist How to update the value of a key in a dictionary in Python? How to map an array of objects in React C# Dictionary get item by index Are dictionaries ordered in Python 3.6+? Split / Explode a column of dictionaries into separate columns with pandas Writing a dictionary to a text file? enumerate() for dictionary in python