I am trying to write a Python script that reads a CSV file and stores its contents in a dictionary (I am using Python 3.x).
The code below reads the CSV file, and that part works:
import csv
reader = csv.reader(open('C:\\Users\\Chris\\Desktop\\test.csv'), delimiter=',', quotechar='|')
for row in reader:
    print(', '.join(row))
But now I want to place the results into a dictionary. I would like the first row of the CSV file to be used as the "key" field for the dictionary with the subsequent rows in the CSV file filling out the data portion.
Sample Data:
Date First Name Last Name Score
12/28/2012 15:15 John Smith 20
12/29/2012 15:15 Alex Jones 38
12/30/2012 15:15 Michael Carpenter 25
There are additional things I would like to do with this code but for now just getting the dictionary to work is what I am looking for.
Can anyone help me with this?
EDITED Version 2:
import csv
reader = csv.DictReader(open('C:\\Users\\Chris\\Desktop\\test.csv'))
result = {}
for row in reader:
    for column, value in row.items():
        result.setdefault(column, []).append(value)
        print('Column -> ', column, '\nValue -> ', value)
print(result)
fieldnames = result.keys()
csvwriter = csv.DictWriter(open('C:\\Users\\Chris\\Desktop\\test_out.csv', 'w'), delimiter=',', fieldnames=result.keys())
csvwriter.writerow(dict((fn,fn) for fn in fieldnames))
for row in result.items():
    print('Values -> ', row)
    #csvwriter.writerow(row)
'''
Test output
'''
test_array = []
test_array.append({'fruit': 'apple', 'quantity': 5, 'color': 'red'})
test_array.append({'fruit': 'pear', 'quantity': 8, 'color': 'green'})
test_array.append({'fruit': 'banana', 'quantity': 3, 'color': 'yellow'})
test_array.append({'fruit': 'orange', 'quantity': 11, 'color': 'orange'})
fieldnames = ['fruit', 'quantity', 'color']
test_file = open('C:\\Users\\Chris\\Desktop\\test_out.csv','w')
csvwriter = csv.DictWriter(test_file, delimiter=',', fieldnames=fieldnames)
csvwriter.writerow(dict((fn,fn) for fn in fieldnames))
for row in test_array:
    print(row)
    csvwriter.writerow(row)
test_file.close()
If you have a file.csv in which each line is a comma-separated key,value pair, do this:
mydict = {y[0]: y[1] for y in [x.split(",") for x in open('file.csv').read().split('\n') if x]}
It uses a list comprehension to split each line on commas; the trailing "if x" skips blank lines (usually found at the end of the file). A dictionary comprehension then unpacks each split line into a key and a value.
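One caveat worth illustrating: splitting on commas by hand does not understand CSV quoting. The sketch below compares the one-liner above with csv.reader on a small in-memory sample; the sample data and the names naive and parsed are made up for illustration.

```python
import csv
import io

# Hypothetical sample standing in for file.csv; note the quoted comma
# and the blank line at the end.
sample = 'name,"Smith, John"\ncity,Boston\n\n'

# Same approach as the one-liner above: split lines, then split on commas.
naive = {y[0]: y[1] for y in [x.split(",") for x in sample.split("\n") if x]}

# csv.reader handles quoted fields; "if row" skips blank lines,
# which the reader yields as empty lists.
parsed = {row[0]: row[1] for row in csv.reader(io.StringIO(sample)) if row}

print(naive)   # the quoted value is cut at the embedded comma
print(parsed)  # the quoted value survives intact
```

If the file never contains quoted fields, both approaches give the same result.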
One-liner solution using pandas:
import pandas as pd
mydict = {row.iloc[0]: row.iloc[1] for _, row in pd.read_csv("file.csv").iterrows()}
Many solutions have been posted, and I'd like to contribute mine, which works for any number of columns in the CSV file. It creates a dictionary with one key per column, and the value for each key is a list of the elements in that column.
import csv
# path_to_csv_file is the path to your CSV file
input_file = csv.DictReader(open(path_to_csv_file))
csv_dict = {elem: [] for elem in input_file.fieldnames}
for row in input_file:
    for key in csv_dict.keys():
        csv_dict[key].append(row[key])
You just have to convert the csv.reader to a dict:
~ >> cat > 1.csv
key1, value1
key2, value2
key2, value22
key3, value3
~ >> cat > d.py
import csv
with open('1.csv') as f:
    d = dict(filter(None, csv.reader(f)))
print(d)
~ >> python d.py
{'key3': ' value3', 'key2': ' value22', 'key1': ' value1'}
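The output above keeps the space that follows each comma as part of the value, and the later "key2" row overwrites the earlier one. As a minimal sketch, assuming the same data is held in an in-memory string instead of 1.csv, the skipinitialspace option of csv.reader drops that leading space:

```python
import csv
import io

# Same rows as 1.csv above, held in memory for illustration.
sample = "key1, value1\nkey2, value2\nkey2, value22\nkey3, value3\n"

# skipinitialspace=True ignores whitespace right after each delimiter.
d = dict(csv.reader(io.StringIO(sample), skipinitialspace=True))

print(d)
```

The duplicate key still resolves to the last value seen, because dictionary keys are unique.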
Assuming you have a CSV of this structure:
"a","b"
1,2
3,4
5,6
And you want the output to be:
[{'a': '1', 'b': '2'}, {'a': '3', 'b': '4'}, {'a': '5', 'b': '6'}]
A zip function (not yet mentioned) is simple and quite helpful.
import csv

def read_csv(filename):
    with open(filename) as f:
        file_data = csv.reader(f)
        headers = next(file_data)  # first row holds the column names
        return [dict(zip(headers, i)) for i in file_data]
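A possible usage sketch for the read_csv function above. It writes a small sample to a temporary file (the file contents and the extra short row are made up for illustration) and shows that zip stops at the shorter sequence, so a row with fewer fields simply yields fewer keys.

```python
import csv
import os
import tempfile

def read_csv(filename):
    with open(filename, newline="") as f:
        file_data = csv.reader(f)
        headers = next(file_data)  # first row holds the column names
        return [dict(zip(headers, row)) for row in file_data]

# Write a hypothetical sample file; the last row is deliberately short.
with tempfile.NamedTemporaryFile("w", suffix=".csv", delete=False, newline="") as tmp:
    tmp.write('"a","b"\n1,2\n3,4\n5,6\n7\n')
    path = tmp.name

rows = read_csv(path)
os.remove(path)  # clean up the temporary file

print(rows)
```

If short rows should be treated as errors instead, compare len(row) with len(headers) before building each dictionary.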
You can use this, it is pretty cool:
import dataconverters.commas as commas
filename = 'test.csv'
with open(filename) as f:
    records, metadata = commas.parse(f)
    for row in records:
        print('this is row in dictionary:', row)
If you are OK with using the numpy package, then you can do something like the following:
import numpy as np
lines = np.genfromtxt("coors.csv", delimiter=",", dtype=None)
my_dict = dict()
for i in range(len(lines)):
    my_dict[lines[i][0]] = lines[i][1]
For simple csv files, such as the following
id,col1,col2,col3
row1,r1c1,r1c2,r1c3
row2,r2c1,r2c2,r2c3
row3,r3c1,r3c2,r3c3
row4,r4c1,r4c2,r4c3
You can convert it to a Python dictionary using only built-ins
with open(csv_file) as f:
    csv_list = [[val.strip() for val in r.split(",")] for r in f.readlines()]
(_, *header), *data = csv_list
csv_dict = {}
for row in data:
    key, *values = row
    csv_dict[key] = {key: value for key, value in zip(header, values)}
This should yield the following dictionary
{'row1': {'col1': 'r1c1', 'col2': 'r1c2', 'col3': 'r1c3'},
'row2': {'col1': 'r2c1', 'col2': 'r2c2', 'col3': 'r2c3'},
'row3': {'col1': 'r3c1', 'col2': 'r3c2', 'col3': 'r3c3'},
'row4': {'col1': 'r4c1', 'col2': 'r4c2', 'col3': 'r4c3'}}
Note: Python dictionaries have unique keys, so if your csv file has duplicate ids, you should append each row to a list.
for row in data:
    key, *values = row
    if key not in csv_dict:
        csv_dict[key] = []
    csv_dict[key].append({key: value for key, value in zip(header, values)})
With pandas it is much easier. For example, assume you have the following data as CSV in a file called test.txt (or test.csv; a CSV file is just a kind of text file):
a,b,c,d
1,2,3,4
5,6,7,8
Now, using pandas:
import pandas as pd
df = pd.read_csv("./test.txt")
df_to_dict = df.to_dict()
To get one dictionary per row instead, use:
df.to_dict(orient='records')
And that's it.
You need the csv.DictReader class from Python's standard library. More help can be found in the csv module documentation.
import csv
with open('file_name.csv', 'rt') as f:
    reader = csv.DictReader(f)
    for row in reader:
        print(row)
import csv
reader = csv.reader(open('filename.csv', 'r'))
d = {}
for row in reader:
    k, v = row
    d[k] = v
I believe the syntax you were looking for is as follows:
import csv
with open('coors.csv', mode='r') as infile:
    reader = csv.reader(infile)
    with open('coors_new.csv', mode='w') as outfile:
        writer = csv.writer(outfile)
        mydict = {rows[0]:rows[1] for rows in reader}
Alternatively, for Python versions before 2.7, which lack dict comprehensions, you want:
mydict = dict((rows[0],rows[1]) for rows in reader)
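The snippet above creates a writer but never writes with it. As a hedged sketch of how the round trip might look, the example below reads two-column rows into a dictionary and writes the key/value pairs back out. In-memory streams stand in for coors.csv and coors_new.csv, and the sample rows are made up.

```python
import csv
import io

# In-memory stand-ins for coors.csv and coors_new.csv (illustrative data).
infile = io.StringIO("a,1\nb,2\n")
outfile = io.StringIO()

reader = csv.reader(infile)
mydict = {rows[0]: rows[1] for rows in reader}

# Each (key, value) pair becomes one output row.
writer = csv.writer(outfile, lineterminator="\n")
writer.writerows(mydict.items())

print(mydict)
print(outfile.getvalue())
```

With real files, open the output with newline='' as the csv module documentation recommends, so the writer controls line endings itself.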
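This isn't elegant, but it is a one-line solution using pandas:
import pandas as pd
# The squeeze=True argument of read_csv was removed in pandas 2.0; chain .squeeze("columns") instead
pd.read_csv('coors.csv', header=None, index_col=0).squeeze("columns").to_dict()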
If you want to specify dtype for your index (it can't be specified in read_csv if you use the index_col argument because of a bug):
import pandas as pd
pd.read_csv('coors.csv', header=None, dtype={0: str}).set_index(0).squeeze().to_dict()
The answer from @phil-frost was very helpful and exactly what I was looking for.
I made a few tweaks after that, so I would like to share them here:
def csv_as_dict(file, ref_header, delimiter=None):
    import csv
    if not delimiter:
        delimiter = ';'
    reader = csv.DictReader(open(file), delimiter=delimiter)
    result = {}
    for row in reader:
        print(row)
        key = row.pop(ref_header)
        if key in result:
            # implement your duplicate row handling here
            pass
        result[key] = row
    return result
You can call it like this:
myvar = csv_as_dict(csv_file, 'ref_column')
where ref_column is the column whose values become the main key for each row.
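One way the duplicate handling left open above could be filled in is to keep every row for a key in a list. This is only a sketch: csv_as_dict_grouped is a hypothetical name, it takes an open stream instead of a file path so it can be tried without a file, and the sample data is invented.

```python
import csv
import io

def csv_as_dict_grouped(stream, ref_header, delimiter=";"):
    """Hypothetical variant of csv_as_dict: map each key to a list of rows."""
    result = {}
    for row in csv.DictReader(stream, delimiter=delimiter):
        key = row.pop(ref_header)
        # setdefault creates the list the first time a key is seen.
        result.setdefault(key, []).append(row)
    return result

# Illustrative data with a repeated id.
sample = io.StringIO("id;name;score\n1;John;20\n2;Alex;38\n1;John;25\n")
grouped = csv_as_dict_grouped(sample, "id")

print(grouped)
```

Every value is then a list, even for keys that occur once, which keeps later code uniform.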
You can also use numpy for this, provided both columns are numeric (loadtxt parses values as floats by default).
from numpy import loadtxt
key_value = loadtxt("filename.csv", delimiter=",")
mydict = { k:v for k,v in key_value }
I'd suggest adding an "if row" filter in case there is an empty line at the end of the file:
import csv
with open('coors.csv', mode='r') as infile:
    reader = csv.reader(infile)
    with open('coors_new.csv', mode='w') as outfile:
        writer = csv.writer(outfile)
        mydict = dict(row[:2] for row in reader if row)
Try using a defaultdict and DictReader.
import csv
from collections import defaultdict
my_dict = defaultdict(list)
with open('filename.csv', 'r') as csv_file:
    csv_reader = csv.DictReader(csv_file)
    for line in csv_reader:
        for key, value in line.items():
            my_dict[key].append(value)
It returns:
{'key1':[value_1, value_2, value_3], 'key2': [value_a, value_b, value_c], 'Key3':[value_x, Value_y, Value_z]}
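A dictionary of column lists like the one above can also be written back to CSV, which is the step the question's edited code leaves commented out. The sketch below assumes a comma-separated version of part of the question's sample data and uses in-memory streams. zip(*columns.values()) turns the column lists back into rows.

```python
import csv
import io
from collections import defaultdict

# Assumed comma-separated form of part of the question's sample data.
sample = (
    "First Name,Last Name,Score\n"
    "John,Smith,20\n"
    "Alex,Jones,38\n"
    "Michael,Carpenter,25\n"
)

# Read: one list of values per column.
columns = defaultdict(list)
for line in csv.DictReader(io.StringIO(sample)):
    for key, value in line.items():
        columns[key].append(value)

# Write: regroup the column lists into rows and emit them with DictWriter.
out = io.StringIO()
writer = csv.DictWriter(out, fieldnames=list(columns), lineterminator="\n")
writer.writeheader()
for values in zip(*columns.values()):
    writer.writerow(dict(zip(columns, values)))

print(dict(columns))
print(out.getvalue())
```

This relies on dictionaries preserving insertion order (Python 3.7 and later), so the columns come back out in their original order.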
Open the file by calling open and then using csv.DictReader.
input_file = csv.DictReader(open("coors.csv"))
You may iterate over the rows of the csv file dict reader object by iterating over input_file.
for row in input_file:
    print(row)
Or, to access only the first line (Python 2):
dictobj = csv.DictReader(open('coors.csv')).next()
Update: in Python 3 this code changes slightly, because the .next() method was replaced by the built-in next() function:
reader = csv.DictReader(open('coors.csv'))
dictobj = next(reader)
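A small sketch of the Python 3 form, using an in-memory sample in place of coors.csv (the data is made up). Passing a default to next() avoids a StopIteration error when the file has a header but no data rows, and the reader then continues from the second data row.

```python
import csv
import io

# In-memory stand-in for coors.csv (illustrative data).
reader = csv.DictReader(io.StringIO("a,b\n1,2\n3,4\n"))

first = next(reader, None)  # None if there are no data rows
rest = list(reader)         # remaining rows, after the first one

print(first)
print(rest)
```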
Source: Stackoverflow.com