Anyone know of a simple library or function to parse a csv encoded string and turn it into an array or dictionary?
I don't think I want the built-in csv module, because in all the examples I've seen it takes file paths, not strings.
You can convert a string to a file object using io.StringIO and then pass that to the csv module:
from io import StringIO
import csv
scsv = """text,with,Polish,non-Latin,letters
1,2,3,4,5,6
a,b,c,d,e,f
ges,zólty,waz,idzie,waska,drózka,
"""
f = StringIO(scsv)
reader = csv.reader(f, delimiter=',')
for row in reader:
    print('\t'.join(row))
A simpler version uses split() on newlines:
reader = csv.reader(scsv.split('\n'), delimiter=',')
for row in reader:
    print('\t'.join(row))
Alternatively, you can split() the string into lines using \n as the separator and then split() each line into values, but then you have to handle quoting yourself, so using the csv module is preferred.
On Python 2 you have to import StringIO with from StringIO import StringIO instead.
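To illustrate the quoting caveat mentioned above, here is a minimal sketch comparing a naive split() with csv.reader. The sample line is hypothetical and not taken from the question.

```python
import csv
from io import StringIO

# Hypothetical sample: the second field contains a comma inside quotes.
line = 'Smith,"Portland, OR",42'

# Naive splitting cuts the quoted field into two pieces.
naive = line.split(',')

# csv.reader treats the quoted text as a single field.
parsed = next(csv.reader(StringIO(line)))

print(len(naive), naive)
print(len(parsed), parsed)
```

The naive split produces four items and leaves stray quote characters in them, while csv.reader returns three clean fields.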
Simple - the csv module works with lists, too:
>>> a=["1,2,3","4,5,6"] # or a = "1,2,3\n4,5,6".split('\n')
>>> import csv
>>> x = csv.reader(a)
>>> list(x)
[['1', '2', '3'], ['4', '5', '6']]
And while the module doesn’t directly support parsing strings, it can easily be done:
import csv
for row in csv.reader(['one,two,three']):
    print(row)
Just turn your string into a single element list.
Importing StringIO seems a bit excessive to me when this example is explicitly in the docs.
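Since the question also asks about dictionaries, here is a small sketch of the same list-of-lines idea with csv.DictReader, which uses the first line as the keys. The sample data and the column names name and qty are made up for illustration.

```python
import csv

# Hypothetical sample data; the first line supplies the dictionary keys.
text = "name,qty\napple,3\npear,5\n"

# DictReader accepts any iterable of lines, just like csv.reader.
rows = list(csv.DictReader(text.splitlines()))

for row in rows:
    print(row["name"], row["qty"])
```

All values come back as strings, so convert them (for example with int()) if numbers are needed.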
Here's an alternative solution:
>>> import pyexcel as pe
>>> text="""1,2,3
... a,b,c
... d,e,f"""
>>> s = pe.load_from_memory('csv', text)
>>> s
Sheet Name: csv
+---+---+---+
| 1 | 2 | 3 |
+---+---+---+
| a | b | c |
+---+---+---+
| d | e | f |
+---+---+---+
>>> s.to_array()
[[u'1', u'2', u'3'], [u'a', u'b', u'c'], [u'd', u'e', u'f']]
Here's the documentation
>>> a = "1,2"
>>> a
'1,2'
>>> b = a.split(",")
>>> b
['1', '2']
To parse a CSV file:
with open("file.csv", "r") as f:
    lines = f.read().split("\n")  # "\r\n" if needed
for line in lines:
    if line != "":  # add other needed checks to skip titles
        cols = line.split(",")
        print(cols)
As others have already pointed out, Python includes a module to read and write CSV files. In Python 2 it works well as long as the input characters stay within ASCII. If you want to process other encodings, more work is needed.
The Python 2 documentation for the csv module includes a UnicodeReader recipe, a wrapper around csv.reader that uses the same interface but can handle other encodings and returns unicode strings. Just copy and paste the code from the documentation. After that, you can process a CSV file like this:
with open("some.csv", "rb") as csvFile:
    for row in UnicodeReader(csvFile, encoding="iso-8859-15"):
        print(row)
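On Python 3 the recipe should not be needed, because open() takes the encoding directly and csv.reader yields str values. The following is a sketch under that assumption. It writes a small temporary file first so it is self-contained, and the file contents are made up.

```python
import csv
import os
import tempfile

# Write a small ISO-8859-15 encoded file so the sketch is self-contained.
path = os.path.join(tempfile.mkdtemp(), "some.csv")
with open(path, "w", encoding="iso-8859-15", newline="") as out:
    csv.writer(out).writerow(["café", "10€"])

# Pass the encoding to open(); newline="" lets the csv module handle line endings.
with open(path, "r", encoding="iso-8859-15", newline="") as csv_file:
    rows = list(csv.reader(csv_file))

print(rows)
```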
pandas is a powerful library for reading CSV in Python.
Here is a simple example. I have an EXAMPLE.zip file with four files in it:
EXAMPLE.zip
-- example1.csv
-- example1.txt
-- example2.csv
-- example2.txt
from zipfile import ZipFile
import pandas as pd
filepath = 'EXAMPLE.zip'
file_prefix = filepath[:-4].lower()
zipfile = ZipFile(filepath)
target_file = ''.join([file_prefix, '/', file_prefix, '1', '.csv'])  # join() needs strings, so '1' not 1
df = pd.read_csv(zipfile.open(target_file))
print(df.head())  # print the first five rows of the CSV
print(df['COL_NAME'])  # fetch one column; replace 'COL_NAME' with a real column name
Once you have the data, you can convert it to a list or other formats.
Use this to load a CSV file into a list:
import csv
with open(myfile, 'r') as csvfile:  # myfile is the path to a tab-separated file
    reader = csv.reader(csvfile, delimiter='\t')
    my_list = list(reader)
print(my_list)
>>>[['1st_line', '0'],
['2nd_line', '0']]
https://docs.python.org/2/library/csv.html?highlight=csv#csv.reader
csvfile can be any object which supports the iterator protocol and returns a string each time its next() method is called
Thus, a StringIO.StringIO(), str.splitlines() or even a generator are all good.
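As a sketch of the generator case, the hypothetical helper non_empty_lines below (not part of the csv module) filters out blank lines before they reach csv.reader:

```python
import csv

def non_empty_lines(text):
    """Yield each non-blank line of text (hypothetical helper)."""
    for line in text.splitlines():
        if line.strip():
            yield line

# Sample data with a blank line in the middle.
text = "1,2,3\n\na,b,c\n"

rows = list(csv.reader(non_empty_lines(text)))
print(rows)
```

Without the filter, csv.reader would return an empty list for the blank line.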
The official doc for csv.reader() (https://docs.python.org/2/library/csv.html) is very helpful. It says that file objects and list objects are both suitable.
import csv
text = """1,2,3
a,b,c
d,e,f"""
lines = text.splitlines()
reader = csv.reader(lines, delimiter=',')
for row in reader:
    print('\t'.join(row))
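One caveat with splitlines() is worth checking if your data may contain line breaks inside quoted fields. The sketch below uses a made-up sample to compare it with io.StringIO, which keeps the line endings so the reader can rebuild the field.

```python
import csv
from io import StringIO

# Hypothetical sample: the quoted second field spans two lines.
text = 'x,"a\nb",c\n'

# StringIO keeps the line endings, so the embedded newline survives.
from_stringio = list(csv.reader(StringIO(text)))

# splitlines() drops the line endings before the reader sees them,
# so the line break inside the field is not preserved.
from_lines = list(csv.reader(text.splitlines()))

print(from_stringio)
print(from_lines)
```

If the data never contains quoted line breaks, both approaches give the same rows.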
Source: Stackoverflow.com