[python] Python csv string to array

Anyone know of a simple library or function to parse a csv encoded string and turn it into an array or dictionary?

I don't think I want the built in csv module because in all the examples I've seen that takes filepaths, not strings.

This question is related to python string arrays csv

The answer is


You can convert a string to a file object using io.StringIO and then pass that to the csv module:

from io import StringIO
import csv

scsv = """text,with,Polish,non-Latin,letters
1,2,3,4,5,6
a,b,c,d,e,f
ges,zólty,waz,idzie,waska,drózka,
"""

f = StringIO(scsv)
reader = csv.reader(f, delimiter=',')
for row in reader:
    print('\t'.join(row))

simpler version with split() on newlines:

reader = csv.reader(scsv.split('\n'), delimiter=',')
for row in reader:
    print('\t'.join(row))

Or you can simply split() this string into lines using \n as separator, and then split() each line into values, but this way you must be aware of quoting, so using csv module is preferred.

On Python 2 you have to import StringIO as

from StringIO import StringIO

instead.


Simple - the csv module works with lists, too:

>>> a=["1,2,3","4,5,6"]  # or a = "1,2,3\n4,5,6".split('\n')
>>> import csv
>>> x = csv.reader(a)
>>> list(x)
[['1', '2', '3'], ['4', '5', '6']]

Per the documentation:

And while the module doesn’t directly support parsing strings, it can easily be done:

import csv
for row in csv.reader(['one,two,three']):
    print row

Just turn your string into a single element list.

Importing StringIO seems a bit excessive to me when this example is explicitly in the docs.


Here's an alternative solution:

>>> import pyexcel as pe
>>> text="""1,2,3
... a,b,c
... d,e,f"""
>>> s = pe.load_from_memory('csv', text)
>>> s
Sheet Name: csv
+---+---+---+
| 1 | 2 | 3 |
+---+---+---+
| a | b | c |
+---+---+---+
| d | e | f |
+---+---+---+
>>> s.to_array()
[[u'1', u'2', u'3'], [u'a', u'b', u'c'], [u'd', u'e', u'f']]

Here's the documentation


>>> a = "1,2"
>>> a
'1,2'
>>> b = a.split(",")
>>> b
['1', '2']

To parse a CSV file:

f = open(file.csv, "r")
lines = f.read().split("\n") # "\r\n" if needed

for line in lines:
    if line != "": # add other needed checks to skip titles
        cols = line.split(",")
        print cols

As others have already pointed out, Python includes a module to read and write CSV files. It works pretty well as long as the input characters stay within ASCII limits. In case you want to process other encodings, more work is needed.

The Python documentation for the csv module implements an extension of csv.reader, which uses the same interface but can handle other encodings and returns unicode strings. Just copy and paste the code from the documentation. After that, you can process a CSV file like this:

with open("some.csv", "rb") as csvFile: 
    for row in UnicodeReader(csvFile, encoding="iso-8859-15"):
        print row

Panda is quite powerful and smart library reading CSV in Python

A simple example here, I have example.zip file with four files in it.

EXAMPLE.zip
 -- example1.csv
 -- example1.txt
 -- example2.csv
 -- example2.txt

from zipfile import ZipFile
import pandas as pd


filepath = 'EXAMPLE.zip'
file_prefix = filepath[:-4].lower()

zipfile = ZipFile(filepath)
target_file = ''.join([file_prefix, '/', file_prefix, 1 , '.csv'])

df = pd.read_csv(zipfile.open(target_file))

print(df.head()) # print first five row of csv
print(df[COL_NAME]) # fetch the col_name data

Once you have data you can manipulate to play with a list or other formats.


Use this to have a csv loaded into a list

import csv

csvfile = open(myfile, 'r')
reader = csv.reader(csvfile, delimiter='\t')
my_list = list(reader)
print my_list
>>>[['1st_line', '0'],
    ['2nd_line', '0']]

https://docs.python.org/2/library/csv.html?highlight=csv#csv.reader

csvfile can be any object which supports the iterator protocol and returns a string each time its next() method is called

Thus, a StringIO.StringIO(), str.splitlines() or even a generator are all good.


The official doc for csv.reader() https://docs.python.org/2/library/csv.html is very helpful, which says

file objects and list objects are both suitable

import csv

text = """1,2,3
a,b,c
d,e,f"""

lines = text.splitlines()
reader = csv.reader(lines, delimiter=',')
for row in reader:
    print('\t'.join(row))

Examples related to python

programming a servo thru a barometer Is there a way to view two blocks of code from the same file simultaneously in Sublime Text? python variable NameError Why my regexp for hyphenated words doesn't work? Comparing a variable with a string python not working when redirecting from bash script is it possible to add colors to python output? Get Public URL for File - Google Cloud Storage - App Engine (Python) Real time face detection OpenCV, Python xlrd.biffh.XLRDError: Excel xlsx file; not supported Could not load dynamic library 'cudart64_101.dll' on tensorflow CPU-only installation

Examples related to string

How to split a string in two and store it in a field String method cannot be found in a main class method Kotlin - How to correctly concatenate a String Replacing a character from a certain index Remove quotes from String in Python Detect whether a Python string is a number or a letter How does String substring work in Swift How does String.Index work in Swift swift 3.0 Data to String? How to parse JSON string in Typescript

Examples related to arrays

PHP array value passes to next row Use NSInteger as array index How do I show a message in the foreach loop? Objects are not valid as a React child. If you meant to render a collection of children, use an array instead Iterating over arrays in Python 3 Best way to "push" into C# array Sort Array of object by object field in Angular 6 Checking for duplicate strings in JavaScript array what does numpy ndarray shape do? How to round a numpy array?

Examples related to csv

Pandas: ValueError: cannot convert float NaN to integer Export result set on Dbeaver to CSV Convert txt to csv python script How to import an Excel file into SQL Server? "CSV file does not exist" for a filename with embedded quotes Save Dataframe to csv directly to s3 Python Data-frame Object has no Attribute (unicode error) 'unicodeescape' codec can't decode bytes in position 2-3: truncated \UXXXXXXXX escape How to write to a CSV line by line? How to check encoding of a CSV file