[python] Python import csv to list

I have a CSV file with about 2000 records.

Each record has a string, and a category to it:

This is the first line,Line1
This is the second line,Line2
This is the third line,Line3

I need to read this file into a list that looks like this:

data = [('This is the first line', 'Line1'),
        ('This is the second line', 'Line2'),
        ('This is the third line', 'Line3')]

How can import this CSV to the list I need using Python?

This question is related to python csv

The answer is


Using the csv module:

import csv

with open('file.csv', newline='') as f:
    reader = csv.reader(f)
    data = list(reader)

print(data)

Output:

[['This is the first line', 'Line1'], ['This is the second line', 'Line2'], ['This is the third line', 'Line3']]

If you need tuples:

import csv

with open('file.csv', newline='') as f:
    reader = csv.reader(f)
    data = [tuple(row) for row in reader]

print(data)

Output:

[('This is the first line', 'Line1'), ('This is the second line', 'Line2'), ('This is the third line', 'Line3')]

Old Python 2 answer, also using the csv module:

import csv
with open('file.csv', 'rb') as f:
    reader = csv.reader(f)
    your_list = list(reader)

print your_list
# [['This is the first line', 'Line1'],
#  ['This is the second line', 'Line2'],
#  ['This is the third line', 'Line3']]

Updated for Python 3:

import csv

with open('file.csv', newline='') as f:
    reader = csv.reader(f)
    your_list = list(reader)

print(your_list)

Output:

[['This is the first line', 'Line1'], ['This is the second line', 'Line2'], ['This is the third line', 'Line3']]

Pandas is pretty good at dealing with data. Here is one example how to use it:

import pandas as pd

# Read the CSV into a pandas data frame (df)
#   With a df you can do many things
#   most important: visualize data with Seaborn
df = pd.read_csv('filename.csv', delimiter=',')

# Or export it in many ways, e.g. a list of tuples
tuples = [tuple(x) for x in df.values]

# or export it as a list of dicts
dicts = df.to_dict().values()

One big advantage is that pandas deals automatically with header rows.

If you haven't heard of Seaborn, I recommend having a look at it.

See also: How do I read and write CSV files with Python?

Pandas #2

import pandas as pd

# Get data - reading the CSV file
import mpu.pd
df = mpu.pd.example_df()

# Convert
dicts = df.to_dict('records')

The content of df is:

     country   population population_time    EUR
0    Germany   82521653.0      2016-12-01   True
1     France   66991000.0      2017-01-01   True
2  Indonesia  255461700.0      2017-01-01  False
3    Ireland    4761865.0             NaT   True
4      Spain   46549045.0      2017-06-01   True
5    Vatican          NaN             NaT   True

The content of dicts is

[{'country': 'Germany', 'population': 82521653.0, 'population_time': Timestamp('2016-12-01 00:00:00'), 'EUR': True},
 {'country': 'France', 'population': 66991000.0, 'population_time': Timestamp('2017-01-01 00:00:00'), 'EUR': True},
 {'country': 'Indonesia', 'population': 255461700.0, 'population_time': Timestamp('2017-01-01 00:00:00'), 'EUR': False},
 {'country': 'Ireland', 'population': 4761865.0, 'population_time': NaT, 'EUR': True},
 {'country': 'Spain', 'population': 46549045.0, 'population_time': Timestamp('2017-06-01 00:00:00'), 'EUR': True},
 {'country': 'Vatican', 'population': nan, 'population_time': NaT, 'EUR': True}]

Pandas #3

import pandas as pd

# Get data - reading the CSV file
import mpu.pd
df = mpu.pd.example_df()

# Convert
lists = [[row[col] for col in df.columns] for row in df.to_dict('records')]

The content of lists is:

[['Germany', 82521653.0, Timestamp('2016-12-01 00:00:00'), True],
 ['France', 66991000.0, Timestamp('2017-01-01 00:00:00'), True],
 ['Indonesia', 255461700.0, Timestamp('2017-01-01 00:00:00'), False],
 ['Ireland', 4761865.0, NaT, True],
 ['Spain', 46549045.0, Timestamp('2017-06-01 00:00:00'), True],
 ['Vatican', nan, NaT, True]]

Update for Python3:

import csv
from pprint import pprint

with open('text.csv', newline='') as file:
    reader = csv.reader(file)
    res = list(map(tuple, reader))

pprint(res)

Output:

[('This is the first line', ' Line1'),
 ('This is the second line', ' Line2'),
 ('This is the third line', ' Line3')]

If csvfile is a file object, it should be opened with newline=''.
csv module


If you are sure there are no commas in your input, other than to separate the category, you can read the file line by line and split on ,, then push the result to List

That said, it looks like you are looking at a CSV file, so you might consider using the modules for it


result = []
for line in text.splitlines():
    result.append(tuple(line.split(",")))

As said already in the comments you can use the csv library in python. csv means comma separated values which seems exactly your case: a label and a value separated by a comma.

Being a category and value type I would rather use a dictionary type instead of a list of tuples.

Anyway in the code below I show both ways: d is the dictionary and l is the list of tuples.

import csv

file_name = "test.txt"
try:
    csvfile = open(file_name, 'rt')
except:
    print("File not found")
csvReader = csv.reader(csvfile, delimiter=",")
d = dict()
l =  list()
for row in csvReader:
    d[row[1]] = row[0]
    l.append((row[0], row[1]))
print(d)
print(l)

A simple loop would suffice:

lines = []
with open('test.txt', 'r') as f:
    for line in f.readlines():
        l,name = line.strip().split(',')
        lines.append((l,name))

print lines

Unfortunately I find none of the existing answers particularly satisfying.

Here is a straightforward and complete Python 3 solution, using the csv module.

import csv

with open('../resources/temp_in.csv', newline='') as f:
    reader = csv.reader(f, skipinitialspace=True)
    rows = list(reader)

print(rows)

Notice the skipinitialspace=True argument. This is necessary since, unfortunately, OP's CSV contains whitespace after each comma.

Output:

[['This is the first line', 'Line1'], ['This is the second line', 'Line2'], ['This is the third line', 'Line3']]

You can use the list() function to convert csv reader object to list

import csv

with open('input.csv') as csv_file:
    reader = csv.reader(csv_file, delimiter=',')
    rows = list(reader)
    print(rows)

Here is the easiest way in Python 3.x to import a CSV to a multidimensional array, and its only 4 lines of code without importing anything!

#pull a CSV into a multidimensional array in 4 lines!

L=[]                            #Create an empty list for the main array
for line in open('log.txt'):    #Open the file and read all the lines
    x=line.rstrip()             #Strip the \n from each line
    L.append(x.split(','))      #Split each line into a list and add it to the
                                #Multidimensional array
print(L)

Extending your requirements a bit and assuming you do not care about the order of lines and want to get them grouped under categories, the following solution may work for you:

>>> fname = "lines.txt"
>>> from collections import defaultdict
>>> dct = defaultdict(list)
>>> with open(fname) as f:
...     for line in f:
...         text, cat = line.rstrip("\n").split(",", 1)
...         dct[cat].append(text)
...
>>> dct
defaultdict(<type 'list'>, {' CatA': ['This is the first line', 'This is the another line'], ' CatC': ['This is the third line'], ' CatB': ['This is the second line', 'This is the last line']})

This way you get all relevant lines available in the dictionary under key being the category.


Next is a piece of code which uses csv module but extracts file.csv contents to a list of dicts using the first line which is a header of csv table

import csv
def csv2dicts(filename):
  with open(filename, 'rb') as f:
    reader = csv.reader(f)
    lines = list(reader)
    if len(lines) < 2: return None
    names = lines[0]
    if len(names) < 1: return None
    dicts = []
    for values in lines[1:]:
      if len(values) != len(names): return None
      d = {}
      for i,_ in enumerate(names):
        d[names[i]] = values[i]
      dicts.append(d)
    return dicts
  return None

if __name__ == '__main__':
  your_list = csv2dicts('file.csv')
  print your_list

Questions with python tag:

programming a servo thru a barometer Is there a way to view two blocks of code from the same file simultaneously in Sublime Text? python variable NameError Why my regexp for hyphenated words doesn't work? Comparing a variable with a string python not working when redirecting from bash script is it possible to add colors to python output? Get Public URL for File - Google Cloud Storage - App Engine (Python) Real time face detection OpenCV, Python xlrd.biffh.XLRDError: Excel xlsx file; not supported Could not load dynamic library 'cudart64_101.dll' on tensorflow CPU-only installation Upgrade to python 3.8 using conda Unable to allocate array with shape and data type How to fix error "ERROR: Command errored out with exit status 1: python." when trying to install django-heroku using pip How to prevent Google Colab from disconnecting? "UserWarning: Matplotlib is currently using agg, which is a non-GUI backend, so cannot show the figure." when plotting figure with pyplot on Pycharm How to fix 'Object arrays cannot be loaded when allow_pickle=False' for imdb.load_data() function? "E: Unable to locate package python-pip" on Ubuntu 18.04 Tensorflow 2.0 - AttributeError: module 'tensorflow' has no attribute 'Session' Jupyter Notebook not saving: '_xsrf' argument missing from post How to Install pip for python 3.7 on Ubuntu 18? Python: 'ModuleNotFoundError' when trying to import module from imported package OpenCV TypeError: Expected cv::UMat for argument 'src' - What is this? Requests (Caused by SSLError("Can't connect to HTTPS URL because the SSL module is not available.") Error in PyCharm requesting website How to setup virtual environment for Python in VS Code? Pylint "unresolved import" error in Visual Studio Code Pandas Merging 101 Numpy, multiply array with scalar What is the meaning of "Failed building wheel for X" in pip install? Selenium: WebDriverException:Chrome failed to start: crashed as google-chrome is no longer running so ChromeDriver is assuming that Chrome has crashed Could not install packages due to an EnvironmentError: [Errno 13] OpenCV !_src.empty() in function 'cvtColor' error ConvergenceWarning: Liblinear failed to converge, increase the number of iterations How to downgrade python from 3.7 to 3.6 I can't install pyaudio on Windows? How to solve "error: Microsoft Visual C++ 14.0 is required."? Iterating over arrays in Python 3 How do I install opencv using pip? How do I install Python packages in Google's Colab? How do I use TensorFlow GPU? How to upgrade Python version to 3.7? How to resolve TypeError: can only concatenate str (not "int") to str How can I install a previous version of Python 3 in macOS using homebrew? Flask at first run: Do not use the development server in a production environment TypeError: only integer scalar arrays can be converted to a scalar index with 1D numpy indices array What is the difference between Jupyter Notebook and JupyterLab? Pytesseract : "TesseractNotFound Error: tesseract is not installed or it's not in your path", how do I fix this? Could not install packages due to a "Environment error :[error 13]: permission denied : 'usr/local/bin/f2py'" How do I resolve a TesseractNotFoundError? Trying to merge 2 dataframes but get ValueError Authentication plugin 'caching_sha2_password' is not supported Python Pandas User Warning: Sorting because non-concatenation axis is not aligned

Questions with csv tag:

Pandas: ValueError: cannot convert float NaN to integer Export result set on Dbeaver to CSV Convert txt to csv python script How to import an Excel file into SQL Server? "CSV file does not exist" for a filename with embedded quotes Save Dataframe to csv directly to s3 Python Data-frame Object has no Attribute (unicode error) 'unicodeescape' codec can't decode bytes in position 2-3: truncated \UXXXXXXXX escape How to write to a CSV line by line? How to check encoding of a CSV file Excel: macro to export worksheet as CSV file without leaving my current Excel sheet How to get rid of "Unnamed: 0" column in a pandas DataFrame? Key error when selecting columns in pandas dataframe after read_csv Use python requests to download CSV ValueError: not enough values to unpack (expected 11, got 1) TypeError: a bytes-like object is required, not 'str' in python and CSV How to add header row to a pandas DataFrame TypeError: list indices must be integers or slices, not str Pandas read_csv from url Write single CSV file using spark-csv Encoding Error in Panda read_csv Java - Writing strings to a CSV file Deleting rows with Python in a CSV file How to convert dataframe into time series? Load CSV file with Spark IndexError: too many indices for array How do I skip a header from CSV files in Spark? Error in file(file, "rt") : cannot open the connection Printing column separated by comma using Awk command line What does the error "arguments imply differing number of rows: x, y" mean? Read specific columns with pandas or other python module How do I read a large csv file with pandas? Pandas df.to_csv("file.csv" encode="utf-8") still gives trash characters for minus sign Adding double quote delimiters into csv file Writing .csv files from C++ Python import csv to list What does the "More Columns than Column Names" error mean? Adding a newline character within a cell (CSV) Python Pandas: How to read only first n rows of CSV files in? Maximum number of rows of CSV data in excel sheet Parsing a CSV file using NodeJS Download a file from HTTPS using download.file() Create Pandas DataFrame from a string Calculate summary statistics of columns in dataframe datetime dtypes in pandas read_csv Import multiple csv files into pandas and concatenate into one DataFrame How to avoid Python/Pandas creating an index in a saved csv? How to split CSV files as per number of rows specified? Skip rows during csv import pandas Batch file to split .csv file