How do I read CSV data into a record array in NumPy?

Question

I wonder if there is a direct way to import the contents of a CSV file into a record array, much in the way that R's read.table(), read.delim(), and read.csv() family imports data into R's data frame. Or is the best way to use csv.reader() and then apply something like numpy.core.records.fromrecords()?

User · Answer

I timed

    from numpy import genfromtxt
    genfromtxt(fname=dest_file, dtype=(<whatever options>))

versus

    import csv
    import numpy as np
    with open(dest_file, 'r') as dest_f:
        data_iter = csv.reader(dest_f,
                               delimiter=delimiter,
                               quotechar='"')
        data = [data for data in data_iter]
    data_array = np.asarray(data, dtype=<whatever options>)

on 4.6 million rows with about 70 columns and found that the NumPy path took 2 min 16 secs and the csv-list comprehension method took 13 seconds.

I would recommend the csv-list comprehension method, as it most likely relies on pre-compiled libraries and not on the interpreter as much as NumPy does. I suspect the pandas method would have similar interpreter overhead.
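
To repeat a comparison like this on your own machine, a small sketch along the following lines may help. The data is synthetic and far smaller than the 4.6 million rows mentioned above, the variable names are made up for the example, and the timings it prints will depend heavily on the NumPy version and the data, so treat it as a starting point and not as a benchmark.

```python
import csv
import io
import time

import numpy as np

# Synthetic, purely numeric CSV text (hypothetical data, kept small on purpose).
rows, cols = 2000, 10
text = "\n".join(
    ",".join(str(r * cols + c) for c in range(cols)) for r in range(rows)
)

# Path 1: let NumPy parse the text directly.
start = time.perf_counter()
via_genfromtxt = np.genfromtxt(io.StringIO(text), delimiter=",")
t_genfromtxt = time.perf_counter() - start

# Path 2: parse with the csv module, then convert the list of string rows.
start = time.perf_counter()
data = list(csv.reader(io.StringIO(text), delimiter=",", quotechar='"'))
via_csv = np.asarray(data, dtype=float)
t_csv = time.perf_counter() - start

print(f"genfromtxt: {t_genfromtxt:.4f} s, csv + asarray: {t_csv:.4f} s")
```

Both paths should yield the same float array here; only the time taken differs.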

User · Answer

This is the easiest way:

    import csv
    with open('testfile.csv', newline='') as csvfile:
        data = list(csv.reader(csvfile))

Now each entry in data is a record, represented as a list of strings. So you have a 2D list of rows and fields. It saved me so much time.
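
Since the question asks for a record array, here is one possible way to get from csv.reader rows to one with np.rec.fromrecords. This is only a sketch: the file contents, the column names, and the per-column conversions are all assumptions made for the example.

```python
import csv
import io

import numpy as np

# Hypothetical file contents; a real file would be opened with open(..., newline="").
csvfile = io.StringIO("alice,30,1.5\nbob,25,2.25\n")

# csv.reader yields every field as a string, so convert each column explicitly.
records = [
    (name, int(age), float(score))
    for name, age, score in csv.reader(csvfile)
]

# Field names are invented for this example; the formats are inferred from the data.
rec = np.rec.fromrecords(records, names="name,age,score")

print(rec.age)    # fields are available as attributes
print(rec.dtype)
```

The explicit conversion step matters: without it every column would end up as a string field.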

User · Answer

As I tried both ways, using NumPy and pandas, using pandas has a lot of advantages: it is faster, uses less CPU, and needs about 1/3 of the RAM compared to NumPy genfromtxt.

This is my test code:

    $ for f in test_pandas.py test_numpy_csv.py ; do /usr/bin/time python $f; done
    2.94user 0.41system 0:03.05elapsed 109%CPU (0avgtext+0avgdata 502068maxresident)k
    0inputs+24outputs (0major+107147minor)pagefaults 0swaps

    23.29user 0.72system 0:23.72elapsed 101%CPU (0avgtext+0avgdata 1680888maxresident)k
    0inputs+0outputs (0major+416145minor)pagefaults 0swaps

test_numpy_csv.py:

    from numpy import genfromtxt
    train = genfromtxt('/home/hvn/me/notebook/train.csv', delimiter=',')

test_pandas.py:

    from pandas import read_csv
    df = read_csv('/home/hvn/me/notebook/train.csv')

Data file:

    $ du -h ~/me/notebook/train.csv
    59M     /home/hvn/me/notebook/train.csv

With NumPy and pandas at versions:

    $ pip freeze | egrep -i 'pandas|numpy'
    numpy==1.13.3
    pandas==0.20.2

User · Answer

You can also try recfromcsv(), which can guess data types and return a properly formatted record array.
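
As far as I can tell, recent NumPy releases (2.0 and later) no longer expose this helper as np.recfromcsv, and np.genfromtxt with a comma delimiter is the suggested replacement. A rough equivalent might look like the sketch below; the CSV text and its column names are hypothetical.

```python
import io

import numpy as np

# Hypothetical CSV with a header row and mixed column types.
text = "city,population,area\nOslo,700000,454.0\nBergen,285000,465.3\n"

arr = np.genfromtxt(
    io.StringIO(text),
    delimiter=",",
    names=True,        # take field names from the first row
    dtype=None,        # guess a type for each column
    encoding="utf-8",  # read text columns as str, not bytes
)

# View the structured array as a record array for attribute-style access.
rec = arr.view(np.recarray)

print(rec.population)
print(rec.dtype.names)
```

The view does not copy the data; it only changes how the fields can be accessed.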

User · Answer

Using numpy.loadtxt is a quite simple method, but it requires all the elements to be numeric (float, int, and so on).

    import numpy as np
    data = np.loadtxt('c:\\1.csv', delimiter=',', skiprows=0)
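
If the file has a header line or you only need some of the columns, loadtxt can still be used through its skiprows, usecols, and dtype parameters. A small sketch with made-up in-memory data:

```python
import io

import numpy as np

# Hypothetical numeric CSV with one header line.
text = "x,y,z\n1,2,3\n4,5,6\n"

# Skip the header and read everything as float (the default dtype).
data = np.loadtxt(io.StringIO(text), delimiter=",", skiprows=1)

# Read only the first and third columns, as integers.
ints = np.loadtxt(
    io.StringIO(text), delimiter=",", skiprows=1, usecols=(0, 2), dtype=int
)

print(data)
print(ints)
```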

User · Answer

    In [329]: %time my_data = genfromtxt('one.csv', delimiter=',')
    CPU times: user 19.8 s, sys: 4.58 s, total: 24.4 s
    Wall time: 24.4 s

    In [330]: %time df = pd.read_csv("one.csv", skiprows=20)
    CPU times: user 1.06 s, sys: 312 ms, total: 1.38 s
    Wall time: 1.38 s

User · Answer

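This works like a charm:

    import csv
    with open("data.csv", 'r') as f:
        data = list(csv.reader(f, delimiter=";"))

    import numpy as np
    # np.float was removed from recent NumPy releases; the built-in float is equivalent here
    data = np.array(data, dtype=float)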

User · Answer

I tried this:

    import pandas as p
    import numpy as n

    closingValue = p.read_csv("<FILENAME>", usecols=[4], dtype=float)
    print(closingValue)

User · Answer

I would recommend the read_csv function from the pandas library:

    import pandas as pd
    df = pd.read_csv('myfile.csv', sep=',', header=None)
    df.values
    array([[ 1. ,  2. ,  3. ],
           [ 4. ,  5.5,  6. ]])

This gives a pandas DataFrame, allowing many useful data manipulation functions which are not directly available with NumPy record arrays.

    "DataFrame is a 2-dimensional labeled data structure with columns of
    potentially different types. You can think of it like a spreadsheet or
    SQL table..."

I would also recommend genfromtxt. However, since the question asks for a record array, as opposed to a normal array, the dtype=None parameter needs to be added to the genfromtxt call.

Given an input file, myfile.csv:

    1.0, 2, 3
    4, 5.5, 6

    import numpy as np
    np.genfromtxt('myfile.csv', delimiter=',')

gives an array:

    array([[ 1. ,  2. ,  3. ],
           [ 4. ,  5.5,  6. ]])

and

    np.genfromtxt('myfile.csv', delimiter=',', dtype=None)

gives a record array:

    array([(1.0, 2.0, 3), (4.0, 5.5, 6)],
          dtype=[('f0', '<f8'), ('f1', '<f8'), ('f2', '<i4')])

This has the advantage that files with multiple data types (including strings) can be easily imported.
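
If you go the pandas route but still want a NumPy record array at the end, DataFrame.to_records is one way to convert. The sketch below feeds the two-row sample from this answer in from memory; the field names f0, f1, f2 and the skipinitialspace setting are choices made for the example, not part of the original answer.

```python
import io

import numpy as np
import pandas as pd

# Same two rows as the myfile.csv sample, supplied in memory for the example.
text = "1.0, 2, 3\n4, 5.5, 6\n"

df = pd.read_csv(
    io.StringIO(text),
    sep=",",
    header=None,
    names=["f0", "f1", "f2"],   # assumed field names
    skipinitialspace=True,      # ignore the space after each comma
)

# index=False leaves the DataFrame index out of the resulting record array.
rec = df.to_records(index=False)

print(rec.dtype)
print(rec.f1)
```

Each column keeps its own type, so the third field stays an integer while the first two are floats.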

User · Answer

I would suggest using tables (pip3 install tables). You can save your .csv file to .h5 using pandas (pip3 install pandas):

    import pandas as pd
    data = pd.read_csv("dataset.csv")
    store = pd.HDFStore('dataset.h5')
    store['mydata'] = data
    store.close()

You can then easily, and in less time even for huge amounts of data, load your data into a NumPy array:

    import pandas as pd
    store = pd.HDFStore('dataset.h5')
    data = store['mydata']
    store.close()

    # Data in NumPy format
    data = data.values

User · Answer

You can use NumPy's genfromtxt() method to do so, by setting the delimiter kwarg to a comma:

    from numpy import genfromtxt
    my_data = genfromtxt('my_file.csv', delimiter=',')

More information on the function can be found at its respective documentation.
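
One thing worth knowing about genfromtxt is how it treats empty fields: with the default float dtype they come back as nan, and the filling_values parameter lets you substitute something else. A minimal sketch on made-up data:

```python
import io

import numpy as np

# Hypothetical CSV text with one empty field in the second row.
text = "1,2,3\n4,,6\n"

# Default behaviour: the missing field becomes nan.
with_nan = np.genfromtxt(io.StringIO(text), delimiter=",")

# Replace missing fields with 0 instead.
with_zero = np.genfromtxt(io.StringIO(text), delimiter=",", filling_values=0)

print(with_nan)
print(with_zero)
```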

User · Answer

You can use this code to send CSV file data into an array:

    import numpy as np
    csv = np.genfromtxt('test.csv', delimiter=",")
    print(csv)
