Load text file as strings using numpy.loadtxt()

Question

I would like to load a big text file (around 1 GB with 3*10^6 rows and 10 - 100 columns) as a 2D np-array containing strings. However, it seems like numpy.loadtxt() only takes floats as default. Is it possible to specify another data type for the entire array? I've tried the following without luck:

    loadedData = np.loadtxt(address, dtype=np.str)

I get the following error message:

    /Library/Python/2.7/site-packages/numpy-1.8.0.dev_20224ea_20121123-py2.7-macosx-10.8-x86_64.egg/numpy/lib/npyio.pyc in loadtxt(fname, dtype, comments, delimiter, converters, skiprows, usecols, unpack, ndmin)
        833             fh.close()
        834
    --> 835     X = np.array(X, dtype)
        836     # Multicolumn data are returned with shape (1, N, M), i.e.
        837     # (1, 1, M) for a single row - remove the singleton dimension there

    ValueError: cannot set an array element with a sequence

Any ideas? (I don't know the exact number of columns in my file beforehand.)
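On recent NumPy releases, loadtxt itself accepts a string dtype, so the call from the question may work as-is once np.str is replaced by the built-in str (np.str was only an alias for str; it was deprecated in NumPy 1.20 and removed in 1.24). A minimal sketch, assuming whitespace-separated fields; the small in-memory sample stands in for the real file and is hypothetical. loadtxt infers the number of columns from the first data row, so it does not have to be known in advance:

```python
import io

import numpy as np

# Hypothetical whitespace-separated sample in place of the 1 GB file.
sample = io.StringIO("foo bar cat\ndog man wine\n")

# Use the built-in str; the np.str alias no longer exists in NumPy >= 1.24.
arr = np.loadtxt(sample, dtype=str)

print(arr.shape)  # column count is inferred from the data
print(arr.dtype)  # a fixed-width unicode dtype
```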

User · Answer

There is also read_csv in Pandas, which is fast and supports non-comma column separators and automatic typing by column:

    import pandas as pd
    df = pd.read_csv('your_file', sep='\t')

It can be converted to a NumPy array if you prefer that type with:

    import numpy as np
    arr = np.array(df)

This is by far the easiest and most mature text import approach I've come across.
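Since the question asks for strings rather than inferred types, here is a minimal sketch of the same approach with every column kept as text, assuming a tab-separated file with no header row (the inline sample is hypothetical). dtype=str turns off per-column type inference, header=None keeps the first line as data, and keep_default_na=False stops values such as "NA" from being turned into NaN. DataFrame.to_numpy() (pandas 0.24 or later) returns an object array of Python strings:

```python
import io

import pandas as pd

# Hypothetical tab-separated sample standing in for the real file.
sample = io.StringIO("foo\tbar\tNA\ncat\tdog\twine\n")

df = pd.read_csv(
    sample,
    sep="\t",
    header=None,            # the file has no header row
    dtype=str,              # keep every column as text
    keep_default_na=False,  # keep strings like "NA" instead of converting them to NaN
)

arr = df.to_numpy()  # 2D object array holding Python str values
print(arr.shape)
```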

User · Answer

Use genfromtxt instead. It's a much more general method than loadtxt:

    import numpy as np
    print(np.genfromtxt('col.txt', dtype='str'))

Using the file col.txt:

    foo bar
    cat dog
    man wine

This gives:

    [['foo' 'bar']
     ['cat' 'dog']
     ['man' 'wine']]

If you expect that each row has the same number of columns, read the first row and set the filling_values argument to fix any missing rows.
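The same col.txt example can be sketched for Python 3 with the contents held in memory (an assumption made only so the snippet runs on its own). Passing encoding explicitly makes genfromtxt decode the lines to str; with the default encoding and a string dtype, newer NumPy versions may emit a deprecation warning. As with loadtxt, the column count comes from the data itself:

```python
import io

import numpy as np

# The col.txt contents from the answer, held in memory for the example.
col_txt = io.StringIO("foo bar\ncat dog\nman wine\n")

# encoding="utf-8" decodes each line to str before it is split into fields.
arr = np.genfromtxt(col_txt, dtype=str, encoding="utf-8")

print(arr)
print("columns:", arr.shape[1])  # inferred from the data, not given up front
```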

User · Answer

Is it essential that you need a NumPy array? Otherwise you could speed things up by loading the data as a nested list:

    def load(fname):
        """Load the file using the built-in open()."""
        data = []
        with open(fname, 'r') as f:
            for line in f:
                # split() with no argument drops the trailing newline
                # and splits the line on runs of whitespace
                data.append(line.split())
        return data

For a text file with 4000x4000 words this is about 10 times faster than loadtxt.
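If a NumPy array turns out to be needed after all, the nested list can still be converted in one call, provided every row has the same number of fields (recent NumPy raises a ValueError for ragged input). A minimal sketch; the helper name load_as_list and the temporary sample file are hypothetical:

```python
import os
import tempfile

import numpy as np


def load_as_list(fname):
    """Hypothetical helper: read a whitespace-separated text file into a list of rows."""
    with open(fname, "r") as f:
        # str.split() with no argument splits on whitespace and drops the newline
        return [line.split() for line in f]


# Write a small hypothetical sample file so the sketch runs on its own.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("foo bar\ncat dog\nman wine\n")
    path = tmp.name

try:
    rows = load_as_list(path)
    # Conversion succeeds only because every row has the same length.
    arr = np.array(rows, dtype=str)
finally:
    os.remove(path)

print(rows)
print(arr.shape)
```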
