Read specific columns from a csv file with csv module

Question

I'm trying to parse through a csv file and extract the data from only specific columns.

Example csv:

    ID | Name | Address | City | State | Zip | Phone | OPEID | IPEDS |
    10 | C... | 130 W.. | Mo.. | AL... | 3.. | 334.. | 01023 | 10063 |

I'm trying to capture only specific columns, say ID, Name, Zip and Phone.

Code I've looked at has led me to believe I can call the specific column by its corresponding number, so, i.e., Name would correspond to 2, and iterating through each row using row[2] would produce all the items in column 2. Only it doesn't.

Here's what I've done so far:

    import sys, argparse, csv
    from settings import *

    # command arguments
    parser = argparse.ArgumentParser(description='csv to postgres',
                                     fromfile_prefix_chars='@')
    parser.add_argument('file', help='csv file to import', action='store')
    args = parser.parse_args()
    csv_file = args.file

    # open csv file
    with open(csv_file, 'rb') as csvfile:

        # get number of columns
        for line in csvfile.readlines():
            array = line.split(',')
            first_item = array[0]

        num_columns = len(array)
        csvfile.seek(0)

        reader = csv.reader(csvfile, delimiter=' ')
            included_cols = [1, 2, 6, 7]

        for row in reader:
                content = list(row[i] for i in included_cols)
                print content

and I'm expecting that this will print out only the specific columns I want for each row, except it doesn't: I get the last column only.

User · Answer

You can use numpy.loadtxt(filename). For example, if this is your database .csv:

    ID | Name | Address | City | State | Zip | Phone | OPEID | IPEDS |
    10 | Adam | 130 W.. | Mo.. | AL... | 3.. | 334.. | 01023 | 10063 |
    10 | Carl | 130 W.. | Mo.. | AL... | 3.. | 334.. | 01023 | 10063 |
    10 | Adolf | 130 W.. | Mo.. | AL... | 3.. | 334.. | 01023 | 10063 |
    10 | Den | 130 W.. | Mo.. | AL... | 3.. | 334.. | 01023 | 10063 |

And you want the Name column:

    import numpy as np
    b = np.loadtxt(r'filepath\name.csv', dtype=str, delimiter='|', skiprows=1, usecols=(1,))

    >>> b
    array([' Adam ', ' Carl ', ' Adolf ', ' Den '],
          dtype='|S7')

More easily, you can use genfromtxt:

    b = np.genfromtxt(r'filepath\name.csv', delimiter='|', names=True, dtype=None)

    >>> b['Name']
    array([' Adam ', ' Carl ', ' Adolf ', ' Den '],
          dtype='|S7')

User · Answer

If you need to process the columns separately, I like to destructure the columns with the zip(*iterable) pattern (effectively "unzip"). So for your example:

    ids, names, zips, phones = zip(*(
        (row[1], row[2], row[6], row[7])
        for row in reader
    ))
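
As a hedged illustration of the unzip pattern, here is a minimal Python 3 sketch. The in-memory sample, the pipe delimiter, and the zero-based indices 0, 1, 5 and 6 (for ID, Name, Zip and Phone) are assumptions made for the example, not details taken from the asker's real file.

```python
import csv
import io

# Hypothetical pipe-delimited sample standing in for the real file.
sample = io.StringIO(
    "ID|Name|Address|City|State|Zip|Phone\n"
    "10|Adam|130 W|Mo|AL|3|334\n"
    "11|Carl|131 W|Mo|AL|4|335\n"
)

reader = csv.reader(sample, delimiter="|")
next(reader)  # skip the header row

# Build one tuple per row, then transpose rows into columns with zip(*...).
ids, names, zips, phones = zip(
    *((row[0], row[1], row[5], row[6]) for row in reader)
)

print(names)  # ('Adam', 'Carl')
```

Note that the unpacking raises a ValueError when the file has no data rows, because zip() then yields nothing to assign.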

User · Answer

    import csv
    from collections import defaultdict

    columns = defaultdict(list)  # each value in each column is appended to a list

    with open('file.txt') as f:
        reader = csv.DictReader(f)  # read rows into a dictionary format
        for row in reader:  # read a row as {column1: value1, column2: value2, ...}
            for (k, v) in row.items():  # go over each column name and value
                columns[k].append(v)  # append the value into the appropriate list
                                      # based on column name k

    print(columns['name'])
    print(columns['phone'])
    print(columns['street'])

With a file like

    name,phone,street
    Bob,0893,32 Silly
    James,000,400 McHilly
    Smithers,4442,23 Looped St.

Will output

    >>>
    ['Bob', 'James', 'Smithers']
    ['0893', '000', '4442']
    ['32 Silly', '400 McHilly', '23 Looped St.']

Or alternatively, if you want numerical indexing for the columns:

    with open('file.txt') as f:
        reader = csv.reader(f)
        next(reader)  # skip the header row
        for row in reader:
            for (i, v) in enumerate(row):
                columns[i].append(v)
    print(columns[0])

    >>>
    ['Bob', 'James', 'Smithers']

To change the delimiter, add delimiter=" " to the appropriate instantiation, i.e. reader = csv.reader(f, delimiter=" ")
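
If only a few named columns are needed, the same DictReader idea can be narrowed to them. The following is a sketch in Python 3; the in-memory sample and the `wanted` list are assumptions introduced for illustration.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample mirroring the kind of file shown in this answer.
sample = io.StringIO(
    "name,phone,street\n"
    "Bob,0893,32 Silly\n"
    "James,000,400 McHilly\n"
)

wanted = ["name", "phone"]  # assumed column names to keep
columns = defaultdict(list)

for row in csv.DictReader(sample):
    for key in wanted:
        columns[key].append(row[key])  # other columns are simply ignored

print(columns["name"])   # ['Bob', 'James']
print(columns["phone"])  # ['0893', '000']
```

A misspelled name in `wanted` raises a KeyError on the first row, which is usually preferable to silently collecting nothing.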

User · Answer

The only way you would be getting the last column from this code is if you don't include your print statement in your for loop.

This is most likely the end of your code:

    for row in reader:
        content = list(row[i] for i in included_cols)
    print content

You want it to be this:

    for row in reader:
        content = list(row[i] for i in included_cols)
        print content

Now that we have covered your mistake, I would like to take this time to introduce you to the pandas module.

Pandas is spectacular for dealing with csv files, and the following code would be all you need to read a csv and save an entire column into a variable:

    import pandas as pd
    df = pd.read_csv(csv_file)
    saved_column = df.column_name  # you can also use df['column_name']

So if you wanted to save all of the info in your column Names into a variable, this is all you need to do:

    names = df.Names

It's a great module and I suggest you look into it. If your print statement was in the for loop and it was still only printing out the last column (which shouldn't happen), let me know, because then my assumption was wrong. Your posted code has a lot of indentation errors, so it was hard to know what was supposed to be where. Hope this was helpful!
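
For readers on Python 3, the corrected loop might look like the sketch below. The comma-delimited in-memory sample and the zero-based indices are assumptions for the example; the point it illustrates is only that the print call sits inside the loop body.

```python
import csv
import io

# Hypothetical sample; a real script would open the file instead.
sample = io.StringIO(
    "ID,Name,Address,City,State,Zip,Phone\n"
    "10,Adam,130 W,Mo,AL,3,334\n"
    "11,Carl,131 W,Mo,AL,4,335\n"
)

included_cols = [0, 1, 5, 6]  # zero-based: ID, Name, Zip, Phone
selected = []

for row in csv.reader(sample):
    content = [row[i] for i in included_cols]
    selected.append(content)
    print(content)  # inside the loop, so it runs once per row
```

With a real file in Python 3, the csv documentation recommends opening it in text mode with `newline=''` rather than `'rb'`.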

User · Answer

To fetch the column names, use readline() instead of readlines(). This avoids looping over the file, reading it completely, and storing it in an array.

    with open(csv_file, 'rb') as csvfile:

        # get number of columns
        line = csvfile.readline()

        first_item = line.split(',')
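
With the csv module itself, the header can also be read by calling next() on the reader, which consumes only the first row. A sketch, assuming Python 3, a comma delimiter, and a hypothetical in-memory sample; `index_of` is a helper name introduced here for illustration:

```python
import csv
import io

# Hypothetical sample standing in for the real file.
sample = io.StringIO(
    "ID,Name,Zip,Phone\n"
    "10,Adam,3,334\n"
)

reader = csv.reader(sample)
header = next(reader)  # reads only the first row
num_columns = len(header)

# Map each column name to its zero-based position.
index_of = {name: i for i, name in enumerate(header)}

first_row = next(reader)
print(num_columns)                   # 4
print(first_row[index_of["Phone"]])  # 334
```

Unlike a plain split on the delimiter, this also handles quoted header fields that contain the delimiter.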

User · Answer

    import pandas as pd

    csv_file = pd.read_csv("file.csv")
    # .values returns the column as a NumPy array
    column_val_list = csv_file.column_name.values

User · Answer

Context: For this type of work you should use the amazing Python petl library. It will save you a lot of work and potential frustration from doing things "manually" with the standard csv module. AFAIK, the only people who still use the csv module are those who have not yet discovered better tools for working with tabular data (pandas, petl, etc.), which is fine, but if you plan to work with a lot of data in your career from various strange sources, learning something like petl is one of the best investments you can make. Getting started should take only 30 minutes after you've run pip install petl. The documentation is excellent.

Answer: Let's say you have the first table in a csv file (you can also load directly from the database using petl). Then you would simply load it and do the following:

    from petl import fromcsv, look, cut, tocsv

    # Load the table
    table1 = fromcsv('table1.csv')
    # Alter the columns
    table2 = cut(table1, 'Song_Name', 'Artist_ID')
    # Have a quick look to make sure things are ok. Prints a nicely formatted table to your console
    print(look(table2))
    # Save to new file
    tocsv(table2, 'new.csv')

User · Answer

Use pandas:

    import pandas as pd
    my_csv = pd.read_csv(filename)
    column = my_csv.column_name
    # you can also use my_csv['column_name']

Discard unneeded columns at parse time:

    my_filtered_csv = pd.read_csv(filename, usecols=['col1', 'col3', 'col7'])

P.S. I'm just aggregating what others have said in a simple manner. Actual answers are taken from here and here.

User · Answer

Thanks to the way you can index and subset a pandas dataframe, a very easy way to extract a single column from a csv file into a variable is:

    myVar = pd.read_csv('YourPath', sep=",")['ColumnName']

A few things to consider:

The snippet above will produce a pandas Series and not a dataframe. The suggestion from ayhan with usecols will also be faster if speed is an issue. Testing the two different approaches using %timeit on a 2122 KB csv file yields 22.8 ms for the usecols approach and 53 ms for my suggested approach.

And don't forget import pandas as pd

User · Answer

I think there is an easier way:

    import pandas as pd

    dataset = pd.read_csv('table1.csv')
    ftCol = dataset.iloc[:, 0].values

Here, in iloc[:, 0], the ":" means all rows and 0 is the position of the column. In the example below, ID will be selected:

    ID | Name | Address | City | State | Zip | Phone | OPEID | IPEDS |
    10 | C... | 130 W.. | Mo.. | AL... | 3.. | 334.. | 01023 | 10063 |

User · Answer

SAMPLE.CSV

    a, 1, +
    b, 2, -
    c, 3, *
    d, 4, #

    import pandas as pd

    column_names = ["Letter", "Number", "Symbol"]
    df = pd.read_csv("sample.csv", names=column_names)
    print(df)

OUTPUT

      Letter  Number Symbol
    0      a       1      +
    1      b       2      -
    2      c       3      *
    3      d       4      #

    letters = df.Letter.to_list()
    print(letters)

OUTPUT

    ['a', 'b', 'c', 'd']

User · Answer

With pandas you can use read_csv with the usecols parameter:

    df = pd.read_csv(filename, usecols=['col1', 'col3', 'col7'])

Example:

    import pandas as pd
    import io

    s = '''
    total_bill,tip,sex,smoker,day,time,size
    16.99,1.01,Female,No,Sun,Dinner,2
    10.34,1.66,Male,No,Sun,Dinner,3
    21.01,3.5,Male,No,Sun,Dinner,3
    '''

    df = pd.read_csv(io.StringIO(s), usecols=['total_bill', 'day', 'size'])
    print(df)

       total_bill  day  size
    0       16.99  Sun     2
    1       10.34  Sun     3
    2       21.01  Sun     3
