[bash] How to parse a CSV in a Bash script?

I am trying to parse a CSV containing potentially 100k+ lines. Here are the criteria I have:

  1. The index of the identifier
  2. The identifier value

I would like to retrieve all lines in the CSV that have the given value in the given index (delimited by commas).

Any ideas, taking performance into special consideration?

This question is related to: bash, csv, shell

Answers:


A sed or awk solution would probably be shorter, but here's one for Perl:

perl -F/,/ -ane 'print if $F[<INDEX>] eq "<VALUE>"'

where <INDEX> is 0-based (0 for first column, 1 for 2nd column, etc.)
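
For example, to print the rows whose second column (index 1) is exactly "foo" (the file name and values here are just placeholders):

perl -F/,/ -ane 'print if $F[1] eq "foo"' inputfile.csv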


For situations where the data does not contain any special characters, the solution suggested by Nate Kohl and ghostdog74 is good.

If the data contains commas or newlines inside the fields, awk may not properly count the field numbers and you'll get incorrect results.

You can still use awk, with some help from a program I wrote called csvquote (available at https://github.com/dbro/csvquote):

csvquote inputfile.csv | awk -F, -v idx="$INDEX" -v val="$VALUE" '$idx == val {print}' | csvquote -u

This program finds special characters inside quoted fields, and temporarily replaces them with nonprinting characters which won't confuse awk. Then they get restored after awk is done.
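
For instance, with a tiny sample file containing an embedded comma (the data and the idx/val values below are purely illustrative, and this assumes csvquote has been built and installed from the repository above):

printf '%s\n' 'Name,Phone' '"Woo, John",425-555-1212' > sample.csv
csvquote sample.csv | awk -F, -v idx=2 -v val=425-555-1212 '$idx == val' | csvquote -u
# prints: "Woo, John",425-555-1212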


index=1
value=2
awk -F"," -v i=$index -v v=$value '$(i)==v' file

In a CSV file, each field is separated by a comma. The problem is, a field itself might have an embedded comma:

Name,Phone
"Woo, John",425-555-1212

You really need a library package that offers robust CSV support instead of relying on a comma as the field separator. I know that scripting languages such as Python have such support. However, I am comfortable with the Tcl scripting language, so that is what I use. Here is a simple Tcl script which does what you are asking for:

#!/usr/bin/env tclsh

package require csv 
package require Tclx

# Parse the command line parameters
lassign $argv fileName columnNumber expectedValue

# Subtract 1 from columnNumber because Tcl's list index starts with a
# zero instead of a one
incr columnNumber -1

for_file line $fileName {
    set columns [csv::split $line]
    set columnValue [lindex $columns $columnNumber]
    if {$columnValue eq $expectedValue} {
        puts $line
    }   
}

Save this script to a file called csv.tcl and invoke it as:

$ tclsh csv.tcl filename indexNumber expectedValue

Explanation

The script reads the CSV file line by line and stores each line in the variable $line, then it splits each line into a list of columns (variable $columns). Next, it picks out the specified column and assigns it to the $columnValue variable. If there is a match, it prints out the original line.


See this YouTube video: BASH scripting lesson 10 - working with CSV files

CSV file:

Bob Brown;Manager;16581;Main
Sally Seaforth;Director;4678;HOME

Bash script:

#!/bin/bash
OLDIFS=$IFS
IFS=";"
while read -r user job uid location
 do

    echo -e "$user \
    ======================\n\
    Role :\t $job\n\
    ID :\t $uid\n\
    SITE :\t $location\n"
 done < "$1"
 IFS=$OLDIFS

Output:

Bob Brown     ======================
    Role :   Manager
    ID :     16581
    SITE :   Main

Sally Seaforth     ======================
    Role :   Director
    ID :     4678
    SITE :   HOME

Using awk:

export INDEX=2
export VALUE=bar

awk -F, '$'$INDEX' ~ /^'$VALUE'$/ {print}' inputfile.csv

Edit: As per Dennis Williamson's excellent comment, this could be much more cleanly (and safely) written by defining awk variables using the -v switch:

awk -F, -v idx="$INDEX" -v val="$VALUE" '$idx == val {print}' inputfile.csv

Jeez...with variables, and everything, awk is almost a real programming language...
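
A quick sanity check with made-up data (the file contents below are purely illustrative):

printf '%s\n' 'foo,bar,baz' 'one,two,three' > inputfile.csv
export INDEX=2 VALUE=bar
awk -F, -v idx="$INDEX" -v val="$VALUE" '$idx == val {print}' inputfile.csv
# prints: foo,bar,baz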


I was looking for an elegant solution that supports quoting and wouldn't require installing anything fancy on my VMware vMA appliance. It turns out this simple Python script does the trick! (I named the script csv2tsv.py, since it converts CSV into tab-separated values - TSV.)

#!/usr/bin/env python3

import csv
import sys

# Read CSV records from stdin and write them out as tab-separated values
reader = csv.reader(sys.stdin)
for row in reader:
    print('\t'.join(row))

Tab-separated values can be split easily with the cut command (no delimiter needs to be specified, tab is the default). Here's a sample usage/output:

> esxcli -h $VI_HOST --formatter=csv network vswitch standard list |csv2tsv.py|cut -f12
Uplinks
vmnic4,vmnic0,
vmnic5,vmnic1,
vmnic6,vmnic2,

In my scripts I'm actually going to parse the TSV output line by line and use read or cut to get the fields I need.
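
For example, a rough sketch of that line-by-line variant, reusing the esxcli command above and reading each record into a bash array split on tabs:

esxcli -h "$VI_HOST" --formatter=csv network vswitch standard list | csv2tsv.py |
while IFS=$'\t' read -r -a fields; do
    echo "${fields[11]}"    # 0-based index 11 is the same column as cut -f12 (Uplinks)
done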


As an alternative to cut- or awk-based one-liners, you could use the specialized csvtool aka ocaml-csv:

$ csvtool -t ',' col "$index" - < csvfile | grep "$value"

According to the docs, it handles escaping, quoting, etc.
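
If csvtool is installed, a quick check with a quoted field (sample data made up for illustration) suggests the embedded comma survives column extraction; it should print something like:

printf '%s\n' 'Name,Phone' '"Woo, John",425-555-1212' | csvtool col 1 -
# Name
# "Woo, John"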


CSV isn't quite that simple. Depending on the limits of the data you have, you might have to worry about quoted values (which may contain commas and newlines) and escaping quotes.

So if your data are restricted enough that you can get away with simple comma-splitting, a shell script can do that easily. If, on the other hand, you need to parse CSV 'properly', bash would not be my first choice. Instead I'd look at a higher-level scripting language, for example Python with a csv.reader.
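
A minimal sketch of that last approach, wrapped in a shell one-liner so it fits in a bash script (a 0-based INDEX and the variable names are assumptions for illustration):

python3 -c '
import csv, sys
idx, val = int(sys.argv[1]), sys.argv[2]
w = csv.writer(sys.stdout, lineterminator="\n")
for row in csv.reader(sys.stdin):
    if len(row) > idx and row[idx] == val:
        w.writerow(row)
' "$INDEX" "$VALUE" < inputfile.csv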

