Extract specific columns from delimited file using Awk

Question

Sorry if this is too basic. I have a CSV file where the columns have a header row (v1, v2, etc.). I understand that to extract columns 1 and 2, I have to do:

    awk -F "," '{print $1 "," $2}' infile.csv > outfile.csv

But what if I have to extract, say, columns 1 to 10, 20 to 25, and 30, 33? As an addendum, is there any way to extract directly with the header names rather than with column numbers?

User · Answer

You can use a for-loop to address a field with $i:

    ls -l | awk '{for(i=3 ; i<8 ; i++) {printf("%s\t", $i)} print ""}'

User · Answer

As mentioned by @Tom, the cut and awk approaches actually don't work for CSVs with quoted strings. An alternative is a module for Python that provides the command-line tool csvfilter. It works like cut, but properly handles CSV column quoting:

    csvfilter -f 1,3,5 in.csv > out.csv

If you have Python (and you should), you can install it simply like this:

    pip install csvfilter

Please note that column indexing in csvfilter starts at 0 (unlike awk, which starts at $1). More info at https://github.com/codeinthehole/csvfilter
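To make the quoting problem concrete, here is a minimal sketch. The helper name count_naive_fields and the sample record are made up for illustration. It counts how many fields awk sees when it splits on every comma, including the comma inside the quoted value.

```shell
# Hypothetical helper: count fields the way a plain `awk -F,` split sees them.
count_naive_fields() {
    awk -F, '{ print NF }'
}

# The record has three CSV fields, but the quoted middle field contains a
# comma, so the naive split reports one field too many.
printf '%s\n' 'id,"Smith, John",42' | count_naive_fields
```

Any column number after the quoted field is therefore shifted by one for that row, which is why a quote-aware tool is the safer choice for such files.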

User · Answer

Tabulator is a set of Unix command-line tools to work with CSV files that have header lines. Here is an example to extract columns by name from a file test.csv:

    name,sex,house_nr,height,shoe_size
    arthur,m,42,181,11.5
    berta,f,101,163,8.5
    chris,m,1333,175,10
    don,m,77,185,12.5
    elisa,f,204,166,7

Then tblmap -k name,height test.csv produces:

    name,height
    arthur,181
    berta,163
    chris,175
    don,185
    elisa,166

User · Answer

Other languages have shortcuts for ranges of field numbers, but awk does not; you'll have to write your code out as you fear ;-)

    awk -F, 'BEGIN {OFS=","} { print $1, $2, $3, $4 ..., $30, $33}' infile.csv > outfile.csv

There is no direct function in awk to use field names as column specifiers. I hope this helps.

User · Answer

I don't know if it's possible to do ranges in awk. You could do a for loop, but you would have to add handling to filter out the columns you don't want. It's probably easier to do this:

    awk -F, '{OFS=",";print $1,$2,$3,$4,$5,$6,$7,$8,$9,$10,$20,$21,$22,$23,$24,$25,$30,$33}' infile.csv > outfile.csv

Something else to consider, which is faster and more concise:

    cut -d "," -f1-10,20-25,30,33 infile.csv > outfile.csv

As to the second part of your question, I would probably write a script in Perl that knows how to handle header rows, parsing the column names from stdin or a file and then doing the filtering. It's probably a tool I would want to have for other things. I am not sure about doing it in a one-liner, although I am sure it can be done.
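The for loop with filtering mentioned above could be sketched roughly as follows. The function name extract_cols and its cut-style spec argument are assumptions made up for this example, not an existing tool, and like any plain comma split it does not handle quoted fields.

```shell
# Hypothetical helper: print the columns named in a cut-style spec such as
# "1-10,20-25,30,33" from comma-delimited input (no quoted-field handling).
extract_cols() {
    spec=$1
    shift
    awk -F, -v OFS=, -v spec="$spec" '
        BEGIN {
            # Expand the spec into an ordered list of field numbers.
            n = split(spec, parts, ",")
            for (p = 1; p <= n; p++) {
                m = split(parts[p], r, "-")
                lo = r[1] + 0
                hi = (m == 2 ? r[2] : r[1]) + 0
                for (f = lo; f <= hi; f++)
                    cols[++ncols] = f
            }
        }
        {
            # Build each output row from the selected fields only.
            line = ""
            for (c = 1; c <= ncols; c++) {
                sep = (c > 1 ? OFS : "")
                line = line sep $(cols[c])
            }
            print line
        }
    ' "$@"
}
```

Usage would look like this, reading from a file or from stdin when no file is given:

    extract_cols 1-10,20-25,30,33 infile.csv > outfile.csv

Fields past the end of a short row come out as empty strings, so the number of output columns stays constant.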

User · Answer

Others have answered your earlier question. For this:

    "As an addendum, is there any way to extract directly with the header names rather than with column numbers?"

I haven't tried it, but you could store each header's index in a hash while reading the header row, and then use that hash to get the index later on. Note that awk fields start at 1 and the field count is NF:

    NR == 1 { for (i = 1; i <= NF; i++) { hash[$i] = i } }

Then later on, use it:

    j = hash["header1"]; print $j
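A fuller sketch of the header-hash idea above, under the assumption that the first line is the header and that no field contains a quoted comma. The function name extract_by_name is hypothetical, chosen for this example.

```shell
# Hypothetical helper: select columns by header name from comma-delimited
# input whose first line is the header (no quoted-field handling).
extract_by_name() {
    names=$1
    shift
    awk -F, -v OFS=, -v names="$names" '
        NR == 1 {
            # Map each header name to its field number.
            for (i = 1; i <= NF; i++)
                idx[$i] = i
            n = split(names, want, ",")
            # Refuse unknown names; otherwise $0 would be printed for them.
            for (k = 1; k <= n; k++) {
                if (!(want[k] in idx)) {
                    print "unknown column: " want[k] | "cat 1>&2"
                    exit 1
                }
            }
        }
        {
            # Runs for the header row too, so the output keeps a header.
            line = ""
            for (k = 1; k <= n; k++) {
                sep = (k > 1 ? OFS : "")
                line = line sep $(idx[want[k]])
            }
            print line
        }
    ' "$@"
}
```

For example, extract_by_name v1,v3 infile.csv > outfile.csv would keep only the columns headed v1 and v3, in the order requested.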

User · Answer

If Perl is an option:

    perl -F, -lane 'print join ",",@F[0,1,2,3,4,5,6,7,8,9,19,20,21,22,23,24,29,32]'

-a autosplits each line into the @F fields array. Indices start at 0 (not 1 as in awk).
-F, sets the field separator to a comma.

If your CSV file contains commas within quotes, fully fledged CSV parsers such as Perl's Text::CSV_XS are purpose-built to handle that kind of weirdness:

    perl -MText::CSV_XS -lne 'BEGIN{$csv=Text::CSV_XS->new()} if($csv->parse($_)){@f=$csv->fields();print join ",",@f[0,1,2,3,4,5,6,7,8,9,19,20,21,22,23,24,29,32]}'

I provided more explanation within my answer here: parse csv file using gawk

User · Answer

Not using awk, but the simplest way I was able to get this done was to just use csvtool. I had other use cases for csvtool as well, and it handles quotes and delimiters appropriately if they appear within the column data itself:

    csvtool format '%(2)\n' input.csv
    csvtool format '%(2),%(3),%(4)\n' input.csv

Replacing 2 with the column number will effectively extract the column data you are looking for.
