[linux] Extracting columns from text file with different delimiters in Linux

I have very large genotype files that are basically impossible to open in R, so I am trying to extract the rows and columns of interest using linux command line. Rows are straightforward enough using head/tail, but I'm having difficulty figuring out how to handle the columns.

If I attempt to extract (say) the 100-105th tab or space delimited column using

 cut -c100-105 myfile >outfile

this obviously won't work if there are strings of multiple characters in each column. Is there some way to modify cut with appropriate arguments so that it extracts the entire string within a column, where columns are defined as space or tab (or any other character) delimited?

This question is related to linux

The answer is


You can use cut with a delimiter like this:

with space delim:

cut -d " " -f1-100,1000-1005 infile.csv > outfile.csv

with tab delim:

cut -d$'\t' -f1-100,1000-1005 infile.csv > outfile.csv

I gave you the version of cut in which you can extract a list of intervals...

Hope it helps!