[linux] Using awk to print all columns from the nth to the last

This line worked until I had whitespace in the second field.

svn status | grep '\!' | gawk '{print $2;}' > removedProjs

is there a way to have awk print everything in $2 or greater? ($3, $4.. until we don't have anymore columns?)

I suppose I should add that I'm doing this in a Windows environment with Cygwin.

This question is related to linux awk

The answer is


Printing out columns starting from #2 (the output will have no trailing space in the beginning):

ls -l | awk '{sub(/[^ ]+ /, ""); print $0}'

This would work if you are using Bash and you could use as many 'x ' as elements you wish to discard and it ignores multiple spaces if they are not escaped.

while read x b; do echo "$b"; done < filename

If you want formatted text, chain your commands with echo and use $0 to print the last field.

Example:

for i in {8..11}; do
   s1="$i"
   s2="str$i"
   s3="str with spaces $i"
   echo -n "$s1 $s2" | awk '{printf "|%3d|%6s",$1,$2}'
   echo -en "$s3" | awk '{printf "|%-19s|\n", $0}'
done

Prints:

|  8|  str8|str with spaces 8  |
|  9|  str9|str with spaces 9  |
| 10| str10|str with spaces 10 |
| 11| str11|str with spaces 11 |

You could use a for-loop to loop through printing fields $2 through $NF (built-in variable that represents the number of fields on the line).

Edit: Since "print" appends a newline, you'll want to buffer the results:

awk '{out=""; for(i=2;i<=NF;i++){out=out" "$i}; print out}'

Alternatively, use printf:

awk '{for(i=2;i<=NF;i++){printf "%s ", $i}; printf "\n"}'

I personally tried all the answers mentioned above, but most of them were a bit complex or just not right. The easiest way to do it from my point of view is:

awk -F" " '{ for (i=4; i<=NF; i++) print $i }'
  1. Where -F" " defines the delimiter for awk to use. In my case is the whitespace, which is also the default delimiter for awk. This means that -F" " can be ignored.

  2. Where NF defines the total number of fields/columns. Therefore the loop will begin from the 4th field up to the last field/column.

  3. Where $N retrieves the value of the Nth field. Therefore print $i will print the current field/column based based on the loop count.


Awk examples looks complex here, here is simple Bash shell syntax:

command | while read -a cols; do echo ${cols[@]:1}; done

Where 1 is your nth column counting from 0.


Example

Given this content of file (in.txt):

c1
c1 c2
c1 c2 c3
c1 c2 c3 c4
c1 c2 c3 c4 c5

here is the output:

$ while read -a cols; do echo ${cols[@]:1}; done < in.txt 

c2
c2 c3
c2 c3 c4
c2 c3 c4 c5

Perl solution:

perl -lane 'splice @F,0,1; print join " ",@F' file

These command-line options are used:

  • -n loop around every line of the input file, do not automatically print every line

  • -l removes newlines before processing, and adds them back in afterwards

  • -a autosplit mode – split input lines into the @F array. Defaults to splitting on whitespace

  • -e execute the perl code

splice @F,0,1 cleanly removes column 0 from the @F array

join " ",@F joins the elements of the @F array, using a space in-between each element


Python solution:

python -c "import sys;[sys.stdout.write(' '.join(line.split()[1:]) + '\n') for line in sys.stdin]" < file


Perl:

@m=`ls -ltr dir | grep ^d | awk '{print \$6,\$7,\$8,\$9}'`;
foreach $i (@m)
{
        print "$i\n";

}

I want to extend the proposed answers to the situation where fields are delimited by possibly several whitespaces –the reason why the OP is not using cut I suppose.

I know the OP asked about awk, but a sed approach would work here (example with printing columns from the 5th to the last):

  • pure sed approach

    sed -r 's/^\s*(\S+\s+){4}//' somefile
    

    Explanation:

    • s/// is used the standard way to perform substitution
    • ^\s* matches any consecutive whitespace at the beginning of the line
    • \S+\s+ means a column of data (non-whitespace chars followed by whitespace chars)
    • (){4} means the pattern is repeated 4 times.
  • sed and cut

    sed -r 's/^\s+//; s/\s+/\t/g' somefile | cut -f5-
    

    by just replacing consecutive whitespaces by a single tab;

  • tr and cut: tr can also be used to squeeze consecutive characters with the -s option.

    tr -s [:blank:] <somefile | cut -d' ' -f5-
    

This was irritating me so much, I sat down and wrote a cut-like field specification parser, tested with GNU Awk 3.1.7.

First, create a new Awk library script called pfcut, with e.g.

sudo nano /usr/share/awk/pfcut

Then, paste in the script below, and save. After that, this is how the usage looks like:

$ echo "t1 t2 t3 t4 t5 t6 t7" | awk -f pfcut --source '/^/ { pfcut("-4"); }'
t1 t2 t3 t4

$ echo "t1 t2 t3 t4 t5 t6 t7" | awk -f pfcut --source '/^/ { pfcut("2-"); }'
t2 t3 t4 t5 t6 t7

$ echo "t1 t2 t3 t4 t5 t6 t7" | awk -f pfcut --source '/^/ { pfcut("-2,4,6-"); }'
t1 t2 t4 t6 t7

To avoid typing all that, I guess the best one can do (see otherwise Automatically load a user function at startup with awk? - Unix & Linux Stack Exchange) is add an alias to ~/.bashrc; e.g. with:

$ echo "alias awk-pfcut='awk -f pfcut --source'" >> ~/.bashrc
$ source ~/.bashrc     # refresh bash aliases

... then you can just call:

$ echo "t1 t2 t3 t4 t5 t6 t7" | awk-pfcut '/^/ { pfcut("-2,4,6-"); }'
t1 t2 t4 t6 t7

Here is the source of the pfcut script:

# pfcut - print fields like cut
#
# sdaau, GNU GPL
# Nov, 2013

function spfcut(formatstring)
{
  # parse format string
  numsplitscomma = split(formatstring, fsa, ",");
  numspecparts = 0;
  split("", parts); # clear/initialize array (for e.g. `tail` piping into `awk`)
  for(i=1;i<=numsplitscomma;i++) {
    commapart=fsa[i];
    numsplitsminus = split(fsa[i], cpa, "-");
    # assume here a range is always just two parts: "a-b"
    # also assume user has already sorted the ranges
    #print numsplitsminus, cpa[1], cpa[2]; # debug
    if(numsplitsminus==2) {
     if ((cpa[1]) == "") cpa[1] = 1;
     if ((cpa[2]) == "") cpa[2] = NF;
     for(j=cpa[1];j<=cpa[2];j++) {
       parts[numspecparts++] = j;
     }
    } else parts[numspecparts++] = commapart;
  }
  n=asort(parts); outs="";
  for(i=1;i<=n;i++) {
    outs = outs sprintf("%s%s", $parts[i], (i==n)?"":OFS); 
    #print(i, parts[i]); # debug
  }
  return outs;
}

function pfcut(formatstring) {
  print spfcut(formatstring);
}

If you don't want to reformat the part of the line that you don't chop off, the best solution I can think of is written in my answer in:

How to print all the columns after a particular number using awk?

It chops what is before the given field number N, and prints all the rest of the line, including field number N and maintaining the original spacing (it does not reformat). It doesn't mater if the string of the field appears also somewhere else in the line.

Define a function:

fromField () { 
awk -v m="\x01" -v N="$1" '{$N=m$N; print substr($0,index($0,m)+1)}'
}

And use it like this:

$ echo "  bat   bi       iru   lau bost   " | fromField 3
iru   lau bost   
$ echo "  bat   bi       iru   lau bost   " | fromField 2
bi       iru   lau bost 

Output maintains everything, including trailing spaces

In you particular case:

svn status | grep '\!' | fromField 2 > removedProjs

If your file/stream does not contain new-line characters in the middle of the lines (you could be using a different Record Separator), you can use:

awk -v m="\x0a" -v N="3" '{$N=m$N ;print substr($0, index($0,m)+1)}'

The first case will fail only in files/streams that contain the rare hexadecimal char number 1


Most solutions with awk leave an space. The options here avoid that problem.

Option 1

A simple cut solution (works only with single delimiters):

command | cut -d' ' -f3-

Option 2

Forcing an awk re-calc sometimes remove the added leading space (OFS) left by removing the first fields (works with some versions of awk):

command | awk '{ $1=$2="";$0=$0;} NF=NF'

Option 3

Printing each field formatted with printf will give more control:

$ in='    1    2  3     4   5   6 7     8  '
$ echo "$in"|awk -v n=2 '{ for(i=n+1;i<=NF;i++) printf("%s%s",$i,i==NF?RS:OFS);}'
3 4 5 6 7 8

However, all previous answers change all repeated FS between fields to OFS. Let's build a couple of option that do not do that.

Option 4 (recommended)

A loop with sub to remove fields and delimiters at the front.
And using the value of FS instead of space (which could be changed).
Is more portable, and doesn't trigger a change of FS to OFS: NOTE: The ^[FS]* is to accept an input with leading spaces.

$ in='    1    2  3     4   5   6 7     8  '
$ echo "$in" | awk '{ n=2; a="^["FS"]*[^"FS"]+["FS"]+";
  for(i=1;i<=n;i++) sub( a , "" , $0 ) } 1 '
3     4   5   6 7     8

Option 5

It is quite possible to build a solution that does not add extra (leading or trailing) whitespace, and preserve existing whitespace(s) using the function gensub from GNU awk, as this:

$ echo '    1    2  3     4   5   6 7     8  ' |
  awk -v n=2 'BEGIN{ a="^["FS"]*"; b="([^"FS"]+["FS"]+)"; c="{"n"}"; }
          { print(gensub(a""b""c,"",1)); }'
3     4   5   6 7     8 

It also may be used to swap a group of fields given a count n:

$ echo '    1    2  3     4   5   6 7     8  ' |
  awk -v n=2 'BEGIN{ a="^["FS"]*"; b="([^"FS"]+["FS"]+)"; c="{"n"}"; }
          {
            d=gensub(a""b""c,"",1);
            e=gensub("^(.*)"d,"\\1",1,$0);
            print("|"d"|","!"e"!");
          }'
|3     4   5   6 7     8  | !    1    2  !

Of course, in such case, the OFS is used to separate both parts of the line, and the trailing white space of the fields is still printed.

NOTE: [FS]* is used to allow leading spaces in the input line.


Because of a wrong most upvoted anwser with 340 votes, I just lost 5 minutes of my life! Did anybody try this answer out before upvoting this? Apparantly not. Completely useless.

I have a log where after $5 with an IP address can be more text or no text. I need everything from the IP address to the end of the line should there be anything after $5. In my case, this is actualy withn an awk program, not an awk oneliner so awk must solve the problem. When I try to remove the first 4 fields using the most upvoted but completely wrong answer:

echo "  7 27.10.16. Thu 11:57:18 37.244.182.218" | awk '{$1=$2=$3=$4=""; printf "[%s]\n", $0}'

it spits out wrong and useless response (I added [..] to demonstrate):

[    37.244.182.218 one two three]

There are even some sugestions to combine substr with this wrong answer. Like that complication is an improvement.

Instead, if columns are fixed width until the cut point and awk is needed, the correct answer is:

echo "  7 27.10.16. Thu 11:57:18 37.244.182.218" | awk '{printf "[%s]\n", substr($0,28)}'

which produces the desired output:

[37.244.182.218 one two three]

awk '{ for(i=3; i<=NF; ++i) printf $i""FS; print "" }'

lauhub proposed this correct, simple and fast solution here


ls -la | awk '{o=$1" "$3; for (i=5; i<=NF; i++) o=o" "$i; print o }'

from this answer is not bad but the natural spacing is gone.
Please then compare it to this one:

ls -la | cut -d\  -f4-

Then you'd see the difference.

Even ls -la | awk '{$1=$2=""; print}' which is based on the answer voted best thus far is not preserve the formatting.

Thus I would use the following, and it also allows explicit selective columns in the beginning:

ls -la | cut -d\  -f1,4-

Note that every space counts for columns too, so for instance in the below, columns 1 and 3 are empty, 2 is INFO and 4 is:

$ echo " INFO  2014-10-11 10:16:19  main " | cut -d\  -f1,3

$ echo " INFO  2014-10-11 10:16:19  main " | cut -d\  -f2,4
INFO 2014-10-11
$

echo "1 2 3 4 5 6" | awk '{ $NF = ""; print $0}'

this one uses awk to print all except the last field


This is what I preferred from all the recommendations:

Printing from the 6th to last column.

ls -lthr | awk '{out=$6; for(i=7;i<=NF;i++){out=out" "$i}; print out}'

or

ls -lthr | awk '{ORS=" "; for(i=6;i<=NF;i++) print $i;print "\n"}'

This awk function returns substring of $0 that includes fields from begin to end:

function fields(begin, end,    b, e, p, i) {
    b = 0; e = 0; p = 0;
    for (i = 1; i <= NF; ++i) {
        if (begin == i) { b = p; }
        p += length($i);
        e = p;
        if (end == i) { break; }
        p += length(FS);
    }
    return substr($0, b + 1, e - b);
}

To get everything starting from field 3:

tail = fields(3);

To get section of $0 that covers fields 3 to 5:

middle = fields(3, 5);

b, e, p, i nonsense in function parameter list is just an awk way of declaring local variables.


awk '{out=$2; for(i=3;i<=NF;i++){out=out" "$i}; print out}'

My answer is based on the one of VeeArr, but I noticed it started with a white space before it would print the second column (and the rest). As I only have 1 reputation point, I can't comment on it, so here it goes as a new answer:

start with "out" as the second column and then add all the other columns (if they exist). This goes well as long as there is a second column.


There's a duplicate question with a simpler answer using cut:

 svn status |  grep '\!' | cut -d\  -f2-

-d specifies the delimeter (space), -f specifies the list of columns (all starting with the 2nd)


I wasn't happy with any of the awk solutions presented here because I wanted to extract the first few columns and then print the rest, so I turned to perl instead. The following code extracts the first two columns, and displays the rest as is:

echo -e "a  b  c  d\te\t\tf g" | \
  perl -ne 'my @f = split /\s+/, $_, 3; printf "first: %s second: %s rest: %s", @f;'

The advantage compared to the perl solution from Chris Koknat is that really only the first n elements are split off from the input string; the rest of the string isn't split at all and therefor stays completely intact. My example demonstrates this with a mix of spaces and tabs.

To change the amount of columns that should be extracted, replace the 3 in the example with n+1.


If you need specific columns printed with arbitrary delimeter:

awk '{print $3 "  " $4}'

col#3 col#4

awk '{print $3 "anything" $4}'

col#3anythingcol#4

So if you have whitespace in a column it will be two columns, but you can connect it with any delimiter or without it.


Would this work?

awk '{print substr($0,length($1)+1);}' < file

It leaves some whitespace in front though.