[awk] Printing everything except the first field with awk

I have a file that looks like this:

AE  United Arab Emirates
AG  Antigua & Barbuda
AN  Netherlands Antilles
AS  American Samoa
BA  Bosnia and Herzegovina
BF  Burkina Faso
BN  Brunei Darussalam

And I 'd like to invert the order, printing first everything except $1 and then $1:

United Arab Emirates AE

How can I do the "everything except field 1" trick?

This question is related to awk sed

The answer is


Use the cut command with -f 2- (POSIX) or --complement (not POSIX):

$ echo a b c | cut -f 2- -d ' '
b c
$ echo a b c | cut -f 1 -d ' '
a
$ echo a b c | cut -f 1,2 -d ' '
a b
$ echo a b c | cut -f 1 -d ' ' --complement
b c

There's a sed option too...

 sed 's/\([^ ]*\)  \(.*\)/\2 \1/' inputfile.txt

Explained...

Swap
\([^ ]*\) = Match anything until we reach a space, store in $1
\(.*\)    = Match everything else, store in $2
With
\2        = Retrieve $2
\1        = Retrieve $1

More thoroughly explained...

s    = Swap
/    = Beginning of source pattern
\(   = start storing this value
[^ ] = text not matching the space character
*    = 0 or more of the previous pattern
\)   = stop storing this value
\(   = start storing this value
.    = any character
*    = 0 or more of the previous pattern
\)   = stop storing this value
/    = End of source pattern, beginning of replacement
\2   = Retrieve the 2nd stored value
\1   = Retrieve the 1st stored value
/    = end of replacement

Let's move all the records to the next one and set the last one as the first:

$ awk '{a=$1; for (i=2; i<=NF; i++) $(i-1)=$i; $NF=a}1' file
United Arab Emirates AE
Antigua & Barbuda AG
Netherlands Antilles AN
American Samoa AS
Bosnia and Herzegovina BA
Burkina Faso BF
Brunei Darussalam BN

Explanation

  • a=$1 save the first value into a temporary variable.
  • for (i=2; i<=NF; i++) $(i-1)=$i save the Nth field value into the (N-1)th field.
  • $NF=a save the first value ($1) into the last field.
  • {}1 true condition to make awk perform the default action: {print $0}.

This way, if you happen to have another field separator, the result is also good:

$ cat c
AE-United-Arab-Emirates
AG-Antigua-&-Barbuda
AN-Netherlands-Antilles
AS-American-Samoa
BA-Bosnia-and-Herzegovina
BF-Burkina-Faso
BN-Brunei-Darussalam

$ awk 'BEGIN{OFS=FS="-"}{a=$1; for (i=2; i<=NF; i++) $(i-1)=$i; $NF=a}1' c
United-Arab-Emirates-AE
Antigua-&-Barbuda-AG
Netherlands-Antilles-AN
American-Samoa-AS
Bosnia-and-Herzegovina-BA
Burkina-Faso-BF
Brunei-Darussalam-BN

Another and easy way using cat command

cat filename | awk '{print $2,$3,$4,$5,$6,$1}' > newfilename

awk '{sub($1 FS,"")}7' YourFile

Remove the first field and separator, and print the result (7 is a non zero value so printing $0).


If you're open to another Perl solution:

perl -ple 's/^(\S+)\s+(.*)/$2 $1/' file

Maybe the most concise way:

$ awk '{$(NF+1)=$1;$1=""}sub(FS,"")' infile
United Arab Emirates AE
Antigua & Barbuda AG
Netherlands Antilles AN
American Samoa AS
Bosnia and Herzegovina BA
Burkina Faso BF
Brunei Darussalam BN

Explanation:

$(NF+1)=$1: Generator of a "new" last field.

$1="": Set the original first field to null

sub(FS,""): After the first two actions {$(NF+1)=$1;$1=""} get rid of the first field separator by using sub. The final print is implicit.


$1="" leaves a space as Ben Jackson mentioned, so use a for loop:

awk '{for (i=2; i<=NF; i++) print $i}' filename

So if your string was "one two three", the output will be:

two
three

If you want the result in one row, you could do as follows:

awk '{for (i=2; i<NF; i++) printf $i " "; print $NF}' filename

This will give you: "two three"


A first stab at it seems to work for your particular case.

awk '{ f = $1; i = $NF; while (i <= 0); gsub(/^[A-Z][A-Z][ ][ ]/,""); print $i, f; }'

Option 1

There is a solution that works with some versions of awk:

awk '{ $(NF+1)=$1;$1="";$0=$0;} NF=NF ' infile.txt

Explanation:

       $(NF+1)=$1                          # add a new field equal to field 1.
                  $1=""                    # erase the contents of field 1.
                        $0=$0;} NF=NF      # force a re-calc of fields.
                                           # and use NF to promote a print.

Result:

United Arab Emirates AE
Antigua & Barbuda AG
Netherlands Antilles AN
American Samoa AS
Bosnia and Herzegovina BA
Burkina Faso BF
Brunei Darussalam BN

However that might fail with older versions of awk.


Option 2

awk '{ $(NF+1)=$1;$1="";sub(OFS,"");}1' infile.txt

That is:

awk '{                                      # call awk.
       $(NF+1)=$1;                          # Add one trailing field.
                  $1="";                    # Erase first field.
                        sub(OFS,"");        # remove leading OFS.
                                    }1'     # print the line.

Note that what needs to be erased is the OFS, not the FS. The line gets re-calculated when the field $1 is asigned. That changes all runs of FS to one OFS.


But even that option still fails with several delimiters, as is clearly shown by changing the OFS:

awk -v OFS=';' '{ $(NF+1)=$1;$1="";sub(OFS,"");}1' infile.txt

That line will output:

United;Arab;Emirates;AE
Antigua;&;Barbuda;AG
Netherlands;Antilles;AN
American;Samoa;AS
Bosnia;and;Herzegovina;BA
Burkina;Faso;BF
Brunei;Darussalam;BN

That reveals that runs of FS are being changed to one OFS.
The only way to avoid that is to avoid the field re-calculation.
One function that can avoid re-calc is sub.
The first field could be captured, then removed from $0 with sub, and then both re-printed.

Option 3

awk '{ a=$1;sub("[^"FS"]+["FS"]+",""); print $0, a;}' infile.txt
       a=$1                                   # capture first field.
       sub( "                                 # replace: 
             [^"FS"]+                         # A run of non-FS
                     ["FS"]+                  # followed by a run of FS.
                            " , ""            # for nothing.
                                  )           # Default to $0 (the whole line.
       print $0, a                   # Print in reverse order, with OFS.


United Arab Emirates AE
Antigua & Barbuda AG
Netherlands Antilles AN
American Samoa AS
Bosnia and Herzegovina BA
Burkina Faso BF
Brunei Darussalam BN

Even if we change the FS, the OFS and/or add more delimiters, it works.
If the input file is changed to:

AE..United....Arab....Emirates
AG..Antigua....&...Barbuda
AN..Netherlands...Antilles
AS..American...Samoa
BA..Bosnia...and...Herzegovina
BF..Burkina...Faso
BN..Brunei...Darussalam

And the command changes to:

awk -vFS='.' -vOFS=';' '{a=$1;sub("[^"FS"]+["FS"]+",""); print $0,a;}' infile.txt

The output will be (still preserving delimiters):

United....Arab....Emirates;AE
Antigua....&...Barbuda;AG
Netherlands...Antilles;AN
American...Samoa;AS
Bosnia...and...Herzegovina;BA
Burkina...Faso;BF
Brunei...Darussalam;BN

The command could be expanded to several fields, but only with modern awks and with --re-interval option active. This command on the original file:

awk -vn=2 '{a=$1;b=$2;sub("([^"FS"]+["FS"]+){"n"}","");print $0,a,b;}' infile.txt

Will output this:

Arab Emirates AE United
& Barbuda AG Antigua
Antilles AN Netherlands
Samoa AS American
and Herzegovina BA Bosnia
Faso BF Burkina
Darussalam BN Brunei

If you're open to a Perl solution...

perl -lane 'print join " ",@F[1..$#F,0]' file

is a simple solution with an input/output separator of one space, which produces:

United Arab Emirates AE
Antigua & Barbuda AG
Netherlands Antilles AN
American Samoa AS
Bosnia and Herzegovina BA
Burkina Faso BF
Brunei Darussalam BN

This next one is slightly more complex

perl -F`  ` -lane 'print join "  ",@F[1..$#F,0]' file

and assumes that the input/output separator is two spaces:

United Arab Emirates  AE
Antigua & Barbuda  AG
Netherlands Antilles  AN
American Samoa  AS
Bosnia and Herzegovina  BA
Burkina Faso  BF
Brunei Darussalam  BN

These command-line options are used:

  • -n loop around every line of the input file, do not automatically print every line

  • -l removes newlines before processing, and adds them back in afterwards

  • -a autosplit mode – split input lines into the @F array. Defaults to splitting on whitespace

  • -F autosplit modifier, in this example splits on ' ' (two spaces)

  • -e execute the following perl code

@F is the array of words in each line, indexed starting with 0
$#F is the number of words in @F
@F[1..$#F] is an array slice of element 1 through the last element
@F[1..$#F,0] is an array slice of element 1 through the last element plus element 0


Yet another way...

...this rejoins the fields 2 thru NF with the FS and outputs one line per line of input

awk '{for (i=2;i<=NF;i++){printf $i; if (i < NF) {printf FS};}printf RS}'

I use this with git to see what files have been modified in my working dir:

git diff| \
    grep '\-\-git'| \
    awk '{print$NF}'| \
    awk -F"/" '{for (i=2;i<=NF;i++){printf $i; if (i < NF) {printf FS};}printf RS}'

awk '{ tmp = $1; sub(/^[^ ]+ +/, ""); print $0, tmp }'


The field separator in gawk (at least) can be a string as well as a character (it can also be a regex). If your data is consistent, then this will work:

awk -F "  " '{print $2,$1}' inputfile

That's two spaces between the double quotes.


awk '{ saved = $1; $1 = ""; print substr($0, 2), saved }'

Setting the first field to "" leaves a single copy of OFS at the start of $0. Assuming that OFS is only a single character (by default, it's a single space), we can remove it with substr($0, 2). Then we append the saved copy of $1.