[linux] Sorting multiple keys with Unix sort

I have potentially large files that need to be sorted by 1-n keys. Some of these keys might be numeric and some of them might not be. This is a fixed-width columnar file so there are no delimiters.

Is there a good way to do this with Unix sort? With one key it is as simple as using '-n'. I have read the man page and searched Google briefly, but didn't find a good example. How would I go about accomplishing this?

Note: I have ruled out Perl because of the file size potential. It would be a last resort.

This question is related to linux unix sorting

The answer is


I believe in your case something like

sort -t@ -k1.1,1.4 -k1.5,1.7 ... <inputfile

will work better. @ is the field separator, make sure it is a character that appears nowhere. then your input is considered as consisting of one column.

Edit: apparently clintp already gave a similar answer, sorry. As he points out, the flags 'n' and 'r' can be added to every -k.... option.


The -k option is what you want.

-k 1.4,1.5n -k 1.14,1.15n

Would use character positions 4-5 in the first field (it's all one field for fixed width) and sort numerically as the first key.

The second key would be characters 14-15 in the first field also.

(edit)

Example (all I have is DOS/cygwin handy):

dir | \cygwin\bin\sort.exe -k 1.4,1.5n -k 1.40,1.60r

for the data:

12/10/2008  01:10 PM         1,564,990 outfile.txt

Sorts the directory listing by month number (pos 4-5) numerically, and then by filename (pos 40-60) in reverse. Since there are no tabs, it's all field 1 to sort.


I just want to add some tips, when you using sort , be careful about your locale that effects the order of the key comparison. I usually explicitly use LC_ALL=C to make locale what I want.


I believe in your case something like

sort -t@ -k1.1,1.4 -k1.5,1.7 ... <inputfile

will work better. @ is the field separator, make sure it is a character that appears nowhere. then your input is considered as consisting of one column.

Edit: apparently clintp already gave a similar answer, sorry. As he points out, the flags 'n' and 'r' can be added to every -k.... option.


The -k option is what you want.

-k 1.4,1.5n -k 1.14,1.15n

Would use character positions 4-5 in the first field (it's all one field for fixed width) and sort numerically as the first key.

The second key would be characters 14-15 in the first field also.

(edit)

Example (all I have is DOS/cygwin handy):

dir | \cygwin\bin\sort.exe -k 1.4,1.5n -k 1.40,1.60r

for the data:

12/10/2008  01:10 PM         1,564,990 outfile.txt

Sorts the directory listing by month number (pos 4-5) numerically, and then by filename (pos 40-60) in reverse. Since there are no tabs, it's all field 1 to sort.


I believe in your case something like

sort -t@ -k1.1,1.4 -k1.5,1.7 ... <inputfile

will work better. @ is the field separator, make sure it is a character that appears nowhere. then your input is considered as consisting of one column.

Edit: apparently clintp already gave a similar answer, sorry. As he points out, the flags 'n' and 'r' can be added to every -k.... option.


The -k option is what you want.

-k 1.4,1.5n -k 1.14,1.15n

Would use character positions 4-5 in the first field (it's all one field for fixed width) and sort numerically as the first key.

The second key would be characters 14-15 in the first field also.

(edit)

Example (all I have is DOS/cygwin handy):

dir | \cygwin\bin\sort.exe -k 1.4,1.5n -k 1.40,1.60r

for the data:

12/10/2008  01:10 PM         1,564,990 outfile.txt

Sorts the directory listing by month number (pos 4-5) numerically, and then by filename (pos 40-60) in reverse. Since there are no tabs, it's all field 1 to sort.


Here is one to sort various columns in a csv file by numeric and dictionary order, columns 5 and after as dictionary order

~/test>sort -t, -k1,1n -k2,2n -k3,3d -k4,4n -k5d  sort.csv
1,10,b,22,Ga
2,2,b,20,F
2,2,b,22,Ga
2,2,c,19,Ga
2,2,c,19,Gb,hi
2,2,c,19,Gb,hj
2,3,a,9,C

~/test>cat sort.csv
2,3,a,9,C
2,2,b,20,F
2,2,c,19,Gb,hj
2,2,c,19,Gb,hi
2,2,c,19,Ga
2,2,b,22,Ga
1,10,b,22,Ga

Note the -k1,1n means numeric starting at column 1 and ending at column 1. If I had done below, it would have concatenated column 1 and 2 making 1,10 sorted as 110

~/test>sort -t, -k1,2n -k3,3 -k4,4n -k5d  sort.csv
2,2,b,20,F
2,2,b,22,Ga
2,2,c,19,Ga
2,2,c,19,Gb,hi
2,2,c,19,Gb,hj
2,3,a,9,C
1,10,b,22,Ga

I just want to add some tips, when you using sort , be careful about your locale that effects the order of the key comparison. I usually explicitly use LC_ALL=C to make locale what I want.


Note that is may also be desired to stabilize the sort with the -s switch, so that equally ranked lines maintain their original relative order in the output too.


I believe in your case something like

sort -t@ -k1.1,1.4 -k1.5,1.7 ... <inputfile

will work better. @ is the field separator, make sure it is a character that appears nowhere. then your input is considered as consisting of one column.

Edit: apparently clintp already gave a similar answer, sorry. As he points out, the flags 'n' and 'r' can be added to every -k.... option.


Note that is may also be desired to stabilize the sort with the -s switch, so that equally ranked lines maintain their original relative order in the output too.


Take care though:

If you want to sort the file primarily by field 3, and secondarily by field 2 you want this:

sort -k 3,3 -k 2,2 < inputfile

Not this: sort -k 3 -k 2 < inputfile which sorts the file by the string from the beginning of field 3 to the end of line (which is potentially unique).

-k, --key=POS1[,POS2]     start a key at POS1 (origin 1), end it at POS2
                          (default end of line)

Here is one to sort various columns in a csv file by numeric and dictionary order, columns 5 and after as dictionary order

~/test>sort -t, -k1,1n -k2,2n -k3,3d -k4,4n -k5d  sort.csv
1,10,b,22,Ga
2,2,b,20,F
2,2,b,22,Ga
2,2,c,19,Ga
2,2,c,19,Gb,hi
2,2,c,19,Gb,hj
2,3,a,9,C

~/test>cat sort.csv
2,3,a,9,C
2,2,b,20,F
2,2,c,19,Gb,hj
2,2,c,19,Gb,hi
2,2,c,19,Ga
2,2,b,22,Ga
1,10,b,22,Ga

Note the -k1,1n means numeric starting at column 1 and ending at column 1. If I had done below, it would have concatenated column 1 and 2 making 1,10 sorted as 110

~/test>sort -t, -k1,2n -k3,3 -k4,4n -k5d  sort.csv
2,2,b,20,F
2,2,b,22,Ga
2,2,c,19,Ga
2,2,c,19,Gb,hi
2,2,c,19,Gb,hj
2,3,a,9,C
1,10,b,22,Ga

Take care though:

If you want to sort the file primarily by field 3, and secondarily by field 2 you want this:

sort -k 3,3 -k 2,2 < inputfile

Not this: sort -k 3 -k 2 < inputfile which sorts the file by the string from the beginning of field 3 to the end of line (which is potentially unique).

-k, --key=POS1[,POS2]     start a key at POS1 (origin 1), end it at POS2
                          (default end of line)

Examples related to linux

grep's at sign caught as whitespace How to prevent Google Colab from disconnecting? "E: Unable to locate package python-pip" on Ubuntu 18.04 How to upgrade Python version to 3.7? Install Qt on Ubuntu Get first line of a shell command's output Cannot connect to the Docker daemon at unix:/var/run/docker.sock. Is the docker daemon running? Run bash command on jenkins pipeline How to uninstall an older PHP version from centOS7 How to update-alternatives to Python 3 without breaking apt?

Examples related to unix

Docker CE on RHEL - Requires: container-selinux >= 2.9 What does `set -x` do? How to find files modified in last x minutes (find -mmin does not work as expected) sudo: npm: command not found How to sort a file in-place How to read a .properties file which contains keys that have a period character using Shell script gpg decryption fails with no secret key error Loop through a comma-separated shell variable Best way to find os name and version in Unix/Linux platform Resource u'tokenizers/punkt/english.pickle' not found

Examples related to sorting

Sort Array of object by object field in Angular 6 Sorting a list with stream.sorted() in Java How to sort dates from Oldest to Newest in Excel? how to sort pandas dataframe from one column Reverse a comparator in Java 8 Find the unique values in a column and then sort them pandas groupby sort within groups pandas groupby sort descending order Efficiently sorting a numpy array in descending order? Swift: Sort array of objects alphabetically