[bash] How to make the 'cut' command treat same sequental delimiters as one?

I'm trying to extract a certain (the fourth) field from the column-based, 'space'-adjusted text stream. I'm trying to use the cut command in the following manner:

cat text.txt | cut -d " " -f 4

Unfortunately, cut doesn't treat several spaces as one delimiter. I could have piped through awk

awk '{ printf $4; }'

or sed

sed -E "s/[[:space:]]+/ /g"

to collapse the spaces, but I'd like to know if there any way to deal with cut and several delimiters natively?

This question is related to bash unix delimiter cut

The answer is


With versions of cut I know of, no, this is not possible. cut is primarily useful for parsing files where the separator is not whitespace (for example /etc/passwd) and that have a fixed number of fields. Two separators in a row mean an empty field, and that goes for whitespace too.


This Perl one-liner shows how closely Perl is related to awk:

perl -lane 'print $F[3]' text.txt

However, the @F autosplit array starts at index $F[0] while awk fields start with $1


shortest/friendliest solution

After becoming frustrated with the too many limitations of cut, I wrote my own replacement, which I called cuts for "cut on steroids".

cuts provides what is likely the most minimalist solution to this and many other related cut/paste problems.

One example, out of many, addressing this particular question:

$ cat text.txt
0   1        2 3
0 1          2   3 4

$ cuts 2 text.txt
2
2

cuts supports:

  • auto-detection of most common field-delimiters in files (+ ability to override defaults)
  • multi-char, mixed-char, and regex matched delimiters
  • extracting columns from multiple files with mixed delimiters
  • offsets from end of line (using negative numbers) in addition to start of line
  • automatic side-by-side pasting of columns (no need to invoke paste separately)
  • support for field reordering
  • a config file where users can change their personal preferences
  • great emphasis on user friendliness & minimalist required typing

and much more. None of which is provided by standard cut.

See also: https://stackoverflow.com/a/24543231/1296044

Source and documentation (free software): http://arielf.github.io/cuts/


As you comment in your question, awk is really the way to go. To use cut is possible together with tr -s to squeeze spaces, as kev's answer shows.

Let me however go through all the possible combinations for future readers. Explanations are at the Test section.

tr | cut

tr -s ' ' < file | cut -d' ' -f4

awk

awk '{print $4}' file

bash

while read -r _ _ _ myfield _
do
   echo "forth field: $myfield"
done < file

sed

sed -r 's/^([^ ]*[ ]*){3}([^ ]*).*/\2/' file

Tests

Given this file, let's test the commands:

$ cat a
this   is    line     1 more text
this      is line    2     more text
this    is line 3     more text
this is   line 4            more    text

tr | cut

$ cut -d' ' -f4 a
is
                        # it does not show what we want!


$ tr -s ' ' < a | cut -d' ' -f4
1
2                       # this makes it!
3
4
$

awk

$ awk '{print $4}' a
1
2
3
4

bash

This reads the fields sequentially. By using _ we indicate that this is a throwaway variable as a "junk variable" to ignore these fields. This way, we store $myfield as the 4th field in the file, no matter the spaces in between them.

$ while read -r _ _ _ a _; do echo "4th field: $a"; done < a
4th field: 1
4th field: 2
4th field: 3
4th field: 4

sed

This catches three groups of spaces and no spaces with ([^ ]*[ ]*){3}. Then, it catches whatever coming until a space as the 4th field, that it is finally printed with \1.

$ sed -r 's/^([^ ]*[ ]*){3}([^ ]*).*/\2/' a
1
2
3
4

Examples related to bash

Comparing a variable with a string python not working when redirecting from bash script Zipping a file in bash fails How do I prevent Conda from activating the base environment by default? Get first line of a shell command's output Fixing a systemd service 203/EXEC failure (no such file or directory) /bin/sh: apt-get: not found VSCode Change Default Terminal Run bash command on jenkins pipeline How to check if the docker engine and a docker container are running? How to switch Python versions in Terminal?

Examples related to unix

Docker CE on RHEL - Requires: container-selinux >= 2.9 What does `set -x` do? How to find files modified in last x minutes (find -mmin does not work as expected) sudo: npm: command not found How to sort a file in-place How to read a .properties file which contains keys that have a period character using Shell script gpg decryption fails with no secret key error Loop through a comma-separated shell variable Best way to find os name and version in Unix/Linux platform Resource u'tokenizers/punkt/english.pickle' not found

Examples related to delimiter

How do I hide the PHP explode delimiter from submitted form results? How do I use a delimiter with Scanner.useDelimiter in Java? Split function in oracle to comma separated values with automatic sequence Split String by delimiter position using oracle SQL SQL split values to multiple rows How to use delimiter for csv in python Hive load CSV with commas in quoted fields How to escape indicator characters (i.e. : or - ) in YAML Delimiters in MySQL Split comma separated column data into additional columns

Examples related to cut

Loop through a comma-separated shell variable cat, grep and cut - translated to python How to find the last field using 'cut' Using the grep and cut delimiter command (in bash shell scripting UNIX) - and kind of "reversing" it? Using cut command to remove multiple columns how to remove the first two columns in a file using shell (awk, sed, whatever) How can I remove the extension of a filename in a shell script? bash script use cut command at variable and store result at another variable How to specify more spaces for the delimiter using cut? Copy and paste content from one file to another file in vi