[bash] Use space as a delimiter with cut command

I want to use space as a delimiter with the cut command.

What syntax can I use for this?

This question is related to bash unix cut

The answer is


I have an answer (I admit somewhat confusing answer) that involvessed, regular expressions and capture groups:

  • \S* - first word
  • \s* - delimiter
  • (\S*) - second word - captured
  • .* - rest of the line

As a sed expression, the capture group needs to be escaped, i.e. \( and \).

The \1 returns a copy of the captured group, i.e. the second word.

$ echo "alpha beta gamma delta" | sed 's/\S*\s*\(\S*\).*/\1/'
beta

When you look at this answer, its somewhat confusing, and, you may think, why bother? Well, I'm hoping that some, may go "Aha!" and will use this pattern to solve some complex text extraction problems with a single sed expression.


Usually if you use space as delimiter, you want to treat multiple spaces as one, because you parse the output of a command aligning some columns with spaces. (and the google search for that lead me here)

In this case a single cut command is not sufficient, and you need to use:

tr -s ' ' | cut -d ' ' -f 2

Or

awk '{print $2}'

You can't do it easily with cut if the data has for example multiple spaces. I have found it useful to normalize input for easier processing. One trick is to use sed for normalization as below.

echo -e "foor\t \t bar" | sed 's:\s\+:\t:g' | cut -f2  #bar

To complement the existing, helpful answers; tip of the hat to QZ Support for encouraging me to post a separate answer:

Two distinct mechanisms come into play here:

  • (a) whether cut itself requires the delimiter (space, in this case) passed to the -d option to be a separate argument or whether it's acceptable to append it directly to -d.

  • (b) how the shell generally parses arguments before passing them to the command being invoked.

(a) is answered by a quote from the POSIX guidelines for utilities (emphasis mine)

If the SYNOPSIS of a standard utility shows an option with a mandatory option-argument [...] a conforming application shall use separate arguments for that option and its option-argument. However, a conforming implementation shall also permit applications to specify the option and option-argument in the same argument string without intervening characters.

In other words: In this case, because -d's option-argument is mandatory, you can choose whether to specify the delimiter as:

  • (s) EITHER: a separate argument
  • (d) OR: as a value directly attached to -d.

Once you've chosen (s) or (d), it is the shell's string-literal parsing - (b) - that matters:

  • With approach (s), all of the following forms are EQUIVALENT:

    • -d ' '
    • -d " "
    • -d \<space> # <space> used to represent an actual space for technical reasons
  • With approach (d), all of the following forms are EQUIVALENT:

    • -d' '
    • -d" "
    • "-d "
    • '-d '
    • d\<space>

The equivalence is explained by the shell's string-literal processing:

All solutions above result in the exact same string (in each group) by the time cut sees them:

  • (s): cut sees -d, as its own argument, followed by a separate argument that contains a space char - without quotes or \ prefix!.

  • (d): cut sees -d plus a space char - without quotes or \ prefix! - as part of the same argument.

The reason the forms in the respective groups are ultimately identical is twofold, based on how the shell parses string literals:

  • The shell allows literal to be specified as is through a mechanism called quoting, which can take several forms:
    • single-quoted strings: the contents inside '...' is taken literally and forms a single argument
    • double-quoted strings: the contents inside "..." also forms a single argument, but is subject to interpolation (expands variable references such as $var, command substitutions ($(...) or `...`), or arithmetic expansions ($(( ... ))).
    • \-quoting of individual characters: a \ preceding a single character causes that character to be interpreted as a literal.
  • Quoting is complemented by quote removal, which means that once the shell has parsed a command line, it removes the quote characters from the arguments (enclosing '...' or "..." or \ instances) - thus, the command being invoked never sees the quote characters.

I just discovered that you can also use "-d ":

cut "-d "

Test

$ cat a
hello how are you
I am fine
$ cut "-d " -f2 a
how
am

scut, a cut-like utility (smarter but slower I made) that can use any perl regex as a breaking token. Breaking on whitespace is the default, but you can also break on multi-char regexes, alternative regexes, etc.

scut -f='6 2 8 7' < input.file  > output.file

so the above command would break columns on whitespace and extract the (0-based) cols 6 2 8 7 in that order.


You can also say:

cut -d\  -f 2

Note that there are two spaces after the backslash.


Examples related to bash

Comparing a variable with a string python not working when redirecting from bash script Zipping a file in bash fails How do I prevent Conda from activating the base environment by default? Get first line of a shell command's output Fixing a systemd service 203/EXEC failure (no such file or directory) /bin/sh: apt-get: not found VSCode Change Default Terminal Run bash command on jenkins pipeline How to check if the docker engine and a docker container are running? How to switch Python versions in Terminal?

Examples related to unix

Docker CE on RHEL - Requires: container-selinux >= 2.9 What does `set -x` do? How to find files modified in last x minutes (find -mmin does not work as expected) sudo: npm: command not found How to sort a file in-place How to read a .properties file which contains keys that have a period character using Shell script gpg decryption fails with no secret key error Loop through a comma-separated shell variable Best way to find os name and version in Unix/Linux platform Resource u'tokenizers/punkt/english.pickle' not found

Examples related to cut

Loop through a comma-separated shell variable cat, grep and cut - translated to python How to find the last field using 'cut' Using the grep and cut delimiter command (in bash shell scripting UNIX) - and kind of "reversing" it? Using cut command to remove multiple columns how to remove the first two columns in a file using shell (awk, sed, whatever) How can I remove the extension of a filename in a shell script? bash script use cut command at variable and store result at another variable How to specify more spaces for the delimiter using cut? Copy and paste content from one file to another file in vi