I'm trying to extract a certain (the fourth) field from the column-based, 'space'-adjusted text stream. I'm trying to use the cut
command in the following manner:
cat text.txt | cut -d " " -f 4
Unfortunately, cut
doesn't treat several spaces as one delimiter. I could have piped through awk
awk '{ printf $4; }'
or sed
sed -E "s/[[:space:]]+/ /g"
to collapse the spaces, but I'd like to know if there any way to deal with cut
and several delimiters natively?
With versions of cut
I know of, no, this is not possible. cut
is primarily useful for parsing files where the separator is not whitespace (for example /etc/passwd
) and that have a fixed number of fields. Two separators in a row mean an empty field, and that goes for whitespace too.
This Perl one-liner shows how closely Perl is related to awk:
perl -lane 'print $F[3]' text.txt
However, the @F
autosplit array starts at index $F[0]
while awk fields start with $1
After becoming frustrated with the too many limitations of cut
, I wrote my own replacement, which I called cuts
for "cut on steroids".
cuts provides what is likely the most minimalist solution to this and many other related cut/paste problems.
One example, out of many, addressing this particular question:
$ cat text.txt
0 1 2 3
0 1 2 3 4
$ cuts 2 text.txt
2
2
cuts
supports:
paste
separately)and much more. None of which is provided by standard cut
.
See also: https://stackoverflow.com/a/24543231/1296044
Source and documentation (free software): http://arielf.github.io/cuts/
As you comment in your question, awk
is really the way to go. To use cut
is possible together with tr -s
to squeeze spaces, as kev's answer shows.
Let me however go through all the possible combinations for future readers. Explanations are at the Test section.
tr -s ' ' < file | cut -d' ' -f4
awk '{print $4}' file
while read -r _ _ _ myfield _
do
echo "forth field: $myfield"
done < file
sed -r 's/^([^ ]*[ ]*){3}([^ ]*).*/\2/' file
Given this file, let's test the commands:
$ cat a
this is line 1 more text
this is line 2 more text
this is line 3 more text
this is line 4 more text
$ cut -d' ' -f4 a
is
# it does not show what we want!
$ tr -s ' ' < a | cut -d' ' -f4
1
2 # this makes it!
3
4
$
$ awk '{print $4}' a
1
2
3
4
This reads the fields sequentially. By using _
we indicate that this is a throwaway variable as a "junk variable" to ignore these fields. This way, we store $myfield
as the 4th field in the file, no matter the spaces in between them.
$ while read -r _ _ _ a _; do echo "4th field: $a"; done < a
4th field: 1
4th field: 2
4th field: 3
4th field: 4
This catches three groups of spaces and no spaces with ([^ ]*[ ]*){3}
. Then, it catches whatever coming until a space as the 4th field, that it is finally printed with \1
.
$ sed -r 's/^([^ ]*[ ]*){3}([^ ]*).*/\2/' a
1
2
3
4
Source: Stackoverflow.com