[bash] Use Awk to extract substring

Given a hostname in the format aaa0.bbb.ccc, I want to extract the first substring before the ., that is, aaa0 in this case. I use the following awk script to do so:

echo aaa0.bbb.ccc | awk '{if (match($0, /\./)) {print substr($0, 0, RSTART - 1)}}'

Running the script on machine A produces aaa0, but running it on machine B produces only aaa, without the trailing 0. Both machines run Ubuntu/Linaro, but A has a newer awk (gawk 3.1.8) while B has an older one (mawk 1.2).

I am asking in general, how to write a compatible awk script that performs the same functionality ...


The answer is


You just want to set the field separator as . using the -F option and print the first field:

$ echo aaa0.bbb.ccc | awk -F'.' '{print $1}'
aaa0

Same thing but using cut:

$ echo aaa0.bbb.ccc | cut -d'.' -f1
aaa0

Or with sed:

$ echo aaa0.bbb.ccc | sed 's/[.].*//'
aaa0

Even grep:

$ echo aaa0.bbb.ccc | grep -o '^[^.]*'
aaa0

Or just use cut:

echo aaa0.bbb.ccc | cut -d'.' -f1

I am asking in general, how to write a compatible awk script that performs the same functionality ...

Solving the problem in your question is easy (see the other answers).

Writing an awk script that is portable across all awk implementations and versions (gawk/nawk/mawk...) is really hard, even with gawk's --posix option.

For example:

  • some awks operate on strings in terms of characters, others in terms of bytes
  • some support the \x escape, some do not
  • FS is interpreted differently
  • keywords/reserved words and their abbreviations are restricted differently
  • some operators are restricted, e.g. **
  • even within one implementation (gawk, for example), versions 4.0 and 3.x differ
  • certain functions are implemented differently (your problem is one example, see below)
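As one concrete illustration of the operator point above: POSIX awk specifies ^ for exponentiation, while ** is an extension that not every awk accepts. A minimal sketch that sticks to the portable spelling (the wrapper name pow and the variable names b and e are hypothetical, chosen only for this example):

```shell
# pow is a hypothetical wrapper name used only for this illustration.
# It raises its first argument to the power of its second using the
# POSIX "^" operator rather than the non-portable "**".
pow() {
    awk -v b="$1" -v e="$2" 'BEGIN { print b ^ e }'
}

pow 2 10
```

Note that the variables are not called exp or similar, because exp is the name of a built-in awk function.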

All the points above are general remarks. Back to your problem: it only involves fundamental features of awk. A line like awk '{print $x}' will work in all awks.

There are two reasons why your awk line behaves differently on gawk and mawk:

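  • you used the substr() function incorrectly. This is the main cause. You have substr($0, 0, RSTART - 1); the 0 should be 1, no matter which awk you use. Array indices, string indices, etc. in awk are 1-based.

  • gawk and mawk implement substr() differently for a start index below 1. Judging by your output, gawk treats the 0 as 1, whereas mawk counts the nonexistent position 0 against the length, so the result comes out one character short.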
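The fix described above can be sketched as follows. The only change from the line in the question is the start index passed to substr(); the wrapper name first_label is hypothetical and exists only to make the example easy to reuse.

```shell
# first_label is a hypothetical helper name, not from the original question.
# It prints everything before the first "." on each input line that has one.
first_label() {
    awk '{
        # match() sets RSTART to the 1-based position of the first "."
        if (match($0, /\./)) {
            # start at 1, not 0: awk string positions are 1-based
            print substr($0, 1, RSTART - 1)
        }
    }'
}

echo aaa0.bbb.ccc | first_label
```

With a start index of 1 the call no longer relies on how an implementation treats out-of-range positions, so it should behave the same on gawk and mawk.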


You do not need any external command at all, just use Parameter Expansion in bash:

hostname=aaa0.bbb.ccc
echo "${hostname%%.*}"
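To show why %% rather than % is used here, a short sketch comparing the related expansions. The variable names short, parent and domain are made up for this example.

```shell
hostname=aaa0.bbb.ccc

# %% removes the LONGEST suffix matching the pattern ".*"
short=${hostname%%.*}
# %  removes the SHORTEST suffix matching ".*"
parent=${hostname%.*}
# #  removes the SHORTEST prefix matching "*."
domain=${hostname#*.}

printf '%s\n' "$short" "$parent" "$domain"
```

This prints aaa0, aaa0.bbb and bbb.ccc on separate lines. These expansions are part of POSIX sh, so they are not limited to bash.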

You don't need awk for this...

echo aaa0.bbb.ccc | cut -d. -f1
cut -d. -f1 <<< aaa0.bbb.ccc

echo aaa0.bbb.ccc | { IFS=. read -r a _ ; echo "$a" ; }
{ IFS=. read -r a _ ; echo "$a" ; } <<< aaa0.bbb.ccc

x=aaa0.bbb.ccc; echo "${x/.*/}"

Heavier options:

sed:
echo aaa0.bbb.ccc | sed 's/\..*//'
sed 's/\..*//' <<< aaa0.bbb.ccc 
awk:
echo aaa0.bbb.ccc | awk -F. '{print $1}'
awk -F. '{print $1}' <<< aaa0.bbb.ccc