AWK: Access captured group from line pattern

Question

If I have an awk command

pattern { ... }

and pattern uses a capturing group, how can I access the string so captured in the block?

User · Accepted Answer

That was a stroll down memory lane...

I replaced awk by Perl a long time ago. Apparently the standard AWK regular expression engine does not give access to its captured groups, so you might consider using something like:

perl -n -e '/test(\d+)/ && print "$1\n"'

The -n flag causes Perl to loop over every line like awk does.
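For comparison, a minimal sketch of the same extraction in POSIX awk. It only works because the text before the group is a fixed string: it assumes, as in the Perl example, that the wanted group is the run of digits right after the literal prefix `test`. The function name `test_digits` is hypothetical, and awk has no `\d`, so `[0-9]` is used instead.

```shell
# test_digits (hypothetical name): print the digits that follow "test" on
# each line, like the Perl one-liner above.
# match() sets RSTART/RLENGTH for the whole match; substr() then skips the
# 4-character prefix "test" so only the digits are printed.
test_digits() {
  awk 'match($0, /test[0-9]+/) { print substr($0, RSTART + 4, RLENGTH - 4) }'
}

printf 'test42\nfoo\nxtest7y\n' | test_digits
# prints:
# 42
# 7
```

Lines without a match (such as `foo`) produce no output, just as with the Perl version.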

User · Answer

With gawk, you can use the match() function to capture parenthesized groups:

gawk 'match($0, pattern, ary) {print ary[1]}'

Example:

echo "abcdef" | gawk 'match($0, /b(.*)e/, a) {print a[1]}'

outputs cd.

Note the specific use of gawk, which implements the feature in question. For a portable alternative, you can achieve similar results with match() and substr().

Example:

echo "abcdef" | awk 'match($0, /b[^e]*/) {print substr($0, RSTART+1, RLENGTH-1)}'

outputs cd.
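When the text around the group is a pattern rather than a fixed-length prefix, one portable approach is to match the whole expression and then trim the surrounding parts with `sub()`. A minimal sketch emulating gawk's `match($0, /b(.*)e/, a) { print a[1] }`; the function name `between_b_e` is hypothetical.

```shell
# between_b_e (hypothetical name): portable emulation of
#   gawk 'match($0, /b(.*)e/, a) { print a[1] }'
# Match the whole expression first, then strip the leading "b" and the
# trailing "e" from the matched text, leaving what the group would capture.
between_b_e() {
  awk 'match($0, /b.*e/) {
    s = substr($0, RSTART, RLENGTH)
    sub(/^b/, "", s)   # drop the part before the group
    sub(/e$/, "", s)   # drop the part after the group
    print s
  }'
}

echo "abcdef" | between_b_e   # prints cd
```

The trim is exact when the prefix and suffix are fixed strings, as here. With variable-length prefix or suffix patterns, the trimmed text may not be split the same way a real capture group would split it.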

User · Answer

This is something I need all the time so I created a bash function for it. It's based on glenn jackman's answer.

Definition

Add this to your .bash_profile (or another shell startup file):

function regex { gawk 'match($0,/'"$1"'/, ary) {print ary['"${2:-0}"']}'; }

Usage

Print the whole regex match for each line in a file

$ cat filename | regex '.*'

Print the 1st capture group for each line in a file

$ cat filename | regex '(.*)' 1
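Where only the whole match (group 0) is needed and gawk may not be installed, a portable variant is possible. This sketch assumes a POSIX awk and passes the pattern through the environment (`ENVIRON`) instead of splicing the shell argument into the awk program text, so a `/` or quote in the pattern cannot break the program. The name `regex0` is hypothetical. Portable awk cannot return numbered groups, so there is no second argument.

```shell
# regex0 (hypothetical name): print the part of each line matching $1.
# The pattern reaches awk through the RE environment variable, and match()
# treats that string as a dynamic regular expression.
regex0() {
  RE="$1" awk 'match($0, ENVIRON["RE"]) { print substr($0, RSTART, RLENGTH) }'
}

printf 'id=42\nnone\nid=7\n' | regex0 '[0-9]+'
# prints:
# 42
# 7
```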

User · Answer

You can also simulate capturing without the array argument to match(). It's not intuitive, though, and note that gensub() is itself a GNU awk extension:

Step 1. Use gensub() to surround each match with a character that doesn't appear in your string (SUBSEP here).
Step 2. Use split() on that character.
Step 3. Every other element of the split array (cap[2], cap[4], ...) is one of your matches.

$ echo 'ab cb ad' | awk '{ split(gensub(/a./,SUBSEP"&"SUBSEP,"g",$0),cap,SUBSEP); print cap[2]"|" cap[4] ; }'
ab|ad
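To stay within POSIX awk, the same three steps can use `gsub()` on a copy of the line instead of `gensub()`. A minimal sketch that prints every match on its own line; the function name `all_matches` is hypothetical, and like the one-liner above it assumes SUBSEP (the control character `\034`) never occurs in the input.

```shell
# all_matches (hypothetical name): print every match of the pattern in $1,
# one per line, using POSIX gsub() instead of gawk gensub().
all_matches() {
  RE="$1" awk '{
    s = $0
    # Step 1: wrap each match in SUBSEP ("&" stands for the matched text).
    gsub(ENVIRON["RE"], SUBSEP "&" SUBSEP, s)
    # Step 2: split the marked-up copy on SUBSEP.
    n = split(s, cap, SUBSEP)
    # Step 3: the matches sit at the even indices cap[2], cap[4], ...
    for (i = 2; i <= n; i += 2) print cap[i]
  }'
}

echo 'ab cb ad' | all_matches 'a.'
# prints:
# ab
# ad
```

Because every match is wrapped on both sides, adjacent matches (for example `abad`) still land at even indices, separated by an empty element.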

User · Answer

I struggled a bit to come up with a bash function that wraps Peter Tillemans' answer, but here's what I came up with:

function regex { perl -n -e "/$1/ && printf \"%s\n\", "'$1'; }

I found this worked better than opsb's awk-based bash function for the following regular expression argument, because I do not want the "ms" to be printed.

'([0-9]*)ms$'

User · Answer

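You can use GNU awk:

$ cat hta
RewriteCond %{HTTP_HOST} !^www\.mysite\.net$
RewriteRule (.*) http://www.mysite.net/$1 [R=301,L]

$ gawk 'match($0, /.*(http.*?)\$/, m) { print m[1]; }' < hta
http://www.mysite.net/

Note that awk regular expressions have no lazy quantifiers, so .*? behaves like a greedy .* here; it works because the RewriteRule line contains only one $.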
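Without GNU awk, the same URL can be extracted portably, because the group runs from `http` up to a literal `$`: match that span and drop its last character with `substr()`. A minimal sketch, assuming the URL is followed by a `$` as in the sample file; the function name `rewrite_target` is hypothetical.

```shell
# rewrite_target (hypothetical name): print the text from "http" up to, but
# not including, the first "$" after it, using only POSIX awk.
rewrite_target() {
  awk 'match($0, /http[^$]*[$]/) { print substr($0, RSTART, RLENGTH - 1) }'
}

printf '%s\n' 'RewriteCond %{HTTP_HOST} !^www\.mysite\.net$' \
              'RewriteRule (.*) http://www.mysite.net/$1 [R=301,L]' | rewrite_target
# prints http://www.mysite.net/
```

Bracket expressions (`[^$]`, `[$]`) are used so that `$` is always literal, and `[^$]*` stops the match at the first `$`, which is what the lazy `.*?` in the gawk version was meant to do.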
