[regex] How to print matched regex pattern using awk?

Using awk, I need to find a word in a file that matches a regex pattern.

I only want to print the word matched with the pattern.

So if in the line, I have:

xxx yyy zzz

And pattern:

/yyy/

I want to only get:

yyy

EDIT: thanks to kurumi i managed to write something like this:

awk '{
        for(i=1; i<=NF; i++) {
                tmp=match($i, /[0-9]..?.?[^A-Za-z0-9]/)
                if(tmp) {
                        print $i
                }
        }
}' $1

and this is what i needed :) thanks a lot!

This question is related to regex awk

The answer is


This is the very basic

awk '/pattern/{ print $0 }' file

ask awk to search for pattern using //, then print out the line, which by default is called a record, denoted by $0. At least read up the documentation.

If you only want to get print out the matched word.

awk '{for(i=1;i<=NF;i++){ if($i=="yyy"){print $i} } }' file

gawk can get the matching part of every line using this as action:

{ if (match($0,/your regexp/,m)) print m[0] }

match(string, regexp [, array]) If array is present, it is cleared, and then the zeroth element of array is set to the entire portion of string matched by regexp. If regexp contains parentheses, the integer-indexed elements of array are set to contain the portion of string matching the corresponding parenthesized subexpression. http://www.gnu.org/software/gawk/manual/gawk.html#String-Functions


Off topic, this can be done using the grep also, just posting it here in case if anyone is looking for grep solution

echo 'xxx yyy zzze ' | grep -oE 'yyy'

If you are only interested in the last line of input and you expect to find only one match (for example a part of the summary line of a shell command), you can also try this very compact code, adopted from How to print regexp matches using `awk`?:

$ echo "xxx yyy zzz" | awk '{match($0,"yyy",a)}END{print a[0]}'
yyy

Or the more complex version with a partial result:

$ echo "xxx=a yyy=b zzz=c" | awk '{match($0,"yyy=([^ ]+)",a)}END{print a[1]}'
b

Warning: the awk match() function with three arguments only exists in gawk, not in mawk

Here is another nice solution using a lookbehind regex in grep instead of awk. This solution has lower requirements to your installation:

$ echo "xxx=a yyy=b zzz=c" | grep -Po '(?<=yyy=)[^ ]+'
b

It sounds like you are trying to emulate GNU's grep -o behaviour. This will do that providing you only want the first match on each line:

awk 'match($0, /regex/) {
    print substr($0, RSTART, RLENGTH)
}
' file

Here's an example, using GNU's awk implementation ():

awk 'match($0, /a.t/) {
    print substr($0, RSTART, RLENGTH)
}
' /usr/share/dict/words | head
act
act
act
act
aft
ant
apt
art
art
art

Read about match, substr, RSTART and RLENGTH in the awk manual.

After that you may wish to extend this to deal with multiple matches on the same line.


If Perl is an option, you can try this:

perl -lne 'print $1 if /(regex)/' file

To implement case-insensitive matching, add the i modifier

perl -lne 'print $1 if /(regex)/i' file

To print everything AFTER the match:

perl -lne 'if ($found){print} else{if (/regex(.*)/){print $1; $found++}}' textfile

To print the match and everything after the match:

perl -lne 'if ($found){print} else{if (/(regex.*)/){print $1; $found++}}' textfile

Using sed can also be elegant in this situation. Example (replace line with matched group "yyy" from line):

$ cat testfile
xxx yyy zzz
yyy xxx zzz
$ cat testfile | sed -r 's#^.*(yyy).*$#\1#g'
yyy
yyy

Relevant manual page: https://www.gnu.org/software/sed/manual/sed.html#Back_002dreferences-and-Subexpressions


If you know what column the text/pattern you're looking for (e.g. "yyy") is in, you can just check that specific column to see if it matches, and print it.

For example, given a file with the following contents, (called asdf.txt)

xxx yyy zzz

to only print the second column if it matches the pattern "yyy", you could do something like this:

awk '$2 ~ /yyy/ {print $2}' asdf.txt

Note that this will also match basically any line where the second column has a "yyy" in it, like these:

xxx yyyz zzz
xxx zyyyz