How to find patterns across multiple lines using grep

Question

I want to find files that have "abc" AND "efg" in that order, and those two strings are on different lines in that file. E.g., a file with content:

blah blah..
blah blah..
blah abc blah
blah blah..
blah blah..
blah blah..
blah efg blah blah
blah blah..
blah blah..

should be matched.

User · Answer

You can use grep in case you are not particular about the order of the patterns.

grep -l "pattern1" filepattern*.* | xargs grep "pattern2"

Example:

grep -l "vector" *.cpp | xargs grep "map"

grep -l will find all the files which match the first pattern, and xargs will grep those files for the second pattern. Hope this helps.
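A quick sketch of what this pipeline does, using two hypothetical sample files. Because the two greps run independently, a file is listed even when "efg" comes before "abc". The second grep gets -l here (a small change from the command above) so that only file names are printed.

```shell
#!/bin/sh
# Work in a scratch directory so the sample files do not clash with anything.
cd "$(mktemp -d)" || exit 1

# Hypothetical sample files: one in the requested order, one reversed.
printf 'blah abc blah\nblah\nblah efg blah\n' > in_order.txt
printf 'blah efg blah\nblah\nblah abc blah\n' > reversed.txt

# List files containing "abc", then keep those that also contain "efg".
# Both files are listed: this approach does not check the order.
matches=$(grep -l "abc" in_order.txt reversed.txt | xargs grep -l "efg")
printf '%s\n' "$matches"
```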

User · Answer

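This should work too:

perl -0777 -ne 'print "$ARGV\n" if /abc.*?efg/s' file_list

$ARGV contains the name of the current file when reading from file_list. The /s modifier lets . match newlines, so the search works across lines; -0777 makes Perl read each file as a single record, which is needed for that (with plain -n or -p, Perl sees only one line at a time).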

User · Answer

This can be done easily by first using tr to replace the newlines with some other character:

tr '\n' '\a' | grep -o 'abc.*def' | tr '\a' '\n'

Here, I am using the alarm character, \a (ASCII 7), in place of a newline. This is almost never found in your text, and grep can match it with a ., or match it specifically with \a when using grep -P.
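The pipeline reads standard input, so in this sketch a small hypothetical file is redirected into it. Note that the answer searches for "def" rather than the question's "efg".

```shell
#!/bin/sh
cd "$(mktemp -d)" || exit 1
printf 'blah\nblah abc blah\nblah\nblah def blah\nblah\n' > sample.txt

# 1. Replace every newline with BEL (\a) so the file becomes one "line".
# 2. grep -o prints only the part from abc through def.
# 3. Turn the BEL characters back into newlines for display.
out=$(tr '\n' '\a' < sample.txt | grep -o 'abc.*def' | tr '\a' '\n')
printf '%s\n' "$out"
```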

User · Answer

Sadly, you can't. From the grep docs:

grep searches the named input FILEs (or standard input if no files are named, or if a single hyphen-minus (-) is given as file name) for lines containing a match to the given PATTERN.

User · Answer

Why not something simple like:

egrep -o 'abc|efg' $file | grep -A1 abc | grep efg | wc -l

It returns 0 or a positive integer.

egrep -o (only shows matches; trick: multiple matches on the same line produce multi-line output, as if they were on different lines)

grep -A1 abc (print abc and the line after it)

grep efg | wc -l (0-n count of efg lines found after abc on the same or following lines; the result can be used in an 'if')

grep can be changed to egrep etc. if pattern matching is needed.
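A minimal sketch of the same pipeline on two hypothetical files, wrapped in a small helper function (count_after is an illustrative name, not part of the answer). It uses grep -E in place of egrep, which newer GNU grep releases flag as obsolescent.

```shell
#!/bin/sh
cd "$(mktemp -d)" || exit 1
# Hypothetical sample files: abc before efg, and the reverse.
printf 'blah abc blah\nblah\nblah efg blah\n' > ordered.txt
printf 'blah efg blah\nblah\nblah abc blah\n' > reversed.txt

# Count efg matches that directly follow an abc match (0 means "not in order").
count_after() {
    grep -E -o 'abc|efg' "$1" | grep -A1 abc | grep efg | wc -l
}

ordered=$(count_after ordered.txt)
reversed=$(count_after reversed.txt)
echo "ordered.txt: $ordered, reversed.txt: $reversed"
```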

User · Answer

I released a grep alternative a few days ago that does support this directly, either via multiline matching or using conditions - hopefully it is useful for some people searching here. This is what the commands for the example would look like:

Multiline:

sift -lm 'abc.*efg' testfile

Conditions:

sift -l 'abc' testfile --followed-by 'efg'

You could also specify that 'efg' has to follow 'abc' within a certain number of lines:

sift -l 'abc' testfile --followed-within 5:'efg'

You can find more information on sift-tool.org.

User · Answer

You can do that very easily if you can use Perl.

perl -ne 'if (/abc/) { $abc = 1; next }; print "Found in $ARGV\n" if ($abc && /efg/);' yourfilename.txt

You can do that with a single regular expression too, but that involves taking the entire contents of the file into a single string, which might end up taking up too much memory with large files. For completeness, here is that method:

perl -e '@lines = <>; $content = join("", @lines); print "Found in $ARGV\n" if ($content =~ /abc.*efg/s);' yourfilename.txt

User · Answer

I used this to extract a fasta sequence from a multi fasta file using the -P option for grep:

grep -Pzo ">tig00000034[^>]+" file.fasta > desired_sequence.fasta

-P for Perl-compatible regular expressions (PCRE)
-z to treat input lines as ending in a zero byte rather than a newline character
-o to capture only what matched, since grep otherwise prints the whole line (which in this case, because of -z, is the whole file)

The core of the regexp is [^>], which translates to "not a greater-than symbol", so the match stops right before the next ">" header.
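A sketch on a hypothetical three-record FASTA file, assuming GNU grep built with PCRE support. With -z, grep ends each -o match with a NUL byte, so this version strips it with tr before saving.

```shell
#!/bin/sh
cd "$(mktemp -d)" || exit 1
# Hypothetical three-record FASTA file.
printf '>tig00000033\nACGT\n>tig00000034\nGGCC\nTTAA\n>tig00000035\nAAAA\n' > file.fasta

# [^>]+ runs up to (not including) the next header's ">".
# tr removes the NUL byte that grep -z appends to the match.
grep -Pzo '>tig00000034[^>]+' file.fasta | tr -d '\000' > desired_sequence.fasta
cat desired_sequence.fasta
```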

User · Answer

I relied heavily on pcregrep, but with newer grep you do not need to install pcregrep for many of its features. Just use grep -P.

In the example of the OP's question, I think the following options work nicely, with the second best matching how I understand the question:

grep -Pzo "abc(.|\n)*efg" /tmp/tes*
grep -Pzl "abc(.|\n)*efg" /tmp/tes*

I copied the text as /tmp/test1, deleted the 'g' and saved it as /tmp/test2. Here is the output showing that the first shows the matched string and the second shows only the filename (-o typically shows the match and -l only the filename). Note that the 'z' is necessary for multiline, and the '(.|\n)' means to match either 'anything other than newline' or 'newline', i.e. anything:

user@host:~$ grep -Pzo "abc(.|\n)*efg" /tmp/tes*
/tmp/test1:abc blah
blah blah..
blah blah..
blah blah..
blah efg
user@host:~$ grep -Pzl "abc(.|\n)*efg" /tmp/tes*
/tmp/test1

To determine if your version is new enough, run man grep and see if something similar to this appears near the top:

   -P, --perl-regexp
          Interpret PATTERN as a Perl regular expression (PCRE, see
          below).  This is highly experimental and grep -P may warn of
          unimplemented features.

That is from GNU grep 2.10.
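The test1/test2 experiment can be reproduced in a scratch directory with this sketch (assuming GNU grep with PCRE support); the file contents are a shortened, hypothetical version of the question's sample.

```shell
#!/bin/sh
dir=$(mktemp -d)
# test1: abc and efg on different lines; test2: same text with the "g" of efg deleted.
printf 'blah blah..\nblah abc blah\nblah blah..\nblah efg blah blah\n' > "$dir/test1"
printf 'blah blah..\nblah abc blah\nblah blah..\nblah ef blah blah\n' > "$dir/test2"

# -z: read each file as one record; -l: print only matching file names.
listed=$(grep -Pzl 'abc(.|\n)*efg' "$dir"/test*)
printf '%s\n' "$listed"
```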

User · Answer

With ugrep, released a few months ago:

ugrep 'abc(\n|.)+?efg'

This tool is highly optimized for speed. It's also GNU/BSD/PCRE-grep compatible.

Note that we should use a lazy repetition +?, unless you want to match all lines with efg together until the last efg in the file.

User · Answer

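#!/bin/bash
shopt -s nullglob
for file in *
do
  [ -f "$file" ] || continue
  r=$(awk '/abc/{f=1} f && /efg/{g=1; exit} END{print g ? 1 : 0}' "$file")
  if [ "$r" -eq 1 ]; then
    echo "Found pattern in $file"
  else
    echo "not found"
  fi
done

The awk program only sets g for an efg line once an abc line has been seen, so files where efg only comes before abc are not reported.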

User · Answer

The file pattern *.sh is important to prevent directories from being inspected. Of course, some test could prevent that too.

for f in *.sh
do
  a=$( grep -n -m1 abc "$f" )
  test -n "${a}" && z=$( grep -n efg "$f" | tail -n 1 ) || continue
  (( ((${z/:*/}-${a/:*/})) > 0 )) && echo "$f"
done

The

grep -n -m1 abc "$f"

searches for at most 1 match and returns (-n) the line number. If a match was found (test -n ...), find the last match of efg (find all and take the last with tail -n 1):

z=$( grep -n efg "$f" | tail -n 1 )

else continue.

Since the result is something like 18:foofile.sh String alf="abc"; we need to cut away from ":" till the end of the line:

((${z/:*/}-${a/:*/}))

This should return a positive result if the last match of the 2nd expression is past the first match of the first. Then we report the filename: echo "$f".

User · Answer

If you are willing to use contexts, this could be achieved by typing

grep -A 500 abc test.txt | grep -B 500 efg

This will display everything between "abc" and "efg", as long as they are within 500 lines of each other.
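A sketch on a hypothetical test.txt with one abc line and one efg line: the first grep drops everything before abc, the second drops everything after efg.

```shell
#!/bin/sh
cd "$(mktemp -d)" || exit 1
printf 'intro\nblah abc blah\nmiddle\nblah efg blah\noutro\n' > test.txt

# Keep abc and up to 500 lines after it, then keep efg and up to 500 lines before it.
between=$(grep -A 500 abc test.txt | grep -B 500 efg)
printf '%s\n' "$between"
```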

User · Answer

I don't know how I would do that with grep, but I would do something like this with awk:

awk '/abc/{ln1=NR} /efg/{ln2=NR} END{if(ln1 && ln2 && ln1 < ln2){print "found"}else{print "not found"}}' foo

You need to be careful how you do this, though. Do you want the regex to match the substring or the entire word? Add word-boundary anchors as appropriate. Also, while this strictly conforms to how you stated the example, it doesn't quite work when abc appears a second time after efg. If you want to handle that, add an if as appropriate in the /abc/ case, etc.
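One possible way to add that "if" is sketched below (a guess at the intent, not the answerer's code): record only the first abc line. The hypothetical file foo has abc appearing again after efg, which trips up the original one-liner.

```shell
#!/bin/sh
cd "$(mktemp -d)" || exit 1
# Hypothetical file where abc appears again after efg.
printf 'blah abc\nblah efg\nblah abc\n' > foo

# Original one-liner: ln1 ends up as the LAST abc line (3), so it reports "not found".
orig=$(awk '/abc/{ln1=NR} /efg/{ln2=NR} END{if(ln1 && ln2 && ln1 < ln2){print "found"}else{print "not found"}}' foo)

# With the extra condition: only the FIRST abc line is recorded.
fixed=$(awk '/abc/ && !ln1 {ln1=NR} /efg/{ln2=NR} END{if(ln1 && ln2 && ln1 < ln2){print "found"}else{print "not found"}}' foo)

echo "original: $orig, first-abc version: $fixed"
```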

User · Answer

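This should work:

cat FILE | egrep 'abc|efg'

It prints every line that contains either pattern, in file order, so you can see whether abc comes before efg. If there is more than one match, you can filter out the extra lines using grep -v.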

User · Answer

To search recursively across all files (across multiple lines within each file) with BOTH strings present (i.e. string1 and string2 on different lines and both present in the same file):

grep -r -l 'string1' * > tmp; while IFS= read -r p; do grep -l 'string2' "$p"; done < tmp; rm tmp

To search recursively across all files (across multiple lines within each file) with EITHER string present (i.e. string1 and string2 on different lines and either present in the same file):

grep -r -l 'string1\|string2' *
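A variant of the BOTH case that avoids the temporary file is sketched below, assuming GNU grep (for --null) and an xargs that supports -0. The directory tree and file names are hypothetical; one name contains a space to show that it survives.

```shell
#!/bin/sh
dir=$(mktemp -d)
# Hypothetical tree: one file with both strings, one with only string1.
printf 'string1\nother\nstring2\n' > "$dir/both here.txt"
printf 'string1 only\n' > "$dir/one.txt"

# --null ends each file name with a NUL byte and xargs -0 splits on NUL,
# so names containing spaces are passed through intact.
both=$(grep -r -l --null 'string1' "$dir" | xargs -0 grep -l 'string2')
printf '%s\n' "$both"
```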

User · Answer

While the sed option is the simplest and easiest, LJ's one-liner is sadly not the most portable. Those stuck with a version of the C Shell will need to escape their bangs:

sed -e '/abc/,/efg/\!d' [file]

This unfortunately does not work in bash et al.

User · Answer

I'm not sure if it is possible with grep, but sed makes it very easy:

sed -e '/abc/,/efg/!d' file-with-content
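A sketch on hypothetical files. The /abc/,/efg/ range starts at an abc line and ends at the next efg line, and !d deletes everything outside it. One caveat shown here: if no efg line follows, the range simply runs to the end of the file, so a printed range alone does not prove efg is present.

```shell
#!/bin/sh
cd "$(mktemp -d)" || exit 1
printf 'blah\nblah abc blah\nmiddle\nblah efg blah\nafter\n' > file-with-content
printf 'blah\nblah abc blah\nno closing pattern\n' > no-efg.txt

# Lines from the abc line through the next efg line.
range=$(sed -e '/abc/,/efg/!d' file-with-content)
# Without an efg line, the range is left open until end of file.
open_range=$(sed -e '/abc/,/efg/!d' no-efg.txt)
printf '%s\n---\n%s\n' "$range" "$open_range"
```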

User · Answer

With silver searcher:

ag 'abc.*(\n|.)*efg'

This is similar to ring bearer's answer, but with ag instead. Speed advantages of silver searcher could possibly shine here.

User · Answer

Grep is not sufficient for this operation.

pcregrep, which is found in most modern Linux systems, can be used as:

pcregrep -M 'abc.*(\n|.)*efg' test.txt

where -M (--multiline) allows patterns to match more than one line.

There is a newer pcre2grep also. Both are provided by the PCRE project.

pcre2grep is available for Mac OS X via MacPorts as part of port pcre2:

sudo port install pcre2

and via Homebrew as:

brew install pcre

or for pcre2:

brew install pcre2

pcre2grep is also available on Linux (Ubuntu 18.04+):

sudo apt install pcre2-utils # PCRE2
sudo apt install pcregrep    # Older PCRE

User · Answer

If you have some estimation about the distance between the 2 strings 'abc' and 'efg' you are looking for, you might use:

grep -r . -e 'abc' -A num1 -B num2 | grep 'efg'

That way, the first grep will return the line with the 'abc' plus #num1 lines after it and #num2 lines before it, and the second grep will sift through all of those to get the 'efg'. Then you'll know in which files they appear together.

User · Answer

awk one-liner:

awk '/abc/,/efg/' file-with-content

User · Answer

As an alternative to Balu Mohan's answer, it is possible to enforce the order of the patterns using only grep, head and tail:

for f in FILEGLOB; do tail $f -n +$(grep -n "pattern1" $f | head -n1 | cut -d : -f 1) 2>/dev/null | grep "pattern2" &>/dev/null && echo $f; done

This one isn't very pretty, though. Formatted more readably:

for f in FILEGLOB; do
    tail $f -n +$(grep -n "pattern1" $f | head -n1 | cut -d : -f 1) 2>/dev/null \
        | grep -q "pattern2" \
        && echo $f
done

This will print the names of all files where "pattern2" appears after "pattern1", or where both appear on the same line:

$ echo "abc
def" > a.txt
$ echo "def
abc" > b.txt
$ echo "abcdef" > c.txt; echo "defabc" > d.txt
$ for f in *.txt; do tail $f -n +$(grep -n "abc" $f | head -n1 | cut -d : -f 1) 2>/dev/null | grep -q "def" && echo $f; done
a.txt
c.txt
d.txt

Explanation

tail -n +i - print all lines from the ith onward (inclusive)
grep -n - prepend matching lines with their line numbers
head -n1 - print only the first row
cut -d : -f 1 - print the first cut column using : as the delimiter
2>/dev/null - silence tail error output that occurs if the $() expression returns empty
grep -q - silence grep and return immediately if a match is found, since we are only interested in the exit code

User · Answer

sed should suffice, as poster LJ stated above; instead of !d you can simply use p to print:

sed -n '/abc/,/efg/p' file

User · Answer

Here is a solution inspired by this answer:

if 'abc' and 'efg' can be on the same line:

  grep -zl 'abc.*efg' <your list of files>

if 'abc' and 'efg' must be on different lines:

  grep -Pzl '(?s)abc.*\n.*efg' <your list of files>

Params:

-P Use perl compatible regular expressions (PCRE).
-z Treat the input as a set of lines, each terminated by a zero byte instead of a newline, i.e. grep treats the input as one big line.
-l list matching filenames only.
(?s) activate PCRE_DOTALL, which means that '.' matches any character, including newline.
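The difference between the two commands can be seen on three hypothetical files in this sketch, assuming GNU grep with PCRE support: the first command also accepts abc and efg on the same line, while the second requires a newline between them.

```shell
#!/bin/sh
cd "$(mktemp -d)" || exit 1
printf 'blah\nblah abc blah\nblah\nblah efg blah\n' > split.txt
printf 'blah abc efg blah\n' > same_line.txt
printf 'blah efg blah\nblah abc blah\n' > reversed.txt

# Same line allowed: -z makes each file one record, so . can cross newlines.
either=$(grep -zl 'abc.*efg' split.txt same_line.txt reversed.txt)
# Different lines required: the \n forces at least one line break between them.
different=$(grep -Pzl '(?s)abc.*\n.*efg' split.txt same_line.txt reversed.txt)
printf '%s\n---\n%s\n' "$either" "$different"
```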

User · Answer

If you need both words to be close to each other, for example no more than 3 lines apart, you can do this:

find . -type f -exec grep -Hn -C 3 "abc" {} \; | grep -C 3 "efg"

Same example but filtering only *.txt files:

find . -type f -name '*.txt' -exec grep -Hn -C 3 "abc" {} \; | grep -C 3 "efg"

You can also replace grep with egrep if you want to use extended regular expressions.
