[bash] grep for multiple strings in file on different lines (i.e. whole file, not line-based search)?

I want to grep for files containing the words Dansk, Svenska and Norsk on any line, with a usable return code (I really only need to know whether the strings are present; my one-liner goes a little further than this).

I have many files with lines in them like this:

Disc Title: unknown
Title: 01, Length: 01:33:37.000 Chapters: 33, Cells: 31, Audio streams: 04, Subpictures: 20
        Subtitle: 01, Language: ar - Arabic, Content: Undefined, Stream id: 0x20, 
        Subtitle: 02, Language: bg - Bulgarian, Content: Undefined, Stream id: 0x21, 
        Subtitle: 03, Language: cs - Czech, Content: Undefined, Stream id: 0x22, 
        Subtitle: 04, Language: da - Dansk, Content: Undefined, Stream id: 0x23, 
        Subtitle: 05, Language: de - Deutsch, Content: Undefined, Stream id: 0x24, 
(...)

Here is the pseudocode of what I want:

for all files in directory;
 if file contains "Dansk" AND "Norsk" AND "Svenska"
 then echo the filename
end

What is the best way to do this? Can it be done on one line?

This question is related to bash awk grep

The answer is


awk '/Dansk/{a=1}/Norsk/{b=1}/Svenska/{c=1}END{ exit !(a && b && c) }' file

awk exits with status 0 only if all three words were seen, so you can then catch the return value with the shell
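One way to catch that status, sketched under the assumption that the awk program reports its result through the exit status (0 when all three words were seen) rather than by printing. The function name has_all_three is made up for this example:

```shell
# has_all_three FILE: succeeds (status 0) only if FILE contains all three words.
# The name is hypothetical; the awk program sets a flag per word and turns
# the combined result into the exit status.
has_all_three() {
    awk '/Dansk/{a=1} /Norsk/{b=1} /Svenska/{c=1}
         END { exit !(a && b && c) }' "$1"
}

for f in *; do
    [ -f "$f" ] || continue        # skip directories and other non-files
    if has_all_three "$f"; then
        printf '%s\n' "$f"
    fi
done
```

Because the result travels through the exit status, the same function also works in `if`, `&&` and `||` without parsing any output.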

If you have Ruby (1.9+):

ruby -0777 -ne 'print if /Dansk/ and /Norsk/ and /Svenska/' file

This is a blending of glenn jackman's and kurumi's answers which allows an arbitrary number of regexes instead of an arbitrary number of fixed words or a fixed set of regexes.

#!/usr/bin/awk -f
# by Dennis Williamson - 2011-01-25

BEGIN {
    for (i=ARGC-2; i>=1; i--) {
        patterns[ARGV[i]] = 0;
        delete ARGV[i];
    }
}

{
    for (p in patterns)
        if ($0 ~ p)
            matches[p] = 1
            # print    # the matching line could be printed
}

END {
    for (p in patterns) {
        if (matches[p] != 1)
            exit 1
    }
}

Run it like this:

./multigrep.awk Dansk Norsk Svenska 'Language: .. - A.*c' dvdfile.dat

I did it in two steps, with help from the comments on this page and without writing a script. First make a list of the CSV files in one file, then grep through that list. Just type into a terminal:

$ find /csv/file/dir -name '*.csv' > csv_list.txt
$ grep -l Svenska `cat csv_list.txt` | xargs grep -l Norsk | xargs grep -l Dansk

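It did exactly what I needed: it printed the names of the files containing all three words.

Also mind the quoting characters: the backtick (`), the single quote (') and the double quote (") each mean something different to the shell.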


Here's what worked well for me:

find . -path '*/.svn' -prune -o -type f -exec gawk '/Dansk/{a=1}/Norsk/{b=1}/Svenska/{c=1}END{ if (a && b && c) print FILENAME }' {} \;
./path/to/file1.sh
./another/path/to/file2.txt
./blah/foo.php

If I just wanted to find .sh files with these three, then I could have used:

find . -path '*/.svn' -prune -o -type f -name "*.sh" -exec gawk '/Dansk/{a=1}/Norsk/{b=1}/Svenska/{c=1}END{ if (a && b && c) print FILENAME }' {} \;
./path/to/file1.sh

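How to grep for multiple strings in a file on different lines (use the pipe symbol for alternation):

for file in *; do
   test $(grep -oE 'Dansk|Norsk|Svenska' "$file" | sort -u | wc -l) -eq 3 && echo "$file"
done

Notes:

  1. The pipe needs no escaping with grep -E, whichever quotes you use. Without -E (basic regular expressions), GNU grep needs it escaped like this: \| to search for Dansk, Norsk or Svenska.

  2. -o prints each matched word on its own line and sort -u drops the repeats, so the count reaches 3 only when all three languages occur, even if one language appears on several lines.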

Walkthrough: http://www.cyberciti.biz/faq/howto-use-grep-command-in-linux-unix/


Yet another way using just bash and grep:

For a single file 'test.txt':

  grep -q Dansk test.txt && grep -q Norsk test.txt && grep -l Svenska test.txt

Will print test.txt iff the file contains all three (in any combination). The first two greps don't print anything (-q) and the last only prints the file if the other two have passed.

If you want to do it for every file in the directory:

   for f in *; do grep -q Dansk "$f" && grep -q Norsk "$f" && grep -l Svenska "$f"; done
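
The same chain can be generalized to any number of words. This is only a sketch: contains_all is a hypothetical helper name, and it treats each word as a fixed string (grep -F) rather than a regular expression:

```shell
# contains_all FILE WORD...: status 0 only if every WORD occurs somewhere in FILE.
# (contains_all is a made-up helper name, not part of grep.)
contains_all() {
    local file=$1 word
    shift
    for word in "$@"; do
        # -q: print nothing, -F: fixed string, -e: safe for words starting with "-"
        grep -qF -e "$word" -- "$file" || return 1
    done
    return 0
}

for f in *; do
    [ -f "$f" ] || continue        # skip directories
    if contains_all "$f" Dansk Norsk Svenska; then
        printf '%s\n' "$f"
    fi
done
```

The loop stops at the first missing word, so a file is read at most once per word.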

You can use:

grep -l Dansk * | xargs grep -l Norsk | xargs grep -l Svenska

If you also want to search hidden files:

grep -l Dansk .* | xargs grep -l Norsk | xargs grep -l Svenska

You can do this really easily with ack:

ack -l 'cats' | ack -xl 'dogs'
  • -l: return a list of files
  • -x: take the files from STDIN (the previous search) and only search those files

And you can just keep piping until you get just the files you want.


I had this problem today, and all the one-liners here failed for me because the file names contained spaces.

This is what I came up with that worked:

grep -ril <WORD1> | sed 's/.*/"&"/' | xargs grep -il <WORD2>
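
A variant that avoids the quoting step by passing NUL-terminated names between the stages. It is a sketch that assumes GNU grep (-Z) and GNU xargs (-0, -r); find_all_three is a hypothetical function name:

```shell
# find_all_three DIR: print files under DIR (default: .) containing all three words.
# Names are handed from stage to stage NUL-terminated, so spaces and quotes
# in file names need no special treatment. Assumes GNU grep and GNU xargs.
find_all_three() {
    grep -rlZ -e Dansk -- "${1:-.}" |
        xargs -0 -r grep -lZ -e Norsk -- |
        xargs -0 -r grep -l -e Svenska --
}

# Usage: find_all_three /some/dir
```

Only the last stage prints newline-terminated names, since that output is meant for reading rather than for another xargs.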

If you have git installed

git grep -l --all-match --no-index -e Dansk -e Norsk -e Svenska

The --no-index option searches files in the current directory even when it is not managed by Git, so this command works in any directory, whether or not it is a Git repository.


Expanding on @kurumi's awk answer, here's a bash function:

all_word_search() {
    gawk '
        BEGIN {
            for (i=ARGC-2; i>=1; i--) {
                search_terms[ARGV[i]] = 0;
                ARGV[i] = ARGV[i+1];
                delete ARGV[i+1];
            }
        }
        {
            for (i=1;i<=NF; i++) 
                if ($i in search_terms) 
                    search_terms[$i] = 1
        }
        END {
            for (word in search_terms) 
                if (search_terms[word] == 0) 
                    exit 1
        }
    ' "$@"
    return $?
}

Usage:

if all_word_search Dansk Norsk Svenska filename; then
    echo "all words found"
else
    echo "not all words found"
fi

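Simply (note that this prints lines containing any of the words; it does not check that a file contains all of them):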

grep 'word1\|word2\|word3' *

see this post for more info


If you only need two search terms, arguably the most readable approach is to run each search and intersect the results:

 comm -12 <(grep -rl word1 . | sort) <(grep -rl word2 . | sort)
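
For three terms the intersection can be chained. A sketch assuming bash (for process substitution) and file names without newlines; files_with and all_three_in are hypothetical helper names:

```shell
# files_with WORD [DIR]: sorted list of files under DIR (default: .) containing WORD.
files_with() {
    grep -rlF -e "$1" -- "${2:-.}" | sort
}

# all_three_in [DIR]: files containing Dansk, Norsk and Svenska.
# The first comm intersects two lists; the second reads that result
# from standard input ("-") and intersects it with the third list.
all_three_in() {
    local dir=${1:-.}
    comm -12 <(files_with Dansk "$dir") <(files_with Norsk "$dir") |
        comm -12 - <(files_with Svenska "$dir")
}

# Usage: all_three_in /some/dir
```

comm needs sorted input, which is why every list goes through sort first. The output of comm -12 stays in sorted order, so it can feed the next comm directly.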

This searches multiple words in multiple files:

egrep 'abc|xyz' file1 file2 ..filen 

grep -irl word1 * | grep -il word2 `cat -` | grep -il word3 `cat -`
  • -i makes the search case insensitive
  • -r makes the file search recurse through folders
  • -l prints only the names of the files in which the word was found
  • cat - reads that list of names from standard input and passes it to the next grep as the files to search.
