[grep] Count all occurrences of a string in lots of files with grep

I have a bunch of log files. I need to find out how many times a string occurs in all files.

grep -c string *

returns

...
file1:1
file2:0
file3:0
...

Using a pipe I was able to get only files that have one or more occurrences:

grep -c string * | grep -v :0

...
file4:5
file5:1
file6:2
...

How can I get only the combined count? (If it returns file4:5, file5:1, file6:2, I want to get back 8.)


The answer is


Here is a faster-than-grep AWK alternative, which handles multiple matches of <url> per line within a collection of XML files in a directory:

awk '/<url>/{m=gsub("<url>","");total+=m}END{print total}' some_directory/*.xml

This works well in cases where some XML files don't have line breaks.
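The one-liner above can be wrapped in a small function so the pattern is not hard-coded. This is only a sketch: the function name count_occurrences_awk is hypothetical, the pattern is treated as an awk regular expression, and total + 0 is added so that "no matches" prints 0 instead of an empty line.

```shell
# count_occurrences_awk PATTERN FILE...
# Hypothetical helper: sums the per-line match counts over all given files.
count_occurrences_awk() {
    pattern=$1
    shift
    # gsub() returns the number of substitutions it made on the current line,
    # which equals the number of non-overlapping matches of the pattern.
    # "total + 0" forces numeric output even when nothing matched.
    awk -v pat="$pattern" '{ total += gsub(pat, "") } END { print total + 0 }' "$@"
}
```

Usage would look like: count_occurrences_awk '<url>' some_directory/*.xml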


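grep -oh string * | wc -w

This counts multiple occurrences on a line. Note that wc -w counts words, so the result is only correct when a match never contains whitespace; otherwise use wc -l.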


The AWK solution which also handles file names including colons:

grep -c string * | sed -r 's/^.*://' | awk 'BEGIN{}{x+=$1}END{print x}'

Keep in mind that this method still does not find multiple occurrences of string on the same line.
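The same sum can probably be had without the sed step by letting awk take the last colon-separated field. A minimal sketch, assuming file names contain no newlines; the function name total_matching_lines is hypothetical.

```shell
# total_matching_lines STRING FILE...
# Hypothetical helper: adds up the per-file line counts printed by grep -c.
total_matching_lines() {
    needle=$1
    shift
    # grep -c prints "name:count" (or just "count" for a single file).
    # $NF is the text after the last colon, so colons in file names are harmless.
    grep -c -e "$needle" -- "$@" | awk -F: '{ sum += $NF } END { print sum + 0 }'
}
```

Like the pipeline above, this counts matching lines, not individual occurrences.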


You can use grep's -c option to get per-file counts. I will use the -i option to make sure STRING/StrING/string are all matched. Note that -c counts matching lines, not individual matches, and adding -o does not change that.

Command that shows the file names and hides files without matches:

grep -oci string * | grep -v ':0$'

Command that omits the file names and prints 0 for each file without matches:

grep -ochi string *

Another one-liner using basic command-line tools that handles multiple occurrences per line (it relies on GNU sed interpreting \n in the replacement as a newline):

cat * | sed 's/string/\nstring /g' | grep string | wc -l

Instead of using -c, just pipe it to wc -l.

grep string * | wc -l

This prints every matching line on its own line and then counts the number of lines.

This will miss instances where the string occurs 2+ times on one line, though.
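To make the difference concrete, here is a small sketch that computes both numbers for the same files. The function names are hypothetical, and the -o and -h options are assumed to be available (they are in GNU and BSD grep).

```shell
# count_matching_lines STRING FILE...
# Hypothetical helper: number of lines that contain STRING at least once.
count_matching_lines() {
    needle=$1
    shift
    grep -h -e "$needle" -- "$@" | wc -l
}

# count_occurrences STRING FILE...
# Hypothetical helper: number of individual matches of STRING.
count_occurrences() {
    needle=$1
    shift
    # -o prints each match on its own line; -h drops the file-name prefix.
    grep -oh -e "$needle" -- "$@" | wc -l
}
```

For a file whose only content is the line "string and string", the first function reports 1 and the second reports 2.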


Something different than all the previous answers:

perl -lne '$count++ for m/<pattern>/g;END{print $count}' *

Short recursive variant (counts matching lines):

find . -type f -exec cat {} + | grep -c 'string'

This works for multiple occurrences per line:

grep -o string * | wc -l

cat * | grep -c string

One of the rare useful applications of cat.


You can add -R to search recursively (and avoid using cat) and -I to ignore binary files. This prints one count per file rather than a single total.

grep -RIc string .
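If a single recursive total is wanted instead, the -o approach can be combined with -R and -I. A minimal sketch, assuming GNU grep; the function name count_recursive is hypothetical.

```shell
# count_recursive STRING [DIR]
# Hypothetical helper: total number of matches of STRING under DIR (default: .).
count_recursive() {
    needle=$1
    dir=${2:-.}
    # -R recurse, -I skip binary files, -o one match per output line,
    # -h omit file names so that wc -l simply counts matches.
    grep -RIoh -e "$needle" -- "$dir" | wc -l
}
```

Usage would look like: count_recursive string /var/log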

Obligatory AWK solution:

grep -c string * | awk 'BEGIN{FS=":"}{x+=$2}END{print x}'

Take care if your file names include ":" though.


Grep only solution which I tested with grep for windows:

grep -ro "pattern to find in files" "Directory to recursively search" | grep -c "pattern to find in files"

This solution counts all occurrences, even when there are several on one line. -r searches the directory recursively, and -o will "show only the part of a line matching PATTERN", which splits up multiple occurrences on a single line so that grep prints each match on its own line. Those newline-separated results are then piped back into grep with -c, which counts them using the same pattern.


If you want the number of matching lines per file (example for the string "tcp"):

grep -RIci "tcp" . | awk -v FS=":" -v OFS="\t" '$2>0 { print $2, $1 }' | sort -hr

Example output:

53  ./HTTPClient/src/HTTPClient.cpp
21  ./WiFi/src/WiFiSTA.cpp
19  ./WiFi/src/ETH.cpp
13  ./WiFi/src/WiFiAP.cpp
4   ./WiFi/src/WiFiClient.cpp
4   ./HTTPClient/src/HTTPClient.h
3   ./WiFi/src/WiFiGeneric.cpp
2   ./WiFi/examples/WiFiClientBasic/WiFiClientBasic.ino
2   ./WiFiClientSecure/src/ssl_client.cpp
1   ./WiFi/src/WiFiServer.cpp

Explanation:

  • grep -RIci NEEDLE . - looks for the string NEEDLE recursively from the current directory (following symlinks), ignoring binaries, counting matching lines per file, ignoring case
  • awk ... - this command ignores files with zero occurrences and formats lines
  • sort -hr - sorts lines in reverse order by numbers in first column

Of course, it works with other grep commands with option -c (count) as well. For example:

grep -c "tcp" *.txt | awk -v FS=":" -v OFS="\t" '$2>0 { print $2, $1 }' | sort -hr
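Because -c counts lines, a file with two matches on one line is reported as 1. If per-file counts of individual matches are needed, one option is to let awk do the counting. This is a sketch under assumptions: the function name occurrences_per_file is hypothetical, the pattern is treated as an awk regular expression, matching is case-sensitive, and the files are passed explicitly rather than found recursively.

```shell
# occurrences_per_file PATTERN FILE...
# Hypothetical helper: prints "count<TAB>file" for each file with matches,
# highest count first.
occurrences_per_file() {
    needle=$1
    shift
    awk -v pat="$needle" '
        # gsub() returns how many matches it replaced on this line.
        { n = gsub(pat, ""); if (n) count[FILENAME] += n }
        END { for (f in count) printf "%d\t%s\n", count[f], f }
    ' "$@" | sort -rn
}
```

Usage would look like: occurrences_per_file tcp *.txt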
