I have a bunch of log files. I need to find out how many times a string occurs in all files.
grep -c string *
returns:
...
file1:1
file2:0
file3:0
...
Using a pipe, I was able to list only the files that have one or more occurrences:
grep -c string * | grep -v :0
...
file4:5
file5:1
file6:2
...
How can I get only the combined count? (If it returns file4:5, file5:1, file6:2, I want to get back 8.)
Here is a faster-than-grep AWK alternative that handles multiple matches of <url> per line within a collection of XML files in a directory:
awk '/<url>/{m=gsub("<url>","");total+=m}END{print total+0}' some_directory/*.xml
This works well in cases where some XML files don't have line breaks.
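The trick relies on AWK's gsub(), which returns the number of substitutions it made on the line, so summing that return value counts every match rather than every matching line. A minimal sketch on a throwaway file whose contents are made up for illustration (it drops the /<url>/ guard, which only skips lines without a match, and uses print total+0 so that a file with no matches prints 0 instead of an empty line):

```shell
#!/bin/sh
# Hypothetical sample: a single line with three <url> tags and no line breaks.
tmp=$(mktemp)
printf '<urlset><url>a</url><url>b</url><url>c</url></urlset>\n' > "$tmp"

# gsub() returns how many replacements it made on $0, so adding it up
# counts every <url> on a line. "total+0" prints 0 rather than "" if nothing matched.
url_total=$(awk '{ total += gsub(/<url>/, "") } END { print total+0 }' "$tmp")
echo "$url_total"   # 3
rm -f "$tmp"
```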
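grep -oh string * | wc -w
counts multiple occurrences on the same line, as long as string is a single word (wc -w counts words, so a search string containing spaces is counted more than once per match).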
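The word-count caveat is easy to check: with a two-word search string, each match contributes two words, while counting the lines of grep -o output gives one per match. The sample text below is hypothetical:

```shell
#!/bin/sh
# Hypothetical sample where the search string "log entry" contains a space.
tmp=$(mktemp)
printf 'log entry one, log entry two\nnothing here\n' > "$tmp"

# grep -o prints each match on its own line; -h drops the file-name prefix.
# $(( )) strips any leading padding that some wc implementations add.
by_words=$(( $(grep -oh 'log entry' "$tmp" | wc -w) ))   # 2 matches x 2 words = 4
by_lines=$(( $(grep -oh 'log entry' "$tmp" | wc -l) ))   # one line per match = 2
echo "wc -w: $by_words, wc -l: $by_lines"
rm -f "$tmp"
```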
An AWK solution that also handles file names containing colons (sed strips everything up to the last colon, leaving only the count):
grep -c string * | sed -r 's/^.*://' | awk '{x+=$1}END{print x}'
Keep in mind that this method still does not count multiple occurrences of string on the same line.
You can use a simple grep to get a count for each file. I use the -i option so that STRING, StrING and string are all matched.
Command line that shows the file names (and hides files with zero matches):
grep -oci string * | grep -v ':0$'
Command line that omits the file names and prints 0 for each file without matches:
grep -ochi string *
Note that, at least in GNU grep, -c counts matching lines rather than individual matches, even when combined with -o.
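If what you need is a single case-insensitive total that also counts repeated matches on one line, one option is to combine -o and -i and count the output lines. A sketch with made-up file names and contents:

```shell
#!/bin/sh
# Hypothetical sample directory with mixed-case matches.
dir=$(mktemp -d)
printf 'STRING and string\n' > "$dir/a.log"   # 2 matches on one line
printf 'StrING\nno match\n' > "$dir/b.log"    # 1 match

# -o: one output line per match; -i: ignore case; -h: omit file names.
ci_total=$(( $(grep -ohi string "$dir"/*.log | wc -l) ))
echo "$ci_total"   # 3
rm -rf "$dir"
```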
Another one-liner using basic command-line tools that handles multiple occurrences per line:
cat * | sed 's/string/\nstring /g' | grep string | wc -l
(The \n in the sed replacement, which puts each match on its own line, requires GNU sed.)
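To see what the sed step does, here it is on a small hypothetical input: every match gets a newline in front of it, so each occurrence starts its own line and grep | wc -l ends up counting occurrences. This sketch assumes GNU sed, since \n in the replacement is a GNU extension.

```shell
#!/bin/sh
# Hypothetical input: two matches on the first line, one on the third.
input='string one string two
no match
string three'

# GNU sed: "\n" in the replacement inserts a newline before every match.
split=$(printf '%s\n' "$input" | sed 's/string/\nstring /g')
printf '%s\n' "$split"

# Each occurrence now begins its own line, so counting matching lines counts matches.
sed_total=$(( $(printf '%s\n' "$split" | grep string | wc -l) ))
echo "$sed_total"   # 3
```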
Instead of using -c, just pipe the output to wc -l:
grep string * | wc -l
This prints each matching line and then counts the lines. It will miss instances where the string occurs two or more times on one line, though, since such a line is counted only once.
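The difference between counting lines and counting matches shows up on a small hypothetical file where the word appears twice on one line:

```shell
#!/bin/sh
# Hypothetical sample: "error" appears twice on line 1 and once on line 3.
tmp=$(mktemp)
printf 'error: disk error\nok\nerror: timeout\n' > "$tmp"

lines_c=$(grep -c error "$tmp")                  # matching lines
lines_wc=$(( $(grep error "$tmp" | wc -l) ))     # matching lines again
matches=$(( $(grep -o error "$tmp" | wc -l) ))   # individual matches
# prints: grep -c: 2, grep | wc -l: 2, grep -o | wc -l: 3
echo "grep -c: $lines_c, grep | wc -l: $lines_wc, grep -o | wc -l: $matches"
rm -f "$tmp"
```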
Something different from all the previous answers (replace <pattern> with your regular expression):
perl -lne '$count++ for m/<pattern>/g;END{print $count}' *
Short recursive variant (counts matching lines, so several matches on one line count once):
find . -type f -exec cat {} + | grep -c 'string'
This works for multiple occurrences per line:
grep -o string * | wc -l
cat * | grep -c string
One of the rare useful applications of cat.
You can add -R to search recursively (and avoid using cat) and -I to ignore binary files. Note that this prints one count per file, so you still need to sum them to get a single total:
grep -RIc string .
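To turn a recursive search into one combined total that also counts repeated matches on a line, -R and -I can be combined with -o and -h and the output lines counted. A sketch on a hypothetical directory tree (the NUL byte only serves to make grep classify that file as binary, so -I skips it):

```shell
#!/bin/sh
# Hypothetical tree: two text files in nested directories plus one binary file.
dir=$(mktemp -d)
mkdir -p "$dir/a/b"
printf 'string string\n' > "$dir/a/one.log"   # 2 matches on one line
printf 'string\n' > "$dir/a/b/two.log"        # 1 match
printf 'string\000\n' > "$dir/a/blob.bin"      # NUL byte: treated as binary

# -R: recurse (following symlinks); -I: skip binary files;
# -o/-h: one bare match per line, so wc -l counts every occurrence.
recursive_total=$(( $(grep -RIoh string "$dir" | wc -l) ))
echo "$recursive_total"   # 3
rm -rf "$dir"
```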
Obligatory AWK solution:
grep -c string * | awk 'BEGIN{FS=":"}{x+=$2}END{print x}'
Take care if your file names include ":" though.
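One way around the colon problem is to read the last colon-separated field ($NF) instead of $2, since grep -c always puts the count at the end of the line. The file names and contents below are hypothetical; they recreate the question's 5 + 1 + 2 counts plus one extra file whose name contains a colon:

```shell
#!/bin/sh
dir=$(mktemp -d)
printf 'string\nstring\nstring\nstring\nstring\n' > "$dir/file4"   # 5 matching lines
printf 'string\n' > "$dir/file5"                                   # 1
printf 'string\nstring\n' > "$dir/file6"                           # 2
printf 'nothing\n' > "$dir/file1"                                  # 0
printf 'string\n' > "$dir/odd:name"                                # 1, colon in name

# Original approach: for "dir/odd:name:1", $2 is "name", which AWK treats as 0.
naive_total=$(grep -c string "$dir"/* | awk 'BEGIN{FS=":"}{x+=$2}END{print x}')
# Colon-safe: the count is always the last field; "+0" prints 0 for empty input.
colon_safe_total=$(grep -c string "$dir"/* | awk -F: '{ s += $NF } END { print s+0 }')
echo "naive: $naive_total, colon-safe: $colon_safe_total"   # naive: 8, colon-safe: 9
rm -rf "$dir"
```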
Grep-only solution, which I tested with grep for Windows:
grep -ro "pattern to find in files" "Directory to recursively search" | grep -c "pattern to find in files"
This solution counts all occurrences, even when there are several on one line. -r recursively searches the directory, and -o will "show only the part of a line matching PATTERN"; this is what splits multiple occurrences on a single line apart and makes grep print each match on its own line. The newline-separated results are then piped back into grep with -c to count them using the same pattern.
If you want the number of matching lines per file (example for string "tcp"):
grep -RIci "tcp" . | awk -v FS=":" -v OFS="\t" '$2>0 { print $2, $1 }' | sort -hr
Example output:
53 ./HTTPClient/src/HTTPClient.cpp
21 ./WiFi/src/WiFiSTA.cpp
19 ./WiFi/src/ETH.cpp
13 ./WiFi/src/WiFiAP.cpp
4 ./WiFi/src/WiFiClient.cpp
4 ./HTTPClient/src/HTTPClient.h
3 ./WiFi/src/WiFiGeneric.cpp
2 ./WiFi/examples/WiFiClientBasic/WiFiClientBasic.ino
2 ./WiFiClientSecure/src/ssl_client.cpp
1 ./WiFi/src/WiFiServer.cpp
Explanation:
grep -RIci NEEDLE .
- looks for the string NEEDLE recursively from the current directory (following symlinks), ignoring binary files and case, and counts the matching lines in each file
awk ...
- ignores files with zero matches and formats each line as the count, a tab, and the file name
sort -hr
- sorts lines in reverse order by the number in the first column
Of course, it works with other grep commands using the -c (count) option as well. For example:
grep -c "tcp" *.txt | awk -v FS=":" -v OFS="\t" '$2>0 { print $2, $1 }' | sort -hr
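grep -c counts matching lines, so a line with several hits counts once. If you want every occurrence per file, one alternative (an AWK-only sketch, not part of the answer above; the file names are hypothetical) lowercases each line and sums gsub() return values per FILENAME, skipping files with no matches:

```shell
#!/bin/sh
# Hypothetical files: "tcp" appears twice (mixed case) on one line of a.cpp.
dir=$(mktemp -d)
printf 'TCP over tcp\nudp\n' > "$dir/a.cpp"   # 2 occurrences
printf 'tcp\n' > "$dir/b.cpp"                 # 1 occurrence
printf 'nothing\n' > "$dir/c.cpp"             # 0, so it is not printed

# gsub() on a lowercased copy counts case-insensitive matches without changing $0;
# totals are kept per FILENAME, printed as "count<TAB>file", highest count first.
per_file=$(awk '{ line = tolower($0); n[FILENAME] += gsub(/tcp/, "", line) }
                END { for (f in n) if (n[f] > 0) print n[f] "\t" f }' "$dir"/*.cpp |
           sort -nr)
printf '%s\n' "$per_file"
rm -rf "$dir"
```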
Source: Stackoverflow.com