When searching for number of occurrences of a string in a file, I generally use:
grep pattern file | wc -l
However, this only finds one occurrence per line, because of the way grep works. How can I search for the number of times a string appears in a file, regardless of whether they are on the same or different lines?
Also, what if I'm searching for a regex pattern, not a simple string? How can I count those, or, even better, print each match on a new line?
Ripgrep, which is a fast alternative to grep, has just introduced the --count-matches
flag allowing counting each match in version 0.9 (I'm using the above example to stay consistent):
> echo afoobarfoobar | rg --count foo
1
> echo afoobarfoobar | rg --count-matches foo
2
As asked by OP, ripgrep allows for regex pattern as well (--regexp <PATTERN>
).
Also it can print each (line) match on a separate line:
> echo -e "line1foo\nline2afoobarfoobar" | rg foo
line1foo
line2afoobarfoobar
Try this:
grep "string to search for" FileNameToSearch | cut -d ":" -f 4 | sort -n | uniq -c
Sample:
grep "SMTP connect from unknown" maillog | cut -d ":" -f 4 | sort -n | uniq -c
6 SMTP connect from unknown [188.190.118.90]
54 SMTP connect from unknown [62.193.131.114]
3 SMTP connect from unknown [91.222.51.253]
A belated post:
Use the search regex pattern as a Record Separator (RS) in awk
This allows your regex to span \n
-delimited lines (if you need it).
printf 'X \n moo X\n XX\n' |
awk -vRS='X[^X]*X' 'END{print (NR<2?0:NR-1)}'
Hack grep's color function, and count how many color tags it prints out:
echo -e "a\nb b b\nc\ndef\nb e brb\nr" \
| GREP_COLOR="033" grep --color=always b \
| perl -e 'undef $/; $_=<>; s/\n//g; s/\x1b\x5b\x30\x33\x33/\n/g; print $_' \
| wc -l
Source: Stackoverflow.com