I'm looking for a simple regular expression to match the same character being repeated more than 10 or so times. So for example, if I have a document littered with horizontal lines:
=================================================
It will match the line of =
characters because it is repeated more than 10 times. Note that I'd like this to work for any character.
This question is related to
regex
PHP's preg_replace
example:
$str = "motttherbb fffaaattther";
$str = preg_replace("/([a-z])\\1/", "", $str);
echo $str;
Here [a-z]
hits the character, ()
then allows it to be used with \\1
backreference which tries to match another same character (note this is targetting 2 consecutive characters already), thus:
mother father
If you did:
$str = preg_replace("/([a-z])\\1{2}/", "", $str);
that would be erasing 3 consecutive repeated characters, outputting:
moherbb her
={10,}
matches =
that is repeated 10 or more times.
.
matches any character. Used in conjunction with the curly braces already mentioned:
$: cat > test
========
============================
oo
ooooooooooooooooooooooo
$: grep -E '(.)\1{10}' test
============================
ooooooooooooooooooooooo
A slightly more generic powershell example. In powershell 7, the match is highlighted including the last space (can you highlight in stack?).
'a b c d e f ' | select-string '([a-f] ){6,}'
a b c d e f
use the {10,} operator:
$: cat > testre
============================
==
==============
$: grep -E '={10,}' testre
============================
==============
In Python you can use (.)\1{9,}
example:
txt = """1. aaaaaaaaaaaaaaa
2. bb
3. cccccccccccccccccccc
4. dd
5. eeeeeeeeeeee"""
rx = re.compile(r'(.)\1{9,}')
lines = txt.split('\n')
for line in lines:
rxx = rx.search(line)
if rxx:
print line
Output:
1. aaaaaaaaaaaaaaa
3. cccccccccccccccccccc
5. eeeeeeeeeeee
On some apps you need to remove the slashes to make it work.
/(.)\1{9,}/
or this:
(.)\1{9,}
You can also use PowerShell to quickly replace words or character reptitions. PowerShell is for Windows. Current version is 3.0.
$oldfile = "$env:windir\WindowsUpdate.log"
$newfile = "$env:temp\newfile.txt"
$text = (Get-Content -Path $oldfile -ReadCount 0) -join "`n"
$text -replace '/(.)\1{9,}/', ' ' | Set-Content -Path $newfile
Source: Stackoverflow.com