[regex] Regular expression to match any character being repeated more than 10 times

I'm looking for a simple regular expression to match the same character being repeated more than 10 or so times. So for example, if I have a document littered with horizontal lines:

=================================================

It will match the line of = characters because it is repeated more than 10 times. Note that I'd like this to work for any character.

This question is related to regex

The answer is


PHP's preg_replace example:

$str = "motttherbb fffaaattther";
$str = preg_replace("/([a-z])\\1/", "", $str);
echo $str;

Here [a-z] hits the character, () then allows it to be used with \\1 backreference which tries to match another same character (note this is targetting 2 consecutive characters already), thus:

mother father

If you did:

$str = preg_replace("/([a-z])\\1{2}/", "", $str);

that would be erasing 3 consecutive repeated characters, outputting:

moherbb her


={10,}

matches = that is repeated 10 or more times.


. matches any character. Used in conjunction with the curly braces already mentioned:

$: cat > test
========
============================
oo
ooooooooooooooooooooooo


$: grep -E '(.)\1{10}' test
============================
ooooooooooooooooooooooo

A slightly more generic powershell example. In powershell 7, the match is highlighted including the last space (can you highlight in stack?).

'a b c d e f ' | select-string '([a-f] ){6,}'

a b c d e f 

use the {10,} operator:

$: cat > testre
============================
==
==============

$: grep -E '={10,}' testre
============================
==============

In Python you can use (.)\1{9,}

  • (.) makes group from one char (any char)
  • \1{9,} matches nine or more characters from 1st group

example:

txt = """1. aaaaaaaaaaaaaaa
2. bb
3. cccccccccccccccccccc
4. dd
5. eeeeeeeeeeee"""
rx = re.compile(r'(.)\1{9,}')
lines = txt.split('\n')
for line in lines:
    rxx = rx.search(line)
    if rxx:
        print line

Output:

1. aaaaaaaaaaaaaaa
3. cccccccccccccccccccc
5. eeeeeeeeeeee

On some apps you need to remove the slashes to make it work.

/(.)\1{9,}/

or this:

(.)\1{9,}

You can also use PowerShell to quickly replace words or character reptitions. PowerShell is for Windows. Current version is 3.0.

$oldfile = "$env:windir\WindowsUpdate.log"

$newfile = "$env:temp\newfile.txt"
$text = (Get-Content -Path $oldfile -ReadCount 0) -join "`n"

$text -replace '/(.)\1{9,}/', ' ' | Set-Content -Path $newfile