Regular Expression For Duplicate Words

Question

I'm a regular expression newbie, and I can't quite figure out how to write a single regular expression that would "match" any duplicate consecutive words such as:

Paris in the the spring.
Not that that is related.
Why are you laughing? Are my my regular expressions THAT bad??

Is there a single regular expression that will match ALL of the doubled words above (the the, that that, my my)?

User · Answer

This is the regex I use to remove duplicate phrases in my Twitch bot:

(\S+\s*)\1{2,}

(\S+\s*) looks for any run of non-whitespace characters, followed by optional whitespace.

\1{2,} then requires at least two more copies of that phrase immediately afterwards. In other words, the pattern matches only when the same phrase appears three or more times in a row.
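A small Java sketch can make that threshold concrete. The class name TripleRepeatDemo and the sample strings are made up for illustration; only the pattern itself comes from the answer.

```java
import java.util.regex.Pattern;

class TripleRepeatDemo {
    // A phrase (non-whitespace run plus optional trailing whitespace)
    // followed by at least two more copies of exactly the same text.
    static final Pattern REPEATED = Pattern.compile("(\\S+\\s*)\\1{2,}");

    static boolean hasTripleRepeat(String text) {
        return REPEATED.matcher(text).find();
    }

    // Replaces each run of three or more copies with a single copy.
    static String collapse(String text) {
        return REPEATED.matcher(text).replaceAll("$1");
    }

    public static void main(String[] args) {
        System.out.println(hasTripleRepeat("spam spam spam eggs")); // true
        System.out.println(hasTripleRepeat("spam spam eggs"));      // false
        System.out.println(collapse("spam spam spam eggs"));        // spam eggs
    }
}
```

Two side effects are worth knowing. The pattern has no word boundaries, so a run of letters inside a word, such as the "mmm" in "hmmm", also counts as a triple. And because the trailing whitespace is part of the captured phrase, three copies at the very end of a string with no trailing whitespace ("spam spam spam") are not matched.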

User · Answer

I believe this regex handles more situations:

/(\b\S+\b)\s+\b\1\b/

A good selection of test strings can be found here: http://callumacrae.github.com/regex-tuesday/challenge1.html

User · Answer

Since some developers are coming to this page in search of a solution that eliminates not only duplicate consecutive non-whitespace substrings but also triplicates and beyond, I'll show the adapted pattern.

Pattern: /(\b\S+)(?:\s+\1\b)+/
Replace: $1 (replaces the full-string match with capture group 1)

This pattern greedily matches a "whole" non-whitespace substring, then requires one or more copies of the matched substring, which may be delimited by one or more whitespace characters (space, tab, newline, etc.).

Specifically:

\b (word boundary) characters are vital to ensure partial words are not matched.

The second parenthetical is a non-capturing group, because this variable-width substring does not need to be captured, only matched/absorbed.

The + (one-or-more quantifier) on the non-capturing group is more appropriate than *, because * will "bother" the regex engine to capture and replace singleton occurrences, which is wasteful pattern design.

Note: if you are dealing with sentences or input strings with punctuation, the pattern will need to be further refined.
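As a rough illustration, the pattern and the $1 replacement can be tried in Java like this. The class and method names are placeholders chosen for the example, not part of the answer.

```java
import java.util.regex.Pattern;

class CollapseRepeats {
    // Whole non-whitespace token, then one or more whitespace-separated copies.
    static final Pattern REPEATS = Pattern.compile("(\\b\\S+)(?:\\s+\\1\\b)+");

    // Replaces every run of identical tokens with the first copy.
    static String collapse(String text) {
        return REPEATS.matcher(text).replaceAll("$1");
    }

    public static void main(String[] args) {
        System.out.println(collapse("Paris in the the the spring")); // Paris in the spring
        System.out.println(collapse("Not that that is related"));    // Not that that -> that
        System.out.println(collapse("end. end."));                   // unchanged
    }
}
```

The last call shows the punctuation caveat in action: in "end. end." the greedy \S+ swallows the full stop, the trailing \b then fails after the final ".", and the shorter candidate "end" is not followed by whitespace, so nothing is replaced.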

User · Answer

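Try this regular expression:

\b(\w+)(\b\W+\b\1\b)*

\b       start of the word (word boundary)
(\w+)    the word itself: one or more word characters
\W+      one or more non-word characters separating the copies
\1       the same word that was already matched
\b       end of the word
( )*     the separator-plus-copy group, repeated any number of times

    import java.util.Scanner;
    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    public class DuplicateWords {
        public static void main(String[] args) {
            // A word, then any number of (separator + same word) repetitions.
            String regex = "\\b(\\w+)(\\b\\W+\\b\\1\\b)*";
            // CASE_INSENSITIVE lets "Goodbye" and "goodbye" count as the same word.
            Pattern p = Pattern.compile(regex, Pattern.CASE_INSENSITIVE);

            // The first input line holds the number of sentences that follow.
            Scanner in = new Scanner(System.in);
            int numSentences = Integer.parseInt(in.nextLine());
            while (numSentences-- > 0) {
                String input = in.nextLine();
                Matcher m = p.matcher(input);
                // Check for subsequences of input that match the compiled pattern.
                while (m.find()) {
                    // String.replace treats the match as literal text; replaceAll
                    // would reinterpret punctuation in m.group(0) as regex syntax.
                    input = input.replace(m.group(0), m.group(1));
                }
                // Prints the modified sentence.
                System.out.println(input);
            }
            in.close();
        }
    }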

User · Answer

Use this in case you want case-insensitive checking for duplicate words:

(?i)\b(\w+)\s+\1\b

User · Answer

The widely used PCRE library can handle such situations (you won't achieve the same with POSIX-compliant regex engines, though):

(\b\w+\b)\W+\1

User · Answer

The expression below should work correctly to find any number of consecutive repeated words. The matching can be case-insensitive.

    // input holds the sentence to clean up.
    String regex = "\\b(\\w+)(\\s+\\1\\b)*";
    Pattern p = Pattern.compile(regex, Pattern.CASE_INSENSITIVE);
    Matcher m = p.matcher(input);

    // Check for subsequences of input that match the compiled pattern
    while (m.find()) {
        input = input.replaceAll(m.group(0), m.group(1));
    }

Sample Input: Goodbye goodbye GooDbYe
Sample Output: Goodbye

Explanation of the regular expression:

\b            start of a word boundary
\w+           one or more word characters
(\s+\1\b)*    one or more whitespace characters, followed by a word that matches the previous word and ends at a word boundary; wrapping the whole group in ( )* allows any number of repetitions

Grouping:

m.group(0) contains the whole match, in the case above "Goodbye goodbye GooDbYe".
m.group(1) contains the first word of the match, in the case above "Goodbye".

The replace call substitutes each run of consecutive matching words with the first instance of the word.

User · Answer

Regex to strip 2+ duplicate words (consecutive or non-consecutive)

Try this regex, which can catch two or more duplicate words and leave behind only a single word. The duplicate words need not even be consecutive.

/\b(\w+)\b(?=.*?\b\1\b)/ig

Here, \b is used for a word boundary, ?= is used for positive lookahead, and \1 is used for back-referencing.
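A hedged Java sketch of the lookahead approach follows. The i flag becomes Pattern.CASE_INSENSITIVE and the g flag becomes replaceAll. The whitespace clean-up at the end is an addition for readability and is not part of the answer's regex; the class and method names are hypothetical.

```java
import java.util.regex.Pattern;

class KeepLastOccurrence {
    // Matches a word only if the same word appears again later on the line.
    static final Pattern EARLIER_COPY = Pattern.compile(
            "\\b(\\w+)\\b(?=.*?\\b\\1\\b)", Pattern.CASE_INSENSITIVE);

    static String keepLastOccurrence(String text) {
        // Delete every copy that has a later twin, so only the last one survives.
        String stripped = EARLIER_COPY.matcher(text).replaceAll("");
        // Assumption for tidiness: squeeze the gaps left by the deleted words.
        return stripped.trim().replaceAll("\\s+", " ");
    }

    public static void main(String[] args) {
        System.out.println(keepLastOccurrence("one two one three two")); // one three two
    }
}
```

Note that the word kept is the last occurrence, not the first, because every earlier copy is the one that satisfies the lookahead. Since "." does not match line terminators by default, the lookahead only sees the rest of the current line.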

User · Answer

No. That is an irregular grammar. There may be engine- or language-specific regular expressions that you can use, but there is no universal regular expression that can do that.

User · Answer

Try this regular expression:

\b(\w+)\s+\1\b

Here \b is a word boundary and \1 references the captured match of the first group.
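To see the pattern at work on the sentences from the question, here is a minimal Java sketch. The class name and the findDoubled helper are invented for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class DuplicateWordFinder {
    // A whole word, whitespace, then the same word again as a whole word.
    static final Pattern DOUBLED = Pattern.compile("\\b(\\w+)\\s+\\1\\b");

    // Returns every doubled-word match found in the text, in order.
    static List<String> findDoubled(String text) {
        List<String> hits = new ArrayList<>();
        Matcher m = DOUBLED.matcher(text);
        while (m.find()) {
            hits.add(m.group());
        }
        return hits;
    }

    public static void main(String[] args) {
        System.out.println(findDoubled("Paris in the the spring."));  // [the the]
        System.out.println(findDoubled("Not that that is related.")); // [that that]
        System.out.println(findDoubled("Are my my regular expressions THAT bad??")); // [my my]
        System.out.println(findDoubled("This is a test."));           // []
    }
}
```

The last call shows why the word boundaries matter: without the leading \b, the "is" inside "This" followed by " is" would be reported as a duplicate.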

User · Answer

This expression (inspired by Mike's answer, above) seems to catch all duplicates, triplicates, etc., including the ones at the end of the string, which most of the others don't:

/(^|\s+)(\S+)(($|\s+)\2)+/g, with "$1$2" as the replacement

I know the question asked to match duplicates only, but a triplicate is just 2 duplicates next to each other :)

First, I put (^|\s+) to make sure it starts with a full word; otherwise "child's steak" would go to "child'steak" (the "s"'s would match). Then, it matches all full words ((\b\S+\b)), followed by an end of string ($) or a number of spaces (\s+), the whole repeated more than once.

I tried it like this and it worked well:

    var s = "here here here     here is ahi-ahi ahi-ahi ahi-ahi joe's joe's joe's joe's joe's the result result     result";
    print( s.replace( /(\b\S+\b)(($|\s+)\1)+/g, "$1"))
    --> here is ahi-ahi joe's the result

User · Answer

The example in JavaScript: The Good Parts can be adapted to do this:

    var doubled_words = /([A-Za-z\u00C0-\u1FFF\u2800-\uFFFD]+)\s+\1(?:\s|$)/gi;

\b uses \w for word boundaries, where \w is equivalent to [0-9A-Z_a-z]. If you don't mind that limitation, the accepted answer is fine.
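The same trade-off can be sketched in Java, where \w also covers only [a-zA-Z_0-9] unless Pattern.UNICODE_CHARACTER_CLASS is set. The class name, the helper, and the sample sentence are assumptions made for this example; the first pattern is the one from the answer, the second is the ASCII-oriented \b(\w+)\s+\1\b form.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class UnicodeDoubledWords {
    // Explicit letter ranges, so accented letters count as part of a word.
    static final Pattern UNICODE_LETTERS = Pattern.compile(
            "([A-Za-z\\u00C0-\\u1FFF\\u2800-\\uFFFD]+)\\s+\\1(?:\\s|$)",
            Pattern.CASE_INSENSITIVE);

    // ASCII-oriented variant built on the word-character shorthand.
    static final Pattern ASCII_WORDS = Pattern.compile(
            "\\b(\\w+)\\s+\\1\\b", Pattern.CASE_INSENSITIVE);

    // Returns the first doubled word found by the pattern, or null if none.
    static String firstDoubled(Pattern pattern, String text) {
        Matcher m = pattern.matcher(text);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String text = "caf\u00e9 caf\u00e9 au lait";
        System.out.println(firstDoubled(UNICODE_LETTERS, text)); // the accented word
        System.out.println(firstDoubled(ASCII_WORDS, text));     // null
    }
}
```

The ASCII variant finds nothing here because the word-character run stops at "caf" and the accented letter that follows is not whitespace.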

User · Answer

Here is one that catches multiple words multiple times:

(\b\w+\b)(\s+\1)+
