[regex] Match at every second occurrence

Is there a way to specify a regular expression to match every 2nd occurrence of a pattern in a string?

Examples

  • searching for a against string abcdabcd should find one occurrence at position 5
  • searching for ab against string abcdabcd should find one occurrence at position 5
  • searching for dab against string abcdabcd should find no occurrences
  • searching for a against string aaaa should find two occurrences at positions 2 and 4

This question is related to regex

The answer is


Use grouping.

foo.*?(foo)

If you're using C#, you can either get all the matches at once (i.e. use Regex.Matches(), which returns a MatchCollection, and check the index of the item: index % 2 != 0).

If you want to find the occurrence to replace it, use one of the overloads of Regex.Replace() that uses a MatchEvaluator (e.g. Regex.Replace(String, String, MatchEvaluator). Here's the code:

using System;
using System.Collections.Generic;
using System.Text;
using System.Text.RegularExpressions;

namespace ConsoleApplication1
{
    class Program
    {
        static void Main(string[] args)
        {
            string input = "abcdabcd";

            // Replace *second* a with m

            string replacedString = Regex.Replace(
                input,
                "a",
                new SecondOccuranceFinder("m").MatchEvaluator);

            Console.WriteLine(replacedString);
            Console.Read();

        }

        class SecondOccuranceFinder
        {
            public SecondOccuranceFinder(string replaceWith)
            {
                _replaceWith = replaceWith;
                _matchEvaluator = new MatchEvaluator(IsSecondOccurance);
            }

            private string _replaceWith;

            private MatchEvaluator _matchEvaluator;
            public MatchEvaluator MatchEvaluator
            {
                get
                {
                    return _matchEvaluator;
                }
            }

            private int _matchIndex;
            public string IsSecondOccurance(Match m)
            {
                _matchIndex++;
                if (_matchIndex % 2 == 0)
                    return _replaceWith;
                else
                    return m.Value;
            }
        }
    }
}

There's no "direct" way of doing so but you can specify the pattern twice as in: a[^a]*a that match up to the second "a".

The alternative is to use your programming language (perl? C#? ...) to match the first occurence and then the second one.

EDIT: I've seen other responded using the "non-greedy" operators which might be a good way to go, assuming you have them in your regex library!


Back references can find interesting solutions here. This regex:

([a-z]+).*(\1)

will find the longest repeated sequence.

This one will find a sequence of 3 letters that is repeated:

([a-z]{3}).*(\1)

Would something like

(pattern.*?(pattern))*

work for you?

Edit:

The problem with this is that it uses the non-greedy operator *?, which can require an awful lot of backtracking along the string instead of just looking at each letter once. What this means for you is that this could be slow for large gaps.


Suppose the pattern you want is abc+d. You want to match the second occurrence of this pattern in a string.

You would construct the following regex:

abc+d.*?(abc+d)

This would match strings of the form: <your-pattern>...<your-pattern>. Since we're using the reluctant qualifier *? we're safe that there cannot be another match of between the two. Using matcher groups which pretty much all regex implementations provide you would then retrieve the string in the bracketed group which is what you want.