[regex] How to capture multiple repeated groups?

I need to capture multiple groups of the same pattern. Suppose, I have a following string:

HELLO,THERE,WORLD

And I've written a following pattern

^(?:([A-Z]+),?)+$

What I want it to do is, capture every single word, so that Group 1 is : "HELLO", Group 2 is "THERE" and Group 3 is "WORLD" What my regex is actually capturing only the last one, which is "WORLD".

I'm testing my regular expression here and I want to use it with Swift (maybe there's a way in Swift to get intermediate results somehow, so that I can use them?)

UPDATE: I don't want to use split. I just need to now how to capture all the groups that have matched the pattern, not only the last one.

The answer is


After reading Byte Commander's answer, I want to introduce a tiny possible improvement:

You can generate a regexp that will match either n words, as long as your n is predetermined. For instance, if I want to match between 1 and 3 words, the regexp:

^([A-Z]+)(?:,([A-Z]+))?(?:,([A-Z]+))?$

will match the next sentences, with one, two or three capturing groups.

HELLO,LITTLE,WORLD
HELLO,WORLD
HELLO

You can see a fully detailed explanation about this regular expression on Regex101.

As I said, it is pretty easy to generate this regexp for any groups you want using your favorite language. Since I'm not much of a swift guy, here's a ruby example:

def make_regexp(group_regexp, count: 3, delimiter: ",")
  regexp_str = "^(#{group_regexp})"
  (count - 1).times.each do
    regexp_str += "(?:#{delimiter}(#{group_regexp}))?"
  end
  regexp_str += "$"
  return regexp_str
end

puts make_regexp("[A-Z]+")

That being said, I'd suggest not using regular expression in that case, there are many other great tools from a simple split to some tokenization patterns depending on your needs. IMHO, a regular expression is not one of them. For instance in ruby I'd use something like str.split(",") or str.scan(/[A-Z]+/)


The key distinction is repeating a captured group instead of capturing a repeated group.

As you have already found out, the difference is that repeating a captured group captures only the last iteration. Capturing a repeated group captures all iterations.

In PCRE (PHP):

((?:\w+)+),?
Match 1, Group 1.    0-5      HELLO
Match 2, Group 1.    6-11     THERE
Match 3, Group 1.    12-20    BRUTALLY
Match 4, Group 1.    21-26    CRUEL
Match 5, Group 1.    27-32    WORLD

Since all captures are in Group 1, you only need $1 for substitution.

I used the following general form of this regular expression:

((?:{{RE}})+)

Example at regex101


You actually have one capture group that will match multiple times. Not multiple capture groups.

javascript (js) solution:

let string = "HI,THERE,TOM";
let myRegexp = /([A-Z]+),?/g;       //modify as you like
let match = myRegexp.exec(string);  //js function, output described below
while(match!=null){                 //loops through matches
    console.log(match[1]);          //do whatever you want with each match
    match = myRegexp.exec(bob);     //find next match
}

Output:

HI
THERE
TOM

Syntax:

// matched text: match[0]
// match start: match.index
// capturing group n: match[n]

As you can see, this will work for any number of matches.


I know that my answer came late but it happens to me today and I solved it with the following approach:

^(([A-Z]+),)+([A-Z]+)$

So the first group (([A-Z]+),)+ will match all the repeated patterns except the final one ([A-Z]+) that will match the final one. and this will be dynamic no matter how many repeated groups in the string.


Just to provide additional example of paragraph 2 in the answer. I'm not sure how critical it is for you to get three groups in one match rather than three matches using one group. E.g., in groovy:

def subject = "HELLO,THERE,WORLD"
def pat = "([A-Z]+)"
def m = (subject =~ pat)
m.eachWithIndex{ g,i ->
  println "Match #$i: ${g[1]}"
}

Match #0: HELLO
Match #1: THERE
Match #2: WORLD

I think you need something like this....

b="HELLO,THERE,WORLD"
re.findall('[\w]+',b)

Which in Python3 will return

['HELLO', 'THERE', 'WORLD']

Sorry, not Swift, just a proof of concept in the closest language at hand.

// JavaScript POC. Output:
// Matches:  ["GOODBYE","CRUEL","WORLD","IM","LEAVING","U","TODAY"]

let str = `GOODBYE,CRUEL,WORLD,IM,LEAVING,U,TODAY`
let matches = [];

function recurse(str, matches) {
    let regex = /^((,?([A-Z]+))+)$/gm
    let m
    while ((m = regex.exec(str)) !== null) {
        matches.unshift(m[3])
        return str.replace(m[2], '')
    }
    return "bzzt!"
}

while ((str = recurse(str, matches)) != "bzzt!") ;
console.log("Matches: ", JSON.stringify(matches))

Note: If you were really going to use this, you would use the position of the match as given by the regex match function, not a string replace.


Examples related to regex

Why my regexp for hyphenated words doesn't work? grep's at sign caught as whitespace Preg_match backtrack error regex match any single character (one character only) re.sub erroring with "Expected string or bytes-like object" Only numbers. Input number in React Visual Studio Code Search and Replace with Regular Expressions Strip / trim all strings of a dataframe return string with first match Regex How to capture multiple repeated groups?

Examples related to swift

Make a VStack fill the width of the screen in SwiftUI Xcode 10.2.1 Command PhaseScriptExecution failed with a nonzero exit code Command CompileSwift failed with a nonzero exit code in Xcode 10 Convert Json string to Json object in Swift 4 iOS Swift - Get the Current Local Time and Date Timestamp Xcode 9 Swift Language Version (SWIFT_VERSION) How do I use Safe Area Layout programmatically? How can I use String substring in Swift 4? 'substring(to:)' is deprecated: Please use String slicing subscript with a 'partial range from' operator Safe Area of Xcode 9 The use of Swift 3 @objc inference in Swift 4 mode is deprecated?

Examples related to nsregularexpression

How to capture multiple repeated groups?

Examples related to regex-greedy

How to capture multiple repeated groups? How can I write a regex which matches non greedy? Regex credit card number tests What is the difference between .*? and .* regular expressions? How to do a non-greedy match in grep? How to make Regular expression into non-greedy? What do 'lazy' and 'greedy' mean in the context of regular expressions? How can I make my match non greedy in vim? Non greedy (reluctant) regex matching in sed? Python non-greedy regexes

Examples related to regex-group

How to capture multiple repeated groups? Named regular expression group "(?P<group_name>regexp)": what does "P" stand for? RegEx to extract all matches from string using RegExp.exec What is a non-capturing group in regular expressions? Can I replace groups in Java regex? RegEx for matching UK Postcodes