[java] Java Regex Capturing Groups

I am trying to understand this code block. In the first one, what is it we are looking for in the expression?

My understanding is that it is any character (0 or more times *) followed by any number between 0 and 9 (one or more times +) followed by any character (0 or more times *).

When this is executed the result is:

Found value: This order was placed for QT3000! OK?
Found value: This order was placed for QT300
Found value: 0

Could someone please go through this with me?

What is the advantage of using Capturing groups?

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexTut3 {

    public static void main(String args[]) {
        String line = "This order was placed for QT3000! OK?"; 
        String pattern = "(.*)(\\d+)(.*)";

        // Create a Pattern object
        Pattern r = Pattern.compile(pattern);

        // Now create matcher object.
        Matcher m = r.matcher(line);

        if (m.find()) {
            System.out.println("Found value: " + m.group(0));
            System.out.println("Found value: " + m.group(1));
            System.out.println("Found value: " + m.group(2));
        } else {
            System.out.println("NO MATCH");
        }
    }

}

This question is related to java regex

The answer is


From the doc :

Capturing groups</a> are indexed from left
 * to right, starting at one.  Group zero denotes the entire pattern, so
 * the expression m.group(0) is equivalent to m.group().

So capture group 0 send the whole line.


Your understanding is correct. However, if we walk through:

  • (.*) will swallow the whole string;
  • it will need to give back characters so that (\\d+) is satistifed (which is why 0 is captured, and not 3000);
  • the last (.*) will then capture the rest.

I am not sure what the original intent of the author was, however.


This is totally OK.

  1. The first group (m.group(0)) always captures the whole area that is covered by your regular expression. In this case, it's the whole string.
  2. Regular expressions are greedy by default, meaning that the first group captures as much as possible without violating the regex. The (.*)(\\d+) (the first part of your regex) covers the ...QT300 int the first group and the 0 in the second.
  3. You can quickly fix this by making the first group non-greedy: change (.*) to (.*?).

For more info on greedy vs. lazy, check this site.