What is a non-capturing group in regular expressions

Question

How are non-capturing groups  i e        used in regular expressions and what are they good for

User · Accepted Answer

Let me try to explain this with an example.

Consider the following text:

http://stackoverflow.com/
https://stackoverflow.com/questions/tagged/regex

Now, if I apply the regex below over it...

(https?|ftp)://([^/\r\n]+)(/[^\r\n]*)?

... I would get the following result:

Match "http://stackoverflow.com/"
     Group 1: "http"
     Group 2: "stackoverflow.com"
     Group 3: "/"

Match "https://stackoverflow.com/questions/tagged/regex"
     Group 1: "https"
     Group 2: "stackoverflow.com"
     Group 3: "/questions/tagged/regex"

But I don't care about the protocol -- I just want the host and path of the URL. So, I change the regex to include the non-capturing group (?:).

(?:https?|ftp)://([^/\r\n]+)(/[^\r\n]*)?

Now, my result looks like this:

Match "http://stackoverflow.com/"
     Group 1: "stackoverflow.com"
     Group 2: "/"

Match "https://stackoverflow.com/questions/tagged/regex"
     Group 1: "stackoverflow.com"
     Group 2: "/questions/tagged/regex"

See? The first group has not been captured. The parser uses it to match the text, but ignores it later, in the final result.

EDIT:

As requested, let me try to explain groups too.

Well, groups serve many purposes. They can help you to extract exact information from a bigger match (which can also be named), they let you rematch a previous matched group, and can be used for substitutions. Let's try some examples, shall we?

Imagine you have some kind of XML or HTML (be aware that regex may not be the best tool for the job, but it is nice as an example). You want to parse the tags, so you could do something like this (I have added spaces to make it easier to understand):

   \<(?<TAG>.+?)\> [^<]*? \</\k<TAG>\>
or
   \<(.+?)\> [^<]*? \</\1\>

The first regex has a named group (TAG), while the second one uses a common group. Both regexes do the same thing: they use the value from the first group (the name of the tag) to match the closing tag. The difference is that the first one uses the name to match the value, and the second one uses the group index (which starts at 1).

Let's try some substitutions now. Consider the following text:

Lorem ipsum dolor sit amet consectetuer feugiat fames malesuada pretium egestas.

Now, let's use this dumb regex over it:

\b(\S)(\S)(\S)(\S*)\b

This regex matches words with at least 3 characters, and uses groups to separate the first three letters. The result is this:

Match "Lorem"
     Group 1: "L"
     Group 2: "o"
     Group 3: "r"
     Group 4: "em"
Match "ipsum"
     Group 1: "i"
     Group 2: "p"
     Group 3: "s"
     Group 4: "um"
...

Match "consectetuer"
     Group 1: "c"
     Group 2: "o"
     Group 3: "n"
     Group 4: "sectetuer"
...

So, if we apply the substitution string:

$1_$3$2_$4

... over it, we are trying to use the first group, add an underscore, use the third group, then the second group, add another underscore, and then the fourth group. The resulting string would be like the one below.

L_ro_em i_sp_um d_lo_or s_ti_ a_em_t c_no_sectetuer f_ue_giat f_ma_es m_la_esuada p_er_tium e_eg_stas.

You can use named groups for substitutions too, using ${name}.

To play around with regexes, I recommend http://regex101.com/, which offers a good amount of details on how the regex works; it also offers a few regex engines to choose from.

User · Answer

tl dr non-capturing groups  as the name suggests are the parts of the regex that you do not want to be included in the match and    is a way to define a group as being non-capturing   Let s say you have an email address example example com  The following regex will create two groups  the id part and  example com part    p Alpha   a-z    example com   For simplicity s sake  we are extracting the whole domain name including the   character   Now let s say  you only need the id part of the address  What you want to do is to grab the first group of the match result  surrounded by    in the regex and the way to do this is to use the non-capturing group syntax  i e      So the regex   p Alpha   a-z      example com  will return just the id part of the email

User · Answer

You can use capturing groups to organize and parse an expression   A non-capturing group has the first benefit  but doesn t have the overhead of the second   You can still say a non-capturing group is optional  for example   Say you want to match numeric text  but some numbers could be written as 1st  2nd  3rd  4th      If you want to capture the numeric part  but not the  optional  suffix you can use a non-capturing group     0-9      st nd rd th     That will match numbers in the form 1  2  3    or in the form 1st  2nd  3rd     but it will only capture the numeric part

User · Answer

I cannot comment on the top answers to say this  I would like to add an explicit point which is only implied in the top answers   The non-capturing group         does not remove any characters from the original full match  it only reorganises the regex visually to the programmer    To access a specific part of the regex without defined extraneous characters you would always need to use  group  lt index gt

User · Answer

Open your Google Chrome devTools and then Console tab  and type this    Peace  match    w   w   w      Run it and you will see     Pea    P    e    a   index  0  input   Peace   groups  undefined    The JavaScript RegExp engine capture three groups  the items with indexes 1 2 3  Now  use non-capturing mark to see the result    Peace  match      w   w   w      The result is     Pea    e    a   index  0  input   Peace   groups  undefined    This is obvious what is non capturing group

User · Answer

It makes the group non-capturing  which means that the substring matched by that group will not be included in the list of captures  An example in ruby to illustrate the difference    abc  match            captures    gt    a   b    abc  match              captures    gt    b

User · Answer

is used when you want to group an expression  but you do not want to save it as a matched captured portion of the string   An example would be something to match an IP address        d 1 3     3  d 1 3     Note that I don t care about saving the first 3 octets  but the         grouping allows me to shorten the regex without incurring the overhead of capturing and storing a match

User · Answer

Let me try this with an example   Regex Code     animal        w      1 2  Search String   Line 1 - animal cat dog cat tiger dog  Line 2 - animal cat cat dog dog tiger  Line 3 - animal dog dog cat cat tiger     animal  --  Non-Captured Group 1       --  Non-Captured Group 2    w  --  Captured Group 1     --  Captured Group 2   1 --  result of captured group 1 i e In Line 1 is cat  In Line 2 is cat  In Line 3 is dog    2 --  result of captured group 2 i e comma      So in this code by giving  1 and  2 we recall or repeat the result of captured group 1 and 2 respectively later in the code   As per the order of code    animal  should be group 1 and       should be group 2 and continues    but by giving the    we make the match-group non captured  which do not count off in matched group  so the grouping number starts from the first captured group and not the non captured   so that the repetition of the result of match-group    animal  can t be called later in code   Hope this explains the use of non capturing group

User · Answer

I think I would give you the answer  Don t use capture variables without checking that the match succeeded   The capture variables   1  etc  are not valid unless the match succeeded  and they re not cleared  either      usr bin perl   use warnings  use strict           bronto saurus burger   if      bronto   saurus  steak burger          print  Fred wants a   1     else       print  Fred dont wants a  1  2       In the above example  to avoid capturing bronto in  1       is used   If the pattern is matched   then  1 is captured as next grouped pattern   So  the output will be as below   Fred wants a burger   It is Useful if you don t want the matches to be saved

User · Answer

Its extremely simple  We can understand with simple date example  suppose if the date is mentioned as 1st January 2019 or 2nd May 2019 or any other date and we simply want to convert it to dd mm yyyy format we would not need the month s name which is January or February for that matter  so in order to capture the numeric part  but not the  optional  suffix you can use a non-capturing group   so the regular expression would be     0-9      January February     Its as simple as that

User · Answer

Well I am a JavaScript developer and will try to explain its significance pertaining to JavaScript   Consider a scenario where you want to match cat is animal when you would like match cat and animal and both should have a is in between them       this will ignore  is  as that s is what we want  cat is animal  match   cat     is   animal      result   cat is animal    cat    animal        using lookahead pattern it will match only  cat  we can     use lookahead but the problem is we can not give anything     at the back of lookahead pattern  cat is animal  match  cat    is animal      result   cat       so I gave another grouping parenthesis for animal     in lookahead pattern to match animal as well  cat is animal  match   cat     is  animal       result   cat    cat    animal        we got extra cat in above example so removing another grouping  cat is animal  match  cat    is  animal       result   cat    animal

User · Answer

Groups that capture you can use later on in the regex to match OR you can use them in the replacement part of the regex   Making a non-capturing group simply exempts that group from being used for either of these reasons     Non-capturing groups are great if you are trying to capture many different things and there are some groups you don t want to capture    Thats pretty much the reason they exist   While you are learning about groups  learn about Atomic Groups  they do a lot   There is also lookaround groups but they are a little more complex and not used so much   Example of using later on in the regex  backreference     lt   A-Z  A-Z0-9    b   gt    gt     lt   1 gt     Finds an xml tag  without ns support       A-Z  A-Z0-9    is a capturing group  in this case it is the tagname   Later on in the regex is  1 which means it will only match the same text that was in the first group  the   A-Z  A-Z0-9    group   in this case it is matching the end tag

User · Answer

One interesting thing that I came across is the fact that you can have a capturing group inside a non-capturing group  Have a look at below regex for matching web urls   var parse url regex          A-Za-z         0 3    0-9  -A-Za-z         d                                                 Input url string   var url    http   www ora com 80 goodparts q fragment     The first group in my regex      A-Za-z      is a non-capturing group which matches the protocol scheme and colon   character i e  http  but when I was running below code  I was seeing the 1st index of the returned array was containing the string http when I was thinking that http and colon   both will not get reported as they are inside a non-capturing group   console debug parse url regex exec url        I thought if the first group      A-Za-z      is a non-capturing group then why it is returning http string in the output array   So if you notice that there is a nested group   A-Za-z    inside the non-capturing group  That nested group   A-Za-z    is a capturing group  not having    at the beginning  in itself inside a non-capturing group      A-Za-z       That s why the text http still gets captured but the colon   character which is inside the non-capturing group but outside the capturing group doesn t get reported in the output array

User · Answer

HISTORICAL MOTIVATION   The existence of non-capturing groups can be explained with the use of parenthesis   Consider the expressions  a b c and a bc  due to priority of concatenation over    these expressions represent two different languages   ac  bc  and  a  bc  respectively    However  the parenthesis are also used as a matching group  as explained by the other answers       When you want to have parenthesis but not capture the sub-expression you use NON-CAPTURING GROUPS  In the example     a b c

User · Answer

In complex regular expressions you may have the situation arise where you wish to use a large number of groups some of which are there for repetition matching and some of which are there to provide back references  By default the text matching each group is loaded into the backreference array  Where we have lots of groups and only need to be able to reference some of them from the backreference array we can override this default behaviour to tell the regular expression that certain groups are there only for repetition handling and do not need to be captured and stored in the backreference array

[regex] What is a non-capturing group in regular expressions?

EDIT:

Examples related to regex

Examples related to capturing-group

Examples related to regex-group