Regular expression to match balanced parentheses

Question

I need a regular expression to select all the text between two outer brackets.

Example: some text(text here(possible text)text(possible text(more text)))end text

Result: (text here(possible text)text(possible text(more text)))

User · Answer

I have written a little JavaScript library called balanced to help with this task. You can accomplish this by doing:

balanced.matches({
    source: source,
    open: '(',
    close: ')'
});

You can even do replacements:

balanced.replacements({
    source: source,
    open: '(',
    close: ')',
    replace: function (source, head, tail) {
        return head + source + tail;
    }
});

Here's a more complex and interactive example JSFiddle.

User · Answer

(?<=\().*(?=\))

If you want to select text between two matching parentheses, you are out of luck with regular expressions. This is impossible(*).

This regex just returns the text between the first opening and the last closing parentheses in your string.

(*) Unless your regex engine has features like balancing groups or recursion. The number of engines that support such features is slowly growing, but they are still not commonly available.

User · Answer

This answer explains the theoretical limitation that makes regular expressions the wrong tool for this task.

Regular expressions cannot do this.

Regular expressions are based on a computing model known as Finite State Automata (FSA). As the name indicates, an FSA can remember only the current state; it has no information about the previous states.

Consider an FSA with two states, S1 and S2, where S1 is both the start state and the final state. If we feed it the string 0110, the transitions go as follows:

      0     1     1     0
-> S1 -> S2 -> S2 -> S2 -> S1

In these steps, when we are at the second S2, i.e. after parsing 01 of 0110, the FSA has no information about the previous 0 in 01, because it can only remember the current state and the next input symbol.

In the problem above, we need to know the number of opening parentheses, which means the count has to be stored somewhere. Since an FSA cannot do that, a regular expression cannot be written.

However, an algorithm can be written to do this task. Such algorithms generally fall under Pushdown Automata (PDA). A PDA is one level above an FSA: it has an additional stack to store extra information. PDAs can solve the problem above because we can "push" each opening parenthesis onto the stack and "pop" one whenever we encounter a closing parenthesis. If the stack is empty at the end, the opening and closing parentheses match; otherwise they do not.
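The push/pop idea described above can be sketched in a few lines of Python. This is a minimal illustration, not code from the answer; the function name `is_balanced` and the `pairs` parameter are made up for the example.

```python
def is_balanced(text, pairs=None):
    """Return True if every opening bracket in text is closed in the right order.

    `pairs` maps each opening character to its closing character
    (hypothetical parameter; defaults to plain parentheses).
    """
    pairs = pairs or {"(": ")"}
    closers = set(pairs.values())
    stack = []
    for ch in text:
        if ch in pairs:
            # "push": remember which closer we now expect
            stack.append(pairs[ch])
        elif ch in closers:
            # "pop": the closer must match the most recent opener
            if not stack or stack.pop() != ch:
                return False
    # balanced only if nothing is left open
    return not stack


print(is_balanced("some text(text here(possible text))end text"))
print(is_balanced("(()"))
```

The stack is exactly the extra memory that a finite automaton lacks: its depth can grow with the input, whereas the number of states in an FSA is fixed in advance.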

User · Answer

Here is a simple Python program showing how to use regular expressions to write a paren-matching recursive parser. This parser recognises items enclosed by parens, brackets, braces and <> symbols, but is adaptable to any set of open/close patterns. This is where the re package greatly assists in parsing.

import re

# The pattern below recognises a sequence consisting of:
#    1. Any characters not in the set of open/close strings.
#    2. One of the open/close strings.
#    3. The remainder of the string.
#
# There is no reason the opening pattern can't be the
# same as the closing pattern, so quoted strings can
# be included.  However quotes are not ignored inside
# quotes.  More logic is needed for that.

pat = re.compile(r"""
    ( .*? )
    ( \( | \) | \[ | \] | \{ | \} | \< | \> |
                           \' | \" | BEGIN | END | $ )
    ( .* )
    """, re.X)

# The keys to the dictionary below are the opening strings,
# and the values are the corresponding closing strings.
# For example "(" is an opening string and ")" is its
# closing string.

matching = {"(": ")",
            "[": "]",
            "{": "}",
            "<": ">",
            '"': '"',
            "'": "'",
            "BEGIN": "END"}

# The procedure below matches string s and returns a
# recursive list matching the nesting of the open/close
# patterns in s.

def matchnested(s, term=""):
    lst = []
    while True:
        m = pat.match(s)

        if m.group(1) != "":
            lst.append(m.group(1))

        if m.group(2) == term:
            return lst, m.group(3)

        if m.group(2) in matching:
            item, s = matchnested(m.group(3), matching[m.group(2)])
            lst.append(m.group(2))
            lst.append(item)
            lst.append(matching[m.group(2)])
        else:
            raise ValueError("After <<%s %s>> expected %s not %s" %
                             (lst, s, term, m.group(2)))

# Unit test.

if __name__ == "__main__":
    for s in ("simple string",
              ' "double quote" ',
              " 'single quote' ",
              "one'two'three'four'five'six'seven",
              "one(two(three(four)five)six)seven",
              "one(two(three)four)five(six(seven)eight)nine",
              "one(two[three{four}five]six)seven<eight>nine",
              "one(two[three{four<five>six}seven]eight)nine",
              "oneBEGINtwo(threeBEGINfourENDfive)sixENDseven",
              "ERROR testing ((( mismatched ))] parens"):
        print("\ninput", s)
        try:
            lst, s = matchnested(s)
            print("output", lst)
        except ValueError as e:
            print(str(e))
    print("done")

User · Answer

Adding to bobble bubble's answer, there are other regex flavors where recursive constructs are supported.

Lua

Use %b() (%b{} / %b[] for curly braces / square brackets):

for s in string.gmatch("Extract (a(b)c) and ((d)f(g))", "%b()") do print(s) end (see demo)

Raku (former Perl6)

Non-overlapping multiple balanced parentheses matches:

my regex paren_any { '(' ~ ')' [ <-[()]>+ || <&paren_any> ]* }
say "Extract (a(b)c) and ((d)f(g))" ~~ m:g/<&paren_any>/;
# => (｢(a(b)c)｣ ｢((d)f(g))｣)

Overlapping multiple balanced parentheses matches:

say "Extract (a(b)c) and ((d)f(g))" ~~ m:ov:g/<&paren_any>/;
# => (｢(a(b)c)｣ ｢(b)｣ ｢((d)f(g))｣ ｢(d)｣ ｢(g)｣)

See demo.

Python re non-regex solution

See poke's answer for How to get an expression between balanced parentheses.

Java customizable non-regex solution

Here is a customizable solution allowing single character literal delimiters in Java:

public static List<String> getBalancedSubstrings(String s, Character markStart,
                                 Character markEnd, Boolean includeMarkers)
{
    List<String> subTreeList = new ArrayList<String>();
    int level = 0;
    int lastOpenDelimiter = -1;
    for (int i = 0; i < s.length(); i++) {
        char c = s.charAt(i);
        if (c == markStart) {
            level++;
            if (level == 1) {
                lastOpenDelimiter = (includeMarkers ? i : i + 1);
            }
        }
        else if (c == markEnd) {
            if (level == 1) {
                subTreeList.add(s.substring(lastOpenDelimiter, (includeMarkers ? i + 1 : i)));
            }
            if (level > 0) level--;
        }
    }
    return subTreeList;
}

Sample usage:

String s = "some text(text here(possible text)text(possible text(more text)))end text";
List<String> balanced = getBalancedSubstrings(s, '(', ')', true);
System.out.println("Balanced substrings:\n" + balanced);
// => [(text here(possible text)text(possible text(more text)))]

User · Answer

It is actually possible to do it using .NET regular expressions, but it is not trivial, so read carefully.

You can read a nice article here. You also may need to read up on .NET regular expressions. You can start reading here.

Angle brackets <> were used because they do not require escaping.

The regular expression looks like this:

<
[^<>]*
(
    (
        (?<Open><)
        [^<>]*
    )+
    (
        (?<Close-Open>>)
        [^<>]*
    )+
)*
(?(Open)(?!))
>

User · Answer

This is the definitive regex:

\(
(?<arguments>
(
  ([^\(\)']*) |
  (\([^\(\)']*\)) |
  '(.*?)'
)*
)
\)

Example:

input: ( arg1, arg2, arg3, (arg4), '(pip' )

output: arg1, arg2, arg3, (arg4), '(pip'

Note that the '(pip' is correctly managed as a string. (Tried in Regulator: http://sourceforge.net/projects/regulator/)

User · Answer

The regular expression using Ruby (version 1.9.3 or above):

/(?<match>\((?:\g<match>|[^()]++)*\))/

Demo on rubular

User · Answer

I didn't use regex since it is difficult to deal with nested code. This snippet should allow you to grab sections of code with balanced brackets:

def extract_code(data):
    """Return a list of (start, end) positions of balanced code snippets in a string (data)."""
    start_pos = None
    end_pos = None
    count_open = 0
    count_close = 0
    code_snippets = []
    for i, v in enumerate(data):
        # change '{' and '}' to match other bracket types
        if v == '{':
            count_open += 1
            # compare against None so that a snippet starting at index 0 is not missed
            if start_pos is None:
                start_pos = i
        if v == '}':
            count_close += 1
            if count_open == count_close and end_pos is None:
                end_pos = i + 1
        if start_pos is not None and end_pos is not None:
            code_snippets.append((start_pos, end_pos))
            start_pos = None
            end_pos = None

    return code_snippets

I used this to extract code snippets from a text file.

User · Answer

I want to add this answer for quick reference. Feel free to update.

.NET Regex using balancing groups:

\((?>\((?<c>)|[^()]+|\)(?<-c>))*(?(c)(?!))\)

Where c is used as the depth counter.

Demo at Regexstorm.com

- Stack Overflow: Using RegEx to balance match parenthesis
- Wes' Puzzling Blog: Matching Balanced Constructs with .NET Regular Expressions
- Greg Reinacker's Weblog: Nested Constructs in Regular Expressions

PCRE using a recursive pattern:

\((?:[^)(]+|(?R))*+\)

Demo at regex101; Or without alternation:

\((?:[^)(]*(?R)?)*+\)

Demo at regex101; Or unrolled for performance:

\([^)(]*+(?:(?R)[^)(]*)*+\)

Demo at regex101; The pattern is pasted at (?R), which represents (?0).

Perl, PHP, Notepad++, R: perl=TRUE, Python: Regex package with (?V1) for Perl behaviour.

Ruby using subexpression calls:

With Ruby 2.0, \g<0> can be used to call the full pattern.

\((?>[^)(]+|\g<0>)*\)

Demo at Rubular; Ruby 1.9 only supports capturing group recursion:

(\((?>[^)(]+|\g<1>)*\))

Demo at Rubular (atomic grouping since Ruby 1.9.3)

JavaScript API :: XRegExp.matchRecursive

XRegExp.matchRecursive(str, '\\(', '\\)', 'g');

JS, Java and other regex flavors without recursion, up to 2 levels of nesting:

\((?:[^)(]+|\((?:[^)(]+|\([^)(]*\))*\))*\)

Demo at regex101. Deeper nesting needs to be added to the pattern. To fail faster on unbalanced parentheses, drop the + quantifier.

Java: An interesting idea using forward references by @jaytea.

Reference - What does this regex mean?

- rexegg.com - Recursive Regular Expressions
- Regular-Expressions.info - Regular Expression Recursion
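As a rough illustration of the fixed-depth approach for flavors without recursion, here is how the "up to 2 levels of nesting" pattern behaves in Python's standard re module, which has no recursion support. The sample strings and the name `TWO_LEVELS` are invented for this sketch.

```python
import re

# Outer pair plus up to two nested levels; every group is non-capturing,
# so findall returns the full matches.
TWO_LEVELS = re.compile(r"\((?:[^)(]+|\((?:[^)(]+|\([^)(]*\))*\))*\)")

# Nesting stays within the supported depth: outermost groups are found.
print(TWO_LEVELS.findall("a (b (c (d)) e) f (g)"))

# One level too deep: the outermost pair cannot match, so the engine
# falls back to the largest inner group that still fits the pattern.
print(TWO_LEVELS.findall("(a(b(c(d))))"))
```

The second call shows why deeper nesting has to be added to the pattern by hand: the pattern never fails outright, it quietly matches a smaller group instead.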

User · Answer

Because JS regex doesn't support recursive matching, I can't make balanced parentheses matching work with it. So this is a simple JavaScript for-loop version that turns a "method(arg)" string into an array:

push(number) map(test(a(a()))) bass(wow, abc)
$$(groups) filter({ type: 'ORGANIZATION', isDisabled: { $ne: true } }) pickBy(_id, type) map(test()) as(groups)

const parser = str => {
  let ops = []
  let method, arg
  let isMethod = true
  let open = []

  for (const char of str) {
    // skip whitespace
    if (char === ' ') continue

    // append method or arg string
    if (char !== '(' && char !== ')') {
      if (isMethod) {
        method = (method || '') + char
      } else {
        arg = (arg || '') + char
      }
    }

    if (char === '(') {
      // nested parenthesis should be a part of arg
      // (arg may still be undefined here, so default it to '')
      if (!isMethod) arg = (arg || '') + char
      isMethod = false
      open.push(char)
    } else if (char === ')') {
      open.pop()
      // check end of arg
      if (open.length < 1) {
        isMethod = true
        ops.push({ method, arg })
        method = arg = undefined
      } else {
        arg += char
      }
    }
  }

  return ops
}

// const test = parser(`$$(groups) filter({ type: 'ORGANIZATION', isDisabled: { $ne: true } }) pickBy(_id, type) map(test()) as(groups)`)
const test = parser(`push(number) map(test(a(a()))) bass(wow, abc)`)

console.log(test)

The result is like:

[ { method: 'push', arg: 'number' },
  { method: 'map', arg: 'test(a(a()))' },
  { method: 'bass', arg: 'wow,abc' } ]

[ { method: '$$', arg: 'groups' },
  { method: 'filter',
    arg: '{type:\'ORGANIZATION\',isDisabled:{$ne:true}}' },
  { method: 'pickBy', arg: '_id,type' },
  { method: 'map', arg: 'test()' },
  { method: 'as', arg: 'groups' } ]

User · Answer

This does not fully address the OP's question, but I thought it may be useful to some coming here to search for a nested-structure regexp.

Parse parameters from a function string (with nested structures) in JavaScript

It matches nested structures built from brackets, square brackets, parentheses, and single and double quotes.

Here you can see the generated regexp in action.

/**
 * get param content of function string.
 * only params string should be provided without parentheses
 * WORK even if some/all params are not set
 * @return [param1, param2, param3]
 */
exports.getParamsSAFE = (str, nbParams = 3) => {
    const nextParamReg = /^\s*((?:(?:['"([{](?:[^'"()[\]{}]*?|['"([{](?:[^'"()[\]{}]*?|['"([{][^'"()[\]{}]*?['")}\]])*?['")}\]])*?['")}\]])|[^,])*?)\s*(?:,|$)/;
    const params = [];
    while (str.length) { // this is to avoid a BIG performance issue in javascript regexp engine
        str = str.replace(nextParamReg, (full, p1) => {
            params.push(p1);
            return '';
        });
    }
    return params;
};

User · Answer

You can use regex recursion:

\(([^()]|(?R))*\)

User · Answer

While so many answers mention this in some form by saying that regex does not support recursive matching and so on, the primary reason lies in the roots of the Theory of Computation.

The language of the form {a^n b^n | n >= 0} is not regular. Strings such as ((())) have exactly this shape, with ( for a and ) for b. Regex can only match things that form part of the regular set of languages.

Read more here.

User · Answer

This one also worked:

re.findall(r'\(.+\)', s)

User · Answer

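I was also stuck in this situation where nested patterns come up.

A regular expression is the right thing to solve the above problem. Use the pattern below (it relies on the (?1) group recursion found in PCRE-style engines):

'/(\((?>[^()]+|(?1))*\))/'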

User · Answer

Regular expressions are the wrong tool for the job because you are dealing with nested structures, i.e. recursion.

But there is a simple algorithm to do this, which I described in this answer to a previous question.
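One simple algorithm of this kind keeps a depth counter while scanning the string once. The sketch below is only an illustration and is not necessarily the algorithm from the linked answer; the function name `outer_groups` and its parameters are hypothetical.

```python
def outer_groups(text, open_ch="(", close_ch=")"):
    """Return every outermost balanced group in text, delimiters included."""
    groups = []
    depth = 0
    start = None
    for i, ch in enumerate(text):
        if ch == open_ch:
            if depth == 0:
                start = i          # an outermost group begins here
            depth += 1
        elif ch == close_ch and depth > 0:
            depth -= 1
            if depth == 0:
                # the group that began at `start` is now closed
                groups.append(text[start:i + 1])
    return groups


sample = "some text(text here(possible text)text(possible text(more text)))end text"
print(outer_groups(sample))
```

Stray closing delimiters are skipped and a group that is never closed is not reported; whether that is the right policy for unbalanced input is a design choice left to the caller.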

User · Answer

[^\(]*(\(.*\))[^\)]*

[^\(]* matches everything that isn't an opening bracket at the beginning of the string, (\(.*\)) captures the required substring enclosed in brackets, and [^\)]* matches everything that isn't a closing bracket at the end of the string. Note that this expression does not attempt to match brackets; a simple parser (see dehmann's answer) would be more suitable for that.

User · Answer

You need the first and last parentheses. Use something like this:

str.indexOf('('); - it will give you the first occurrence

str.lastIndexOf(')'); - the last one

So you need the string between them:

String searchedString = str.substring(str.indexOf('('), str.lastIndexOf(')'));
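The same first-to-last idea can be sketched in Python with str.find and str.rfind. The helper name `between_first_and_last` is made up for this example, and unlike the substring call above it keeps both outer parentheses, which is what the question's expected result shows.

```python
def between_first_and_last(text, open_ch="(", close_ch=")"):
    """Return the slice from the first opener to the last closer, or None."""
    first = text.find(open_ch)     # index of the first opening delimiter, -1 if absent
    last = text.rfind(close_ch)    # index of the last closing delimiter, -1 if absent
    if first == -1 or last == -1 or last < first:
        return None
    return text[first:last + 1]


sample = "some text(text here(possible text)text(possible text(more text)))end text"
print(between_first_and_last(sample))
```

This does no balance checking at all: for an input like "(a) b (c)" it returns the whole span from the first opener to the last closer, including the unrelated text in between.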

User · Answer

This might help to match balanced parentheses:

\s*\w+[(][^+]*[)]\s*

User · Answer

The answer depends on whether you need to match matching sets of brackets, or merely the first open to the last close in the input text.

If you need to match matching nested brackets, then you need something more than regular expressions (see @dehmann).

If it's just first open to last close, see @Zach.

Decide what you want to happen with:

abc ( 123 ( foobar ) def ) xyz ) ghij

You need to decide what your code needs to match in this case.
