Is there a regular expression to detect a valid regular expression?

Question

Is it possible to detect a valid regular expression with another regular expression? If so, please give example code below.

User · Answer

Good question.

True regular languages cannot decide arbitrarily deeply nested, well-formed parentheses. If your alphabet contains '(' and ')', the goal is to decide whether a string of these has well-formed, matching parentheses. Since this is a necessary requirement for regular expressions, the answer is no.

However, if you loosen the requirement and add recursion, you can probably do it. The reason is that the recursion can act as a stack, letting you "count" the current nesting depth by pushing onto this stack.

Russ Cox wrote "Regular Expression Matching Can Be Simple And Fast", which is a wonderful treatise on regex engine implementation.
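To make the "counting" idea concrete, here is a minimal sketch in Python. The function name `is_balanced` is made up for illustration, and a plain integer stands in for the stack height, which is enough when there is only one kind of bracket.

```python
def is_balanced(text: str) -> bool:
    """Return True if every '(' in text is closed by a later ')'."""
    depth = 0  # current nesting depth; plays the role of the stack height
    for ch in text:
        if ch == "(":
            depth += 1          # "push"
        elif ch == ")":
            depth -= 1          # "pop"
            if depth < 0:       # a ')' appeared with nothing open
                return False
    return depth == 0           # everything opened was also closed
```

The counter is unbounded, which is exactly the kind of memory a finite automaton does not have.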

User · Answer

Unlikely.

Evaluate it in a try..catch or whatever your language provides.
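In Python, for instance, that advice could look like the sketch below. The helper name `is_valid_regex` is hypothetical; `re.compile` and `re.error` are the standard library's own. Note that "valid" here means valid for Python's `re` flavor, not for regexes in general.

```python
import re

def is_valid_regex(pattern: str) -> bool:
    """Return True if Python's re module accepts the pattern."""
    try:
        re.compile(pattern)
    except re.error:
        # The engine itself is the authority on what it can parse.
        return False
    return True
```

For example, `is_valid_regex(r"(a|b)*c")` is accepted, while `is_valid_regex("(a|b")` is rejected because of the unclosed group.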

User · Answer

No, if you use standard regular expressions.

The reason is that the language of valid regular expressions does not satisfy the pumping lemma for regular languages. The pumping lemma states that if a language L is regular, then there exists a number N such that every string in L of length at least N can be divided into three substrings x, y, z with |y| >= 1 and |xy| <= N, such that you can repeat y as many times as you want and the entire string will still belong to L.

A consequence of the pumping lemma is that a language made of the strings of the form a^N b^M c^N, that is, two substrings having the same length separated by another string, cannot be regular. In any way you split such strings in x, y and z, you cannot "pump" y without obtaining a string with a different number of "a" and "c", thus leaving the original language. That's the case, for example, with parentheses in regular expressions.
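The argument can be checked by brute force for small cases. The sketch below is an illustration only, with hypothetical helper names: it takes the string of p opening parentheses followed by p closing ones, tries every split allowed by the lemma (|xy| <= p, |y| >= 1), and confirms that repeating y twice always breaks the balance.

```python
def balanced(text: str) -> bool:
    """Return True if the parentheses in text are well formed."""
    depth = 0
    for ch in text:
        depth += 1 if ch == "(" else -1
        if depth < 0:
            return False
    return depth == 0

def every_split_breaks(p: int) -> bool:
    """Check that no permitted split of '('*p + ')'*p survives pumping."""
    w = "(" * p + ")" * p
    for xy_len in range(1, p + 1):            # |xy| <= p
        for y_len in range(1, xy_len + 1):    # |y| >= 1
            x = w[: xy_len - y_len]
            y = w[xy_len - y_len : xy_len]
            z = w[xy_len:]
            # y lies entirely inside the '(' block, so x + y + y + z has
            # more '(' than ')' and can no longer be balanced.
            if balanced(x + y * 2 + z):
                return False
    return True
```

This is not a proof, since it only tests particular values of p, but it shows the mechanism the lemma relies on.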

User · Answer

The following example by Paul McGuire, originally from the pyparsing wiki, but now available only through the Wayback Machine, gives a grammar for parsing some regexes, for the purposes of returning the set of matching strings. As such, it rejects those re's that include unbounded repetition terms, like '+' and '*'. But it should give you an idea about how to structure a parser that would process re's.

    #
    # invRegex.py
    #
    # Copyright 2008, Paul McGuire
    #
    # pyparsing script to expand a regular expression into all possible matching strings
    # Supports:
    # - {n} and {m,n} repetition, but not unbounded + or * repetition
    # - ? optional elements
    # - [] character ranges
    # - () grouping
    # - | alternation
    #
    __all__ = ["count","invert"]

    from pyparsing import (Literal, oneOf, printables, ParserElement, Combine,
        SkipTo, operatorPrecedence, ParseFatalException, Word, nums, opAssoc,
        Suppress, ParseResults, srange)

    class CharacterRangeEmitter(object):
        def __init__(self,chars):
            # remove duplicate chars in character range, but preserve original order
            seen = set()
            self.charset = "".join( seen.add(c) or c for c in chars if c not in seen )
        def __str__(self):
            return '['+self.charset+']'
        def __repr__(self):
            return '['+self.charset+']'
        def makeGenerator(self):
            def genChars():
                for s in self.charset:
                    yield s
            return genChars

    class OptionalEmitter(object):
        def __init__(self,expr):
            self.expr = expr
        def makeGenerator(self):
            def optionalGen():
                yield ""
                for s in self.expr.makeGenerator()():
                    yield s
            return optionalGen

    class DotEmitter(object):
        def makeGenerator(self):
            def dotGen():
                for c in printables:
                    yield c
            return dotGen

    class GroupEmitter(object):
        def __init__(self,exprs):
            self.exprs = ParseResults(exprs)
        def makeGenerator(self):
            def groupGen():
                def recurseList(elist):
                    if len(elist)==1:
                        for s in elist[0].makeGenerator()():
                            yield s
                    else:
                        for s in elist[0].makeGenerator()():
                            for s2 in recurseList(elist[1:]):
                                yield s + s2
                if self.exprs:
                    for s in recurseList(self.exprs):
                        yield s
            return groupGen

    class AlternativeEmitter(object):
        def __init__(self,exprs):
            self.exprs = exprs
        def makeGenerator(self):
            def altGen():
                for e in self.exprs:
                    for s in e.makeGenerator()():
                        yield s
            return altGen

    class LiteralEmitter(object):
        def __init__(self,lit):
            self.lit = lit
        def __str__(self):
            return "Lit:"+self.lit
        def __repr__(self):
            return "Lit:"+self.lit
        def makeGenerator(self):
            def litGen():
                yield self.lit
            return litGen

    def handleRange(toks):
        return CharacterRangeEmitter(srange(toks[0]))

    def handleRepetition(toks):
        toks=toks[0]
        if toks[1] in "*+":
            raise ParseFatalException("",0,"unbounded repetition operators not supported")
        if toks[1] == "?":
            return OptionalEmitter(toks[0])
        if "count" in toks:
            return GroupEmitter([toks[0]] * int(toks.count))
        if "minCount" in toks:
            mincount = int(toks.minCount)
            maxcount = int(toks.maxCount)
            optcount = maxcount - mincount
            if optcount:
                opt = OptionalEmitter(toks[0])
                for i in range(1,optcount):
                    opt = OptionalEmitter(GroupEmitter([toks[0],opt]))
                return GroupEmitter([toks[0]] * mincount + [opt])
            else:
                return [toks[0]] * mincount

    def handleLiteral(toks):
        lit = ""
        for t in toks:
            if t[0] == "\\":
                if t[1] == "t":
                    lit += '\t'
                else:
                    lit += t[1]
            else:
                lit += t
        return LiteralEmitter(lit)

    def handleMacro(toks):
        macroChar = toks[0][1]
        if macroChar == "d":
            return CharacterRangeEmitter("0123456789")
        elif macroChar == "w":
            return CharacterRangeEmitter(srange("[A-Za-z0-9_]"))
        elif macroChar == "s":
            return LiteralEmitter(" ")
        else:
            raise ParseFatalException("",0,"unsupported macro character (" + macroChar + ")")

    def handleSequence(toks):
        return GroupEmitter(toks[0])

    def handleDot():
        return CharacterRangeEmitter(printables)

    def handleAlternative(toks):
        return AlternativeEmitter(toks[0])


    _parser = None
    def parser():
        global _parser
        if _parser is None:
            ParserElement.setDefaultWhitespaceChars("")
            lbrack,rbrack,lbrace,rbrace,lparen,rparen = map(Literal,"[]{}()")

            reMacro = Combine("\\" + oneOf(list("dws")))
            escapedChar = ~reMacro + Combine("\\" + oneOf(list(printables)))
            reLiteralChar = "".join(c for c in printables if c not in r"\[]{}().*?+|") + " \t"

            reRange = Combine(lbrack + SkipTo(rbrack,ignore=escapedChar) + rbrack)
            reLiteral = ( escapedChar | oneOf(list(reLiteralChar)) )
            reDot = Literal(".")
            repetition = (
                ( lbrace + Word(nums).setResultsName("count") + rbrace ) |
                ( lbrace + Word(nums).setResultsName("minCount")+","+ Word(nums).setResultsName("maxCount") + rbrace ) |
                oneOf(list("*+?"))
                )

            reRange.setParseAction(handleRange)
            reLiteral.setParseAction(handleLiteral)
            reMacro.setParseAction(handleMacro)
            reDot.setParseAction(handleDot)

            reTerm = ( reLiteral | reRange | reMacro | reDot )
            reExpr = operatorPrecedence( reTerm,
                [
                (repetition, 1, opAssoc.LEFT, handleRepetition),
                (None, 2, opAssoc.LEFT, handleSequence),
                (Suppress('|'), 2, opAssoc.LEFT, handleAlternative),
                ]
                )
            _parser = reExpr

        return _parser

    def count(gen):
        """Simple function to count the number of elements returned by a generator."""
        i = 0
        for s in gen:
            i += 1
        return i

    def invert(regex):
        """Call this routine as a generator to return all the strings that
           match the input regular expression.
               for s in invert("[A-Z]{3}\d{3}"):
                   print s
        """
        invReGenerator = GroupEmitter(parser().parseString(regex)).makeGenerator()
        return invReGenerator()

    def main():
        tests = r"""
        [A-EA]
        [A-D]*
        [A-D]{3}
        X[A-C]{3}Y
        X[A-C]{3}\(
        X\d
        foobar\d\d
        foobar{2}
        foobar{2,9}
        fooba[rz]{2}
        (foobar){2}
        ([01]\d)|(2[0-5])
        ([01]\d\d)|(2[0-4]\d)|(25[0-5])
        [A-C]{1,2}
        [A-C]{0,3}
        [A-C]\s[A-C]\s[A-C]
        [A-C]\s?[A-C][A-C]
        [A-C]\s([A-C][A-C])
        [A-C]\s([A-C][A-C])?
        [A-C]{2}\d{2}
        @|TH[12]
        @(@|TH[12])?
        @(@|TH[12]|AL[12]|SP[123]|TB(1[0-9]?|20?|[3-9]))?
        @(@|TH[12]|AL[12]|SP[123]|TB(1[0-9]?|20?|[3-9])|OH(1[0-9]?|2[0-9]?|30?|[4-9]))?
        (([ECMP]|HA|AK)[SD]|HS)T
        [A-CV]{2}
        A[cglmrstu]|B[aehikr]?|C[adeflmorsu]?|D[bsy]|E[rsu]|F[emr]?|G[ade]|H[efgos]?|I[nr]?|Kr?|L[airu]|M[dgnot]|N[abdeiop]?|Os?|P[abdmortu]?|R[abefghnu]|S[bcegimnr]?|T[abcehilm]|Uu[bhopqst]|U|V|W|Xe|Yb?|Z[nr]
        (a|b)|(x|y)
        (a|b) (x|y)
        """.split('\n')

        for t in tests:
            t = t.strip()
            if not t: continue
            print '-'*50
            print t
            try:
                print count(invert(t))
                for s in invert(t):
                    print s
            except ParseFatalException,pfe:
                print pfe.msg
                print
                continue
            print

    if __name__ == "__main__":
        main()

User · Answer

No, if you are strictly speaking about regular expressions and not including some regular expression implementations that are actually context free grammars.

There is one limitation of regular expressions which makes it impossible to write a regex that matches all and only regexes. You cannot match constructs such as braces which are paired. Regexes use many such constructs; let's take [] as an example. Whenever there is a [ there must be a matching ], which is simple enough for a regex: "\[.*\]".

What makes it impossible for regexes is that they can be nested. How can you write a regex that matches nested brackets? The answer is you can't without an infinitely long regex. You can match any fixed number of nested parentheses through brute force, but you can't ever match an arbitrarily deep set of nested brackets.

This capability is often referred to as counting, because you're counting the depth of the nesting. A regex by definition does not have the capability to count.

I ended up writing "Regular Expression Limitations" about this.
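The brute-force option can be sketched as follows, using parentheses as the paired construct. The helper `nested_parens_pattern` is a hypothetical name: it builds a plain (non-recursive) pattern that accepts nesting up to a chosen depth and nothing deeper, so each extra level of nesting needs a longer pattern.

```python
import re

def nested_parens_pattern(max_depth: int) -> str:
    """Build a regex for balanced parentheses nested at most max_depth deep."""
    pattern = r"[^()]*"  # depth 0: no parentheses at all
    for _ in range(max_depth):
        # One more level: plain characters, or a group wrapping the
        # pattern built so far.
        pattern = r"(?:[^()]|\(" + pattern + r"\))*"
    return pattern

depth_two = nested_parens_pattern(2)
print(bool(re.fullmatch(depth_two, "a(b(c))")))   # nesting depth 2
print(bool(re.fullmatch(depth_two, "(((x)))")))   # nesting depth 3
```

Whatever depth is chosen, an input nested one level deeper is rejected, which is the limitation described above.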

User · Answer

    /
    ^                                             # start of string
    (                                             # first group start
      (?:
        (?:[^?+*{}()[\]\\|]+                      # literals and ^, $
         | \\.                                    # escaped characters
         | \[ (?: \^?\\. | \^[^\\] | [^\\^] )     # character classes
              (?: [^\]\\]+ | \\. )* \]
         | \( (?:\?[:=!]|\?<[=!]|\?>)? (?1)?? \)  # parenthesis, with recursive content
         | \(\? (?:R|[+-]?\d+) \)                 # recursive matching
         )
        (?: (?:[?+*]|\{\d+(?:,\d*)?\}) [?+]? )?   # quantifiers
      | \|                                        # alternative
      )*                                          # repeat content
    )                                             # end first group
    $                                             # end of string
    /

This is a recursive regex, and is not supported by many regex engines. PCRE based ones should support it.

Without whitespace and comments:

    /^((?:(?:[^?+*{}()[\]\\|]+|\\.|\[(?:\^?\\.|\^[^\\]|[^\\^])(?:[^\]\\]+|\\.)*\]|\((?:\?[:=!]|\?<[=!]|\?>)?(?1)??\)|\(\?(?:R|[+-]?\d+)\))(?:(?:[?+*]|\{\d+(?:,\d*)?\})[?+]?)?|\|)*)$/

.NET does not support recursion directly (the (?1) and (?R) constructs). The recursion would have to be converted to counting balanced groups:

    ^                                         # start of string
    (?:
      (?: [^?+*{}()[\]\\|]+                   # literals and ^, $
       | \\.                                  # escaped characters
       | \[ (?: \^?\\. | \^[^\\] | [^\\^] )   # character classes
            (?: [^\]\\]+ | \\. )* \]
       | \( (?:\?[:=!]
             | \?<[=!]
             | \?>
             | \?<[^\W\d]\w*>
             | \?'[^\W\d]\w*'
             )?                               # opening of group
         (?<N>)                               #   increment counter
       | \)                                   # closing of group
         (?<-N>)                              #   decrement counter
       )
      (?: (?:[?+*]|\{\d+(?:,\d*)?\}) [?+]? )? # quantifiers
    | \|                                      # alternative
    )*                                        # repeat content
    $                                         # end of string
    (?(N)(?!))                                # fail if counter is non-zero

Compacted:

    ^(?:(?:[^?+*{}()[\]\\|]+|\\.|\[(?:\^?\\.|\^[^\\]|[^\\^])(?:[^\]\\]+|\\.)*\]|\((?:\?[:=!]|\?<[=!]|\?>|\?<[^\W\d]\w*>|\?'[^\W\d]\w*')?(?<N>)|\)(?<-N>))(?:(?:[?+*]|\{\d+(?:,\d*)?\})[?+]?)?|\|)*$(?(N)(?!))

From the comments:

"Will this validate substitutions and translations?"

It will validate just the regex part of substitutions and translations: s/<this part>/.../

"It is not theoretically possible to match all valid regex grammars with a regex."

It is possible if the regex engine supports recursion, such as PCRE, but that can't really be called regular expressions any more.

"Indeed, a 'recursive regular expression' is not a regular expression. But this is an often-accepted extension to regex engines. Ironically, this extended regex doesn't match extended regexes."

"In theory, theory and practice are the same. In practice, they're not. Almost everyone who knows regular expressions knows that regular expressions do not support recursion. But PCRE and most other implementations support much more than basic regular expressions."

"Using this with shell script in the grep command, it shows me some error: grep: Invalid content of {}. I am making a script that could grep a code base to find all the files that contain regular expressions."

This pattern exploits an extension called recursive regular expressions. This is not supported by the POSIX flavor of regex. You could try with the -P switch, to enable the PCRE regex flavor.

"Regex itself 'is not a regular language and hence cannot be parsed by regular expression...'"

This is true for classical regular expressions. Some modern implementations allow recursion, which makes it into a Context Free language, although it is somewhat verbose for this task.

"I see where you're matching []()/\. and other special regex characters. Where are you allowing non-special characters? It seems like this will match ^(?:[\.]+)$, but not ^abcdefg$. That's a valid regex."

[^?+*{}()[\]\\|] will match any single character, not part of any of the other constructs. This includes both literal (a - z), and certain special characters (^, $, .).
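As one concrete case of an engine without recursion, Python's standard `re` module rejects the (?R) and (?1) constructs at compile time. The sketch below only demonstrates that rejection; the helper name and the two sample patterns are made up for illustration. The third-party `regex` package is documented as supporting recursive patterns, but it is not used or tested here.

```python
import re

def accepted_by_stdlib_re(pattern: str) -> bool:
    """Return True if the standard-library re module can compile the pattern."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False

# Hypothetical sample patterns for balanced parentheses, written with the
# two recursion constructs mentioned above.
WHOLE_PATTERN_RECURSION = r"\((?R)?\)"
FIRST_GROUP_RECURSION = r"(\((?1)?\))"

for sample in (WHOLE_PATTERN_RECURSION, FIRST_GROUP_RECURSION):
    print(sample, accepted_by_stdlib_re(sample))
```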

User · Answer

Though it is perfectly possible to use a recursive regex as MizardX has posted, for this kind of thing a parser is much more useful. Regexes were originally intended to be used with regular languages; being recursive or having balancing groups is just a patch.

The language that defines valid regexes is actually a context free grammar, and you should use an appropriate parser for handling it. Here is an example from a university project for parsing simple regexes (without most constructs). It uses JavaCC. The comments were originally in Spanish and are translated here; the token names are kept as they were.

    SKIP :
    {
        " "
    |   "\r"
    |   "\t"
    |   "\n"
    }
    TOKEN :
    {
        < DIGITO: ["0" - "9"] >
    |   < MAYUSCULA: ["A" - "Z"] >
    |   < MINUSCULA: ["a" - "z"] >
    |   < LAMBDA: "LAMBDA" >
    |   < VACIO: "VACIO" >
    }

    IRegularExpression Expression() :
    {
        IRegularExpression r;
    }
    {
        r=Alternation() { return r; }
    }

    // Matches alternations: RE | RE
    IRegularExpression Alternation() :
    {
        IRegularExpression r1 = null, r2 = null;
    }
    {
        r1=Concatenation() ( "|" r2=Alternation() )?
        {
            if (r2 == null) {
                return r1;
            } else {
                return createAlternation(r1,r2);
            }
        }
    }

    // Matches concatenations: RE.RE
    IRegularExpression Concatenation() :
    {
        IRegularExpression r1 = null, r2 = null;
    }
    {
        r1=Repetition() ( "." r2=Repetition() { r1 = createConcatenation(r1,r2); } )*
        { return r1; }
    }

    // Matches repetitions: RE*
    IRegularExpression Repetition() :
    {
        IRegularExpression r;
    }
    {
        r=Atom() ( "*" { r = createRepetition(r); } )*
        { return r; }
    }

    // Matches atomic regexes: (RE), Terminal, Vacio (empty), Lambda
    IRegularExpression Atom() :
    {
        String t;
        IRegularExpression r;
    }
    {
        ( "(" r=Expression() ")" {return r;})
        | t=Terminal() { return createTerminal(t); }
        | <LAMBDA> { return createLambda(); }
        | <VACIO> { return createEmpty(); }
    }

    // Matches a terminal (digit or lowercase letter) and returns its value
    String Terminal() :
    {
        Token t;
    }
    {
        ( t=<DIGITO> | t=<MINUSCULA> ) { return t.image; }
    }
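For readers without JavaCC, roughly the same grammar can be written as a hand-rolled recursive-descent parser. The following Python sketch is an assumption-laden translation, not the project's code: the class and function names, the tuple-based tree, and the use of `SyntaxError` are all choices made for illustration. It keeps the grammar's rules: `|` for alternation, an explicit `.` for concatenation, `*` for repetition, and digits or lowercase letters as terminals.

```python
import re

# One token per match: keywords, single terminals, or operator characters.
TOKEN = re.compile(r"\s*(LAMBDA|VACIO|[0-9a-z]|[|.*()])")

def tokenize(text):
    tokens, pos = [], 0
    text = text.rstrip()
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if match is None:
            raise SyntaxError(f"unexpected character at position {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens

class Parser:
    def __init__(self, tokens):
        self.tokens, self.pos = tokens, 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def alternation(self):          # Concatenation ( "|" Alternation )?
        left = self.concatenation()
        if self.peek() == "|":
            self.pos += 1
            return ("alt", left, self.alternation())
        return left

    def concatenation(self):        # Repetition ( "." Repetition )*
        node = self.repetition()
        while self.peek() == ".":
            self.pos += 1
            node = ("cat", node, self.repetition())
        return node

    def repetition(self):           # Atom ( "*" )*
        node = self.atom()
        while self.peek() == "*":
            self.pos += 1
            node = ("star", node)
        return node

    def atom(self):                 # "(" Expression ")" | Terminal | LAMBDA | VACIO
        tok = self.peek()
        if tok == "(":
            self.pos += 1
            node = self.alternation()   # the recursive step a regex cannot express
            if self.peek() != ")":
                raise SyntaxError("missing closing parenthesis")
            self.pos += 1
            return node
        if tok == "LAMBDA":
            self.pos += 1
            return ("lambda",)
        if tok == "VACIO":
            self.pos += 1
            return ("empty",)
        if tok is not None and len(tok) == 1 and (tok.isdigit() or tok.islower()):
            self.pos += 1
            return ("terminal", tok)
        raise SyntaxError(f"unexpected token {tok!r}")

def parse(text):
    parser = Parser(tokenize(text))
    tree = parser.alternation()
    if parser.peek() is not None:
        raise SyntaxError(f"unexpected token {parser.peek()!r}")
    return tree
```

Calling `parse("(a|b)*")` returns a nested tuple for a valid expression, while an input such as `"(a"` raises `SyntaxError`, so the parser doubles as a validator for this small regex dialect.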

User · Answer

No  if you use standard regular expressions   The reason is that you cannot satisfy the pumping lemma for regular languages  The pumping lemma states that a string belonging to language  L  is regular if there exists a number  N  such that  after dividing the string into three substrings x  y  z  such that  x  gt  1  amp  amp   xy  lt  N  you can repeat y as many times as you want and the entire string will still belong to L   A consequence of the pumping lemma is that you cannot have regular strings in the form a Nb Mc N  that is  two substrings having the same length separated by another string  In any way you split such strings in x  y and z  you cannot  pump  y without obtaining a string with a different number of  a  and  c   thus leaving the original language  That s the case  for example  with parentheses in regular expressions

User · Answer

Good question    True regular languages can not decide arbitrarily deeply nested well-formed parenthesis  If your alphabet contains     and     the goal is to decide if a string of these has well-formed matching parenthesis  Since this is a necessary requirement for regular expressions the answer is no   However  if you loosen the requirement and add recursion you can probably do it  The reason is that the recursion can act as a stack letting you  count  the current nesting depth by pushing onto this stack   Russ Cox wrote  Regular Expression Matching Can Be Simple And Fast  which is a wonderful treatise on regex engine implementation

User · Answer

start of string                                                 first group start                                                       literals and                                                      escaped characters                                                 character classes                                                            lt         gt      1          parenthesis  with recursive content                R   -   d                        recursive matching                            d      d                    quantifiers                                                 alternative                                                 repeat content                                                 end first group                                                 end of string     This is a recursive regex  and is not supported by many regex engines  PCRE based ones should support it   Without whitespace and comments                                                                                                 lt         gt     1             R   -   d                    d      d                           NET does not support recursion directly   The   1  and   R  constructs   The recursion would have to be converted to counting balanced groups                                               start of string                                                 literals and                                                  escaped characters                                             character classes                                                                   lt                    gt                lt    W d  w  gt                   W d  w                                               opening of group         lt N gt                                     increment counter                                             closing of group         lt -N gt                                    decrement counter                        d      d                  quantifiers   
                                          alternative                                             repeat content                                             end of string    N                                        fail if counter is non-zero    Compacted                                                                                               lt         gt     lt    W d  w  gt        W d  w       lt N gt        lt -N gt                  d      d                      N         From the comments      Will this validate substitutions and translations    It will validate just the regex part of substitutions and translations  s  lt this part gt           It is not theoretically possible to match all valid regex grammars with a regex     It is possible if the regex engine supports recursion  such as PCRE  but that can t really be called regular expressions any more      Indeed  a  recursive regular expression  is not a regular expression  But this an often-accepted extension to regex engines    Ironically  this extended regex doesn t match extended regexes     In theory  theory and practice are the same  In practice  they re not   Almost everyone who knows regular expressions knows that regular expressions does not support recursion  But PCRE and most other implementations support much more than basic regular expressions      using this with shell script in the grep command   it shows me some error   grep  Invalid content of      I am making a script that could grep a code base to find all the files that contain regular expressions   This pattern exploits an extension called recursive regular expressions  This is not supported by the POSIX flavor of regex  You could try with the -P switch  to enable the PCRE regex flavor      Regex itself  is not a regular language and hence cannot be parsed by regular expression       This is true for classical regular expressions  Some modern implementations allow recursion  which makes it into a Context Free language  although it 
is somewhat verbose for this task      I see where you re matching         and other special regex characters  Where are you allowing non-special characters  It seems like this will match              but not  abcdefg   That s a valid regex                     will match any single character  not part of any of the other constructs  This includes both literal  a - z   and certain special characters

User · Answer

start of string                                                 first group start                                                       literals and                                                      escaped characters                                                 character classes                                                            lt         gt      1          parenthesis  with recursive content                R   -   d                        recursive matching                            d      d                    quantifiers                                                 alternative                                                 repeat content                                                 end first group                                                 end of string     This is a recursive regex  and is not supported by many regex engines  PCRE based ones should support it   Without whitespace and comments                                                                                                 lt         gt     1             R   -   d                    d      d                           NET does not support recursion directly   The   1  and   R  constructs   The recursion would have to be converted to counting balanced groups                                               start of string                                                 literals and                                                  escaped characters                                             character classes                                                                   lt                    gt                lt    W d  w  gt                   W d  w                                               opening of group         lt N gt                                     increment counter                                             closing of group         lt -N gt                                    decrement counter                        d      d                  quantifiers   
                                          alternative                                             repeat content                                             end of string    N                                        fail if counter is non-zero    Compacted                                                                                               lt         gt     lt    W d  w  gt        W d  w       lt N gt        lt -N gt                  d      d                      N         From the comments      Will this validate substitutions and translations    It will validate just the regex part of substitutions and translations  s  lt this part gt           It is not theoretically possible to match all valid regex grammars with a regex     It is possible if the regex engine supports recursion  such as PCRE  but that can t really be called regular expressions any more      Indeed  a  recursive regular expression  is not a regular expression  But this an often-accepted extension to regex engines    Ironically  this extended regex doesn t match extended regexes     In theory  theory and practice are the same  In practice  they re not   Almost everyone who knows regular expressions knows that regular expressions does not support recursion  But PCRE and most other implementations support much more than basic regular expressions      using this with shell script in the grep command   it shows me some error   grep  Invalid content of      I am making a script that could grep a code base to find all the files that contain regular expressions   This pattern exploits an extension called recursive regular expressions  This is not supported by the POSIX flavor of regex  You could try with the -P switch  to enable the PCRE regex flavor      Regex itself  is not a regular language and hence cannot be parsed by regular expression       This is true for classical regular expressions  Some modern implementations allow recursion  which makes it into a Context Free language  although it 
is somewhat verbose for this task      I see where you re matching         and other special regex characters  Where are you allowing non-special characters  It seems like this will match              but not  abcdefg   That s a valid regex                     will match any single character  not part of any of the other constructs  This includes both literal  a - z   and certain special characters

User · Answer

start of string                                                 first group start                                                       literals and                                                      escaped characters                                                 character classes                                                            lt         gt      1          parenthesis  with recursive content                R   -   d                        recursive matching                            d      d                    quantifiers                                                 alternative                                                 repeat content                                                 end first group                                                 end of string     This is a recursive regex  and is not supported by many regex engines  PCRE based ones should support it   Without whitespace and comments                                                                                                 lt         gt     1             R   -   d                    d      d                           NET does not support recursion directly   The   1  and   R  constructs   The recursion would have to be converted to counting balanced groups                                               start of string                                                 literals and                                                  escaped characters                                             character classes                                                                   lt                    gt                lt    W d  w  gt                   W d  w                                               opening of group         lt N gt                                     increment counter                                             closing of group         lt -N gt                                    decrement counter                        d      d                  quantifiers   
                                          alternative                                             repeat content                                             end of string    N                                        fail if counter is non-zero    Compacted                                                                                               lt         gt     lt    W d  w  gt        W d  w       lt N gt        lt -N gt                  d      d                      N         From the comments      Will this validate substitutions and translations    It will validate just the regex part of substitutions and translations  s  lt this part gt           It is not theoretically possible to match all valid regex grammars with a regex     It is possible if the regex engine supports recursion  such as PCRE  but that can t really be called regular expressions any more      Indeed  a  recursive regular expression  is not a regular expression  But this an often-accepted extension to regex engines    Ironically  this extended regex doesn t match extended regexes     In theory  theory and practice are the same  In practice  they re not   Almost everyone who knows regular expressions knows that regular expressions does not support recursion  But PCRE and most other implementations support much more than basic regular expressions      using this with shell script in the grep command   it shows me some error   grep  Invalid content of      I am making a script that could grep a code base to find all the files that contain regular expressions   This pattern exploits an extension called recursive regular expressions  This is not supported by the POSIX flavor of regex  You could try with the -P switch  to enable the PCRE regex flavor      Regex itself  is not a regular language and hence cannot be parsed by regular expression       This is true for classical regular expressions  Some modern implementations allow recursion  which makes it into a Context Free language  although it 
is somewhat verbose for this task      I see where you re matching         and other special regex characters  Where are you allowing non-special characters  It seems like this will match              but not  abcdefg   That s a valid regex                     will match any single character  not part of any of the other constructs  This includes both literal  a - z   and certain special characters

User · Answer

Though it is perfectly possible to use a recursive regex as MizardX has posted, a parser is much more useful for this kind of thing. Regexes were originally intended to be used with regular languages; recursion and balancing groups are just a patch on top. The language of valid regexes is context-free rather than regular, and you should use an appropriate parser to handle it. Here is an example from a university project for parsing simple regexes (without most constructs). It uses JavaCC. The comments, originally in Spanish, are translated here; the method names are pretty self-explanatory.

    SKIP :
    {
        " "
    |   "\r"
    |   "\t"
    |   "\n"
    }
    TOKEN :
    {
        < DIGITO: ["0" - "9"] >
    |   < MAYUSCULA: ["A" - "Z"] >
    |   < MINUSCULA: ["a" - "z"] >
    |   < LAMBDA: "LAMBDA" >
    |   < VACIO: "VACIO" >
    }

    IRegularExpression Expression() :
    {
        IRegularExpression r;
    }
    {
        r=Alternation() { return r; }
    }

    // Matches alternations: RE | RE
    IRegularExpression Alternation() :
    {
        IRegularExpression r1 = null, r2 = null;
    }
    {
        r1=Concatenation() ( "|" r2=Alternation() )?
        {
            if (r2 == null) {
                return r1;
            } else {
                return createAlternation(r1,r2);
            }
        }
    }

    // Matches concatenations: RE.RE
    IRegularExpression Concatenation() :
    {
        IRegularExpression r1 = null, r2 = null;
    }
    {
        r1=Repetition() ( "." r2=Repetition() { r1 = createConcatenation(r1,r2); } )*
        { return r1; }
    }

    // Matches repetitions: RE*
    IRegularExpression Repetition() :
    {
        IRegularExpression r;
    }
    {
        r=Atom() ( "*" { r = createRepetition(r); } )*
        { return r; }
    }

    // Matches atomic regexes: (RE), Terminal, Empty, Lambda
    IRegularExpression Atom() :
    {
        String t;
        IRegularExpression r;
    }
    {
        ( "(" r=Expression() ")" { return r; } )
        | t=Terminal() { return createTerminal(t); }
        | <LAMBDA> { return createLambda(); }
        | <VACIO> { return createEmpty(); }
    }

    // Matches a terminal (digit or lowercase letter) and returns its value
    String Terminal() :
    {
        Token t;
    }
    {
        ( t=<DIGITO> | t=<MINUSCULA> ) { return t.image; }
    }
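For readers without JavaCC, the same idea can be sketched as a hand-written recursive-descent validator in Python. This is only an illustration under stated assumptions: it assumes "|" for alternation, "." for explicit concatenation, "*" for repetition and parentheses for grouping, with digits and lowercase letters as terminals. The LAMBDA and VACIO tokens are left out, and the names SimpleRegexParser and is_valid_simple_regex are made up for this example.

```python
import string

# Terminals of the toy grammar: a single digit or lowercase letter.
TERMINALS = set(string.digits + string.ascii_lowercase)


class SimpleRegexParser:
    """Recursive-descent recognizer: one method per grammar rule."""

    def __init__(self, text):
        # Drop all whitespace up front, similar in spirit to a SKIP section.
        self.text = "".join(text.split())
        self.pos = 0

    def peek(self):
        return self.text[self.pos] if self.pos < len(self.text) else None

    def alternation(self):
        # Alternation := Concatenation ( "|" Alternation )?
        self.concatenation()
        if self.peek() == "|":
            self.pos += 1
            self.alternation()

    def concatenation(self):
        # Concatenation := Repetition ( "." Repetition )*
        self.repetition()
        while self.peek() == ".":
            self.pos += 1
            self.repetition()

    def repetition(self):
        # Repetition := Atom ( "*" )*
        self.atom()
        while self.peek() == "*":
            self.pos += 1

    def atom(self):
        # Atom := "(" Alternation ")" | terminal
        ch = self.peek()
        if ch == "(":
            self.pos += 1
            self.alternation()  # the recursive call is what tracks nesting
            if self.peek() != ")":
                raise ValueError(f"expected ')' at position {self.pos}")
            self.pos += 1
        elif ch in TERMINALS:
            self.pos += 1
        else:
            raise ValueError(f"unexpected {ch!r} at position {self.pos}")

    def parse(self):
        self.alternation()
        if self.pos != len(self.text):
            raise ValueError(f"trailing input at position {self.pos}")


def is_valid_simple_regex(text):
    """Return True if text is a sentence of the toy regex grammar."""
    try:
        SimpleRegexParser(text).parse()
    except ValueError:
        return False
    return True
```

With this sketch, is_valid_simple_regex("a.b*|(c.d)") is accepted while "((a)" and "a..b" are rejected. Plain juxtaposition such as "ab" is also rejected, because this toy grammar requires the explicit "." operator.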

User · Answer

No, if you are speaking strictly about regular expressions and not including the regular expression implementations that are actually context-free grammars.

There is one limitation of regular expressions that makes it impossible to write a regex that matches all and only regexes: you cannot match paired constructs, such as braces, once they nest. Regexes use many such constructs; let's take [] as an example. Whenever there is a [ there must be a matching ], which is simple enough for a regex: "\[.*\]".

What makes it impossible for regexes is that these constructs can be nested. How can you write a regex that matches nested brackets? The answer is that you can't without an infinitely long regex. You can match any fixed number of nesting levels through brute force, but you can never match arbitrarily deep nesting.

This capability is often referred to as counting, because you're counting the depth of the nesting. A regex by definition does not have the capability to count.

I ended up writing "Regular Expression Limitations" about this.
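The brute-force point can be made concrete with a small Python sketch. It uses parentheses rather than brackets, and the helper names balanced_up_to and is_balanced are invented for this illustration. The generated pattern handles only the nesting depth it was built for and grows with every extra level, whereas a plain counter handles any depth.

```python
import re


def balanced_up_to(max_depth):
    """Compile a regex for balanced parentheses nested at most max_depth deep."""
    pattern = r"[^()]*"  # depth 0: no parentheses at all
    for _ in range(max_depth):
        # Each pass allows one more level: either a plain character or a
        # parenthesised group whose body is the previous (shallower) pattern.
        pattern = rf"(?:[^()]|\({pattern}\))*"
    return re.compile(pattern)


def is_balanced(text):
    """Check balance at any depth by counting, which a true regex cannot do."""
    depth = 0
    for ch in text:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:  # a ")" appeared before its "("
                return False
    return depth == 0


depth2 = balanced_up_to(2)
print(bool(depth2.fullmatch("((a))b")))   # within the built-in depth
print(bool(depth2.fullmatch("(((a)))")))  # one level too deep for this pattern
print(is_balanced("(((a)))"))             # the counter does not care about depth
```

Raising max_depth fixes any particular input, but no finite value covers every input, which is the limitation described above.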

User · Answer

You can submit the regex to preg_match, which will return false if the regex is not valid. Don't forget to use the @ operator to suppress error messages:

    @preg_match($regexToTest, '');

This returns 1 if the regex is valid and matches the empty string (for example //), 0 if the regex is valid but does not match, and false if the regex is invalid.
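The same let-the-engine-decide approach can be sketched in Python, where re.compile raises re.error on an invalid pattern. The function name is_valid_regex is just a label for this example, and "valid" here means valid for Python's re dialect, which differs from PCRE in places.

```python
import re


def is_valid_regex(pattern: str) -> bool:
    """Return True if Python's re module can compile the pattern."""
    try:
        re.compile(pattern)
    except re.error:
        # The engine rejected the pattern, so it is not valid in this dialect.
        return False
    return True


print(is_valid_regex(r"(a|b)*c"))  # balanced group, compiles
print(is_valid_regex("(a"))        # unterminated group, rejected
```

Because the check delegates to the engine that will later run the pattern, it stays in step with that engine's syntax with no separate grammar to maintain.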
