What special characters must be escaped in regular expressions?

Question

I am tired of always trying to guess if I should escape special characters like `()[]{}|` etc. when using many implementations of regexps. It is different with, for example, Python, sed, grep, awk, Perl, rename, Apache, find and so on. Is there any rule set which tells when I should, and when I should not, escape special characters? Does it depend on the regexp type, like PCRE, POSIX or extended regexps?

User · Answer

For PHP, "it is always safe to precede a non-alphanumeric with `\` to specify that it stands for itself" (http://php.net/manual/en/regexp.reference.escape.php). Except if it's a `"` or `'`. To escape regex pattern variables (or partial variables) in PHP, use `preg_quote()`.
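The PHP manual's rule has a close parallel in Python's `re` module, which likewise reads a backslash before an ASCII punctuation character as "this character, literally" (Python 3.7+ rejects a backslash followed by an unknown ASCII letter). The sketch below assumes that behavior; `escape_non_alnum` is a hypothetical helper, and in real Python code `re.escape()` is the built-in counterpart of PHP's `preg_quote()`.

```python
import re


def escape_non_alnum(text: str) -> str:
    """Backslash-escape every non-alphanumeric character in `text`.

    Hypothetical helper illustrating the "always safe" rule quoted from
    the PHP manual. Letters and digits are left alone because a backslash
    before them creates special sequences such as \\d or \\1.
    """
    return "".join(ch if ch.isalnum() else "\\" + ch for ch in text)


literal = "price: $5.00 (approx.) [a|b]"
pattern = escape_non_alnum(literal)

# The over-escaped pattern still matches the original text literally...
assert re.fullmatch(pattern, literal)
# ...and "$" and "." no longer act as metacharacters.
assert re.fullmatch(pattern, "price: 5500 (approx.) [a|b]") is None
```

Escaping more than necessary is harmless in this flavor, which is exactly why the rule is convenient; the POSIX flavors discussed in other answers do not offer the same guarantee.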

User · Answer

To know when and what to escape without trial and error, you need to understand precisely the chain of contexts the string passes through. The string starts at the farthest end, where you type it, and ends at its final destination: the memory handled by the regexp-parsing code. Be aware of how the string is processed along the way. It can be a plain string inside the code; a string entered on the command line, either interactively or inside a shell script file; a string held in a variable that the code refers to; a string argument that goes through further evaluation; or a string containing dynamically generated code with any sort of encapsulation.

Each of these contexts assigns special functions to some characters. When you want to pass a character literally, without the special function it has in that context, you have to escape it for the next context, and that escape may use other characters which in turn need to be escaped in the preceding context(s). Furthermore, there are things like character encoding. The most insidious is UTF-8, because it looks like ASCII for common characters but may be interpreted differently, even by the terminal, depending on its settings; then there is the encoding attribute of HTML/XML. You need to understand the process precisely.

For example, a regexp on a command line starting with `perl -npe` has to be turned into a set of exec system calls, with pipes connecting the file handles. Each of these exec calls just receives a list of arguments, which were separated by unescaped spaces and possibly by pipes (`|`), redirections (such as `N>&M`), parentheses and interactive expansions. All of these are special characters for the shell that can appear to interfere with the characters of the regular expression in the next context, but they are evaluated first, before the command runs. The command line is read by a shell such as bash, sh, csh, tcsh or zsh. Inside double or single quotes escaping is simpler, but you don't have to quote a string on the command line: mostly, a space just has to be prefixed with a backslash, and leaving the quotes out keeps the expansion of special characters available, though that is parsed as a different context than text within quotes. Then, once the command line has been evaluated, the regexp as it sits in memory (not as written on the command line) receives the same treatment as it would in a source file. Inside the regexp, there is a separate character-set context within square brackets (`[]`). Perl regular expressions can also be delimited by a large set of non-alphanumeric characters, e.g. `m//` or `m:/better/for/path:`.

You have more details about the characters in the other answers, which are very specific to the final regexp context. You mention that you find the right regexp escapes by trial and error; that's probably because different contexts have different sets of special characters, which confuses your memory of earlier attempts. Often the backslash is the character used in those different contexts to escape a literal character instead of its function.
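A small Python sketch of this layering, assuming a POSIX shell and `grep -E` as the final regex consumer (both are illustrative choices, not part of the answer): each layer adds one level of escaping, and parsing the command line back with `shlex.split` removes exactly the shell's layer. The escapes produced by `re.escape` are Perl/ERE-style; with plain `grep` (BRE), `\(` would start a group instead.

```python
import re
import shlex

# Layer 1: the regex engine. To match the literal text "a.b(c)",
# the regex metacharacters "." "(" ")" must be escaped.
regex = re.escape("a.b(c)")

# Layer 2: Python source code. A raw string keeps backslashes as typed;
# an ordinary string literal needs every backslash doubled.
assert regex == r"a\.b\(c\)" == "a\\.b\\(c\\)"

# Layer 3: the shell. shlex.quote() wraps the regex in single quotes so
# the shell performs no word splitting, globbing or expansion on it.
command = "grep -E " + shlex.quote(regex) + " input.txt"
print(command)  # grep -E 'a\.b\(c\)' input.txt

# Splitting the command line the way a POSIX shell would hands the
# regex to grep unchanged: only the shell's layer of quoting is removed.
assert shlex.split(command) == ["grep", "-E", regex, "input.txt"]
assert re.search(regex, "x a.b(c) y")
```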

User · Answer

Really, there isn't. There are about a half-zillion different regex syntaxes; they seem to come down to Perl, EMACS/GNU, and AT&T in general, but I'm always getting surprised too.

User · Answer

Which characters you must and which you mustn't escape indeed depends on the regex flavor you're working with.

For PCRE, and most other so-called Perl-compatible flavors, escape these outside character classes: `.^$*+?()[{\|` and these inside character classes: `^-]\`

For POSIX extended regexes (ERE), escape these outside character classes (same as PCRE): `.^$*+?()[{\|`. Escaping any other character is undefined in POSIX ERE, and many implementations reject it as an error.

Inside character classes, the backslash is a literal character in POSIX regular expressions. You cannot use it to escape anything. You have to use "clever placement" if you want to include character class metacharacters as literals. Put the `^` anywhere except at the start, the `]` at the start, and the `-` at the start or the end of the character class to match these literally, e.g.: `[]^-]`

In POSIX basic regular expressions (BRE), these are metacharacters that you need to escape to suppress their meaning: `.^$*[\`. Escaping parentheses and curly brackets in BREs gives them the special meaning their unescaped versions have in EREs. Some implementations (e.g. GNU) also give special meaning to other characters when escaped, such as `\?` and `\+`. Escaping a character other than `.^$*[\(){}` is normally an error with BREs. Inside character classes, BREs follow the same rule as EREs.

If all this makes your head spin, grab a copy of RegexBuddy. On the Create tab, click Insert Token, and then Literal. RegexBuddy will add escapes as needed.
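The outside-a-character-class lists above translate directly into a small lookup table. A sketch in Python; `escape_literal` and the flavor keys are hypothetical names, the sets are copied from this answer, and Python's `re` (a Perl-style engine) stands in for PCRE in the checks.

```python
import re

# Characters to escape OUTSIDE a character class, per the lists above.
META_OUTSIDE = {
    "pcre": set(".^$*+?()[{\\|"),
    "ere": set(".^$*+?()[{\\|"),
    "bre": set(".^$*[\\"),
}


def escape_literal(text: str, flavor: str) -> str:
    """Escape `text` for literal use outside a character class.

    Only the listed metacharacters get a backslash, because escaping
    other characters is undefined in POSIX ERE/BRE.
    """
    meta = META_OUTSIDE[flavor]
    return "".join("\\" + ch if ch in meta else ch for ch in text)


print(escape_literal("f(x) = a+b?", "ere"))  # f\(x\) = a\+b\?
print(escape_literal("f(x) = a+b?", "bre"))  # f(x) = a+b?

# Python's re accepts the PCRE-style escaping as a literal match.
text = "f(x) = [a|b]{2}"
assert re.fullmatch(escape_literal(text, "pcre"), text)

# POSIX-style "clever placement" inside a class: "]" first, "^" not
# first, "-" last. Python's re happens to accept the same trick.
assert re.findall(r"[]^-]", "a]b^c-d") == ["]", "^", "-"]
```

Note how the same input needs no escapes at all in BRE, where `(`, `+` and `?` are already literal; this is the flavor dependence the answer describes.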

User · Answer

Unfortunately, the meaning of things like `(` and `\(` is swapped between Emacs-style regular expressions and most other styles (POSIX basic regexps, as used by sed and grep, behave like Emacs here). So if you try to escape these, you may be doing the opposite of what you want. You really have to know which style you are trying to quote.
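To make the swap concrete, here is a rough Python sketch that rewrites an Emacs-style pattern into Python's Perl-style syntax by exchanging the escaped and unescaped forms of `( ) { } |`. `emacs_to_python` is a hypothetical helper; it deliberately ignores character classes and Emacs-only escapes such as `\<`, so treat it as an illustration of the swap, not a full translator.

```python
import re

# Characters whose escaped/unescaped meanings are swapped in Emacs style.
SWAPPED = set("(){}|")


def emacs_to_python(pattern: str) -> str:
    """Swap "\\(" <-> "(" etc. (hypothetical, ignores character classes)."""
    out = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "\\" and i + 1 < len(pattern):
            nxt = pattern[i + 1]
            # "\(" is an operator in Emacs -> bare "(" in Python;
            # any other escape is copied through unchanged.
            out.append(nxt if nxt in SWAPPED else ch + nxt)
            i += 2
        elif ch in SWAPPED:
            # A bare "(" is literal in Emacs -> escape it for Python.
            out.append("\\" + ch)
            i += 1
        else:
            out.append(ch)
            i += 1
    return "".join(out)


converted = emacs_to_python(r"\(foo\|bar\)+")
print(converted)  # (foo|bar)+
assert re.fullmatch(converted, "foobarfoo")
assert re.fullmatch(emacs_to_python("f(x)"), "f(x)")
```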

User · Answer

Sometimes simple escaping is not possible with the characters you've listed. For example, using a backslash to escape a parenthesis isn't going to work in the left-hand side of a substitution in sed, namely `sed -e 's/foo\(bar/something_else/'`, because in sed's basic regular expressions `\(` opens a group instead of matching a literal `(`. I tend to just use a simple character class instead, so the above expression becomes `sed -e 's/foo[(]bar/something_else/'`, which I find works for most regexp implementations. BTW, character classes are pretty vanilla regexp components, so they tend to work in most situations where you need escaped characters in regexps.

Edit: After the comment below, I thought I'd mention that you also have to consider the difference between deterministic and non-deterministic finite automata (DFA and NFA engines) when looking at the behaviour of regexp evaluation. You might like to look at "the shiny ball book", aka Effective Perl, specifically the chapter on regular expressions, to get a feel for the difference between regexp engine evaluation types. Not all the world's a PCRE!

Anyway, regexps are so clunky compared to SNOBOL! Now that was an interesting programming course! Along with the one on Simula. Ah, the joys of studying at UNSW in the late '70s! (-:
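The character-class trick can be checked quickly with Python's `re` (used here as a stand-in for a Perl-style engine; sed itself is not invoked): both forms match a literal `(` in Python, but only the bracketed one means the same thing in sed's BRE.

```python
import re

subject = "x foo(bar y"

# Backslash escape: a literal "(" in Perl-style engines, but "\(" opens
# a group in POSIX BRE (sed's default syntax).
escaped = r"foo\(bar"
# Character class: a literal "(" in BRE, ERE and Perl-style engines alike.
bracketed = r"foo[(]bar"

for pattern in (escaped, bracketed):
    assert re.sub(pattern, "something_else", subject) == "x something_else y"

# Most metacharacters lose their special meaning inside brackets.
assert re.fullmatch(r"[.][*][+][?][$]", ".*+?$")
```

The exceptions are the characters that are special inside a class itself (`]`, `^`, `-` and, outside POSIX, `\`), which is why the trick is used for the others.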

User · Answer

POSIX recognizes multiple variations on regular expressions: basic regular expressions (BRE) and extended regular expressions (ERE). And even then, there are quirks because of the historical implementations of the utilities standardized by POSIX. There isn't a simple rule for when to use which notation, or even which notation a given command uses. Check out Jeff Friedl's book Mastering Regular Expressions.

User · Answer

Modern RegEx Flavors (PCRE)
Includes C, C++, Delphi, EditPad, Java, JavaScript, Perl, PHP (preg), PostgreSQL, PowerGREP, PowerShell, Python, REALbasic, Real Studio, Ruby, TCL, VB.Net, VBScript, wxWidgets, XML Schema, Xojo, XRegExp. PCRE compatibility may vary.
Anywhere: `. ^ $ * + - ? ( ) [ ] { } \ |`

Legacy RegEx Flavors (BRE/ERE)
Includes awk, ed, egrep, emacs, GNUlib, grep, PHP (ereg), MySQL, Oracle, R, sed. PCRE support may be enabled in later versions or by using extensions.

ERE/awk/egrep/emacs
Outside a character class: `. ^ $ * + ? ( ) [ { } \ |`
Inside a character class: `^ - [ ]`

BRE/ed/grep/sed
Outside a character class: `. ^ $ * [ \`
Inside a character class: `^ - [ ]`
For literals, don't escape: `+ ? ( ) { } |`
For standard regex behavior, escape: `\+ \? \( \) \{ \} \|`

Notes
- If unsure about a specific character, it can be escaped like `\xFF`.
- Alphanumeric characters cannot be escaped with a backslash: a backslash before a letter or digit either forms a special sequence (such as `\d` or `\1`) or is an error.
- Arbitrary symbols can be escaped with a backslash in PCRE, but not in BRE/ERE (there they must only be escaped when required). For PCRE, `]` and `-` only need escaping within a character class, but I kept them in a single list for simplicity.
- Quoted expression strings must also have the surrounding quote characters escaped, often with backslashes doubled up (like `"(\")(/)(\\.)"` versus `/(")(\/)(\.)/` in JavaScript).
- Aside from escapes, different regex implementations may support different modifiers, character classes, anchors, quantifiers, and other features. For more details, check out regular-expressions.info, or use regex101.com to test your expressions live.
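Two of the notes can be tried out in Python's `re`, assumed here as a representative Perl-style flavor (Python 3.7 or later): a hex escape such as `\x2e` always yields a literal character, and a backslash before an unknown letter is rejected. The last check mirrors the JavaScript quoting example in Python, where no `/` delimiter needs escaping.

```python
import re

# "\x2e" is "." written as a hex escape; it is always a literal dot,
# never the "any character" wildcard.
assert re.fullmatch(r"a\x2eb", "a.b")
assert re.fullmatch(r"a\x2eb", "axb") is None
assert re.fullmatch(r"a.b", "axb")  # the bare "." is the wildcard

# Backslash + letter is a special sequence (\d, \w, ...) or an error.
assert re.fullmatch(r"\d", "7")
try:
    re.compile(r"\q")
except re.error as exc:
    print("rejected:", exc)
else:
    raise AssertionError("expected re.error for \\q")

# Quote-layer doubling: an ordinary string literal needs "\\" and an
# escaped quote, a raw string does not; both give the same pattern.
assert "(\")(/)(\\.)" == r'(")(/)(\.)'
assert re.fullmatch("(\")(/)(\\.)", '"/.')
```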

User · Answer

Unfortunately, there really isn't a fixed set of escape codes, since it varies based on the language you are using. However, keeping a page like the Regular Expression Tools Page or a Regular Expression Cheatsheet handy can go a long way toward helping you quickly filter things out.

User · Answer

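Maybe an old thread, but this code might be useful to visitors who want to do this without regex. It walks through the string character by character and drops everything between a pair of delimiter characters:

    def listToString(s):
        # join a list of characters back into a single string
        return "".join(s)

    DELIM = "'"  # assumed delimiter: use the character that marks the text to skip
    r = "Hello, How are you? 'Smiling Face' 'Heart' erwer"  # sample input (assumed)
    r1 = list(r)
    r2 = []
    start = True  # True while outside a delimited section
    for ch in r1:
        if ch == DELIM:
            # each delimiter toggles between keeping and skipping; it is itself dropped
            start = not start
        elif start:
            r2.append(ch)
        else:
            print("skipped", ch)

    print(listToString(r2))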

User · Answer

https://perldoc.perl.org/perlre.html#Quoting-metacharacters and https://perldoc.perl.org/functions/quotemeta.html

In the official documentation, such characters are called metacharacters. Example of quoting:

    my $regex = quotemeta($string);
    s/$regex/something/;
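In Python, `re.escape()` plays the role of Perl's `quotemeta()` (Perl also offers the inline form `\Q...\E`). A minimal sketch of the `s/$regex/something/` idiom; `replace_literal` is a hypothetical helper name. The replacement is passed as a function so that backslashes in it are not read as group references either, which is a second escaping context that is easy to forget.

```python
import re


def replace_literal(text: str, needle: str, replacement: str) -> str:
    """Replace every occurrence of `needle`, treated as plain text.

    re.escape() neutralises metacharacters in the pattern; returning
    `replacement` from a function keeps re.sub() from interpreting
    backslashes or group references in it.
    """
    return re.sub(re.escape(needle), lambda _match: replacement, text)


print(replace_literal("cost: $5.00 or $5X00", "$5.00", "five dollars"))
# cost: five dollars or $5X00
```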

User · Answer

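For Ionic (TypeScript), you have to double the backslash in order to escape the characters when the pattern is written as a string. For example, in a string pattern that matches a set of special characters (including `&`, `<`, `>` and `-`), pay attention to the characters that are special inside a character class, such as `-`: they have to be escaped with a double backslash. If you don't do that, you may get an error in your code, because the string literal consumes the single backslash before the regex engine ever sees it.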
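The same two-layer effect exists in Python's ordinary string literals, which consume one level of backslashes before the regex engine sees the pattern. A sketch using Python as a stand-in for TypeScript; the sample text and character set are illustrative.

```python
import re

text = "a cat sat on [the] mat-like cat"

# "\b" in an ordinary string literal is a backspace character (\x08),
# so the regex engine never sees a word boundary and nothing matches.
assert re.search("\bcat\b", text) is None

# Doubling the backslash (or using a raw string) delivers "\b" to re.
assert "\\bcat\\b" == r"\bcat\b"
assert re.findall("\\bcat\\b", text) == ["cat", "cat"]

# The same applies to escaped class metacharacters such as "]", "[" and "-".
special = "[\\]\\[-]"
assert special == r"[\]\[-]"
print(re.findall(special, text))  # ['[', ']', '-']
```

The failure here is silent (the pattern compiles but matches nothing), which is why doubling, or a raw-string equivalent where the language has one, is worth doing consistently.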
