Is it worth using Python's re.compile?

Question

Is there any benefit in using compile for regular expressions in Python?

    h = re.compile('hello')
    h.match('hello world')

vs

    re.match('hello', 'hello world')

User · Answer

FWIW:

    $ python -m timeit -s "import re" "re.match('hello', 'hello world')"
    100000 loops, best of 3: 3.82 usec per loop

    $ python -m timeit -s "import re; h=re.compile('hello')" "h.match('hello world')"
    1000000 loops, best of 3: 1.26 usec per loop

So, if you're going to be using the same regex a lot, it may be worth it to do re.compile (especially for more complex regexes).

The standard arguments against premature optimization apply, but I don't think you really lose much clarity/straightforwardness by using re.compile if you suspect that your regexps may become a performance bottleneck.

Update:

Under Python 3.6 (I suspect the above timings were done using Python 2.x) and 2018 hardware (MacBook Pro), I now get the following timings:

    $ python -m timeit -s "import re" "re.match('hello', 'hello world')"
    1000000 loops, best of 3: 0.661 usec per loop

    $ python -m timeit -s "import re; h=re.compile('hello')" "h.match('hello world')"
    1000000 loops, best of 3: 0.285 usec per loop

    $ python -m timeit -s "import re" "h=re.compile('hello'); h.match('hello world')"
    1000000 loops, best of 3: 0.65 usec per loop

    $ python --version
    Python 3.6.5 :: Anaconda, Inc.

I also added a case (notice the quotation mark differences between the last two runs) that shows that re.match(x, ...) is literally [roughly] equivalent to re.compile(x).match(...), i.e. the module-level call does not get you the speed of a pattern object you hold yourself. Each call still pays for looking the pattern up (the re module does keep an internal cache of compiled patterns, so that overhead is the lookup rather than a full recompilation).
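The same comparison can be scripted instead of run from the shell. This is a minimal sketch using the standard timeit module; the helper names (module_level, precompiled, measure) are made up for illustration, and the absolute numbers will differ by machine and Python version.

```python
import re
import timeit

PATTERN = "hello"
TEXT = "hello world"

# Compiled once, outside the timed code.
compiled = re.compile(PATTERN)


def module_level():
    # Goes through the re module's internal pattern lookup on every call.
    return re.match(PATTERN, TEXT)


def precompiled():
    # Calls the method on a pattern object we already hold.
    return compiled.match(TEXT)


def measure(func, number=100_000, repeat=3):
    """Return the best observed time per call, in microseconds."""
    best = min(timeit.repeat(func, number=number, repeat=repeat))
    return best / number * 1e6


if __name__ == "__main__":
    for func in (module_level, precompiled):
        print(f"{func.__name__}: {measure(func):.3f} usec per call")
```

Taking the minimum over several repeats follows the same "best of N" convention as the command-line tool.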

User · Answer

Interestingly, compiling does prove more efficient for me (Python 2.5.2 on Win XP):

    import re
    import time

    rgx = re.compile('(\w+)\s+[0-9_]?\s+\w*')
    str = "average    2 never"
    a = 0

    t = time.time()

    for i in xrange(1000000):
        if re.match('(\w+)\s+[0-9_]?\s+\w*', str):
        #~ if rgx.match(str):
            a += 1

    print time.time() - t

Running the above code once as is, and once with the two if lines commented the other way around, the compiled regex is twice as fast.

User · Answer

Legibility/cognitive load preference

To me, the main gain is that I only need to remember, and read, one form of the complicated regex API syntax: the <compiled_pattern>.method(xxx) form, rather than that and the re.func(<pattern>, xxx) form.

The re.compile(<pattern>) is a bit of extra boilerplate, true.

But where regexes are concerned, that extra compile step is unlikely to be a big cause of cognitive load. And in fact, on complicated patterns, you might even gain clarity from separating the declaration from whatever regex method you then invoke on it.

I tend to first tune complicated patterns in a website like Regex101, or even in a separate minimal test script, then bring them into my code, so separating the declaration from its use fits my workflow as well.

User · Answer

> I've had a lot of experience running a compiled regex 1000s of times versus compiling on-the-fly, and have not noticed any perceivable difference

The votes on the accepted answer lead to the assumption that what @Triptych says is true for all cases. This is not necessarily true. One big difference is when you have to decide whether to accept a regex string or a compiled regex object as a parameter to a function:

    >>> timeit.timeit(setup="""
    ... import re
    ... f=lambda x, y: x.match(y)       # accepts compiled regex as parameter
    ... h=re.compile('hello')
    ... """, stmt="f(h, 'hello world')")
    0.32881879806518555
    >>> timeit.timeit(setup="""
    ... import re
    ... f=lambda x, y: re.compile(x).match(y)   # compiles when called
    ... """, stmt="f('hello', 'hello world')")
    0.809190034866333

It is better to compile your regexes when you need to reuse them.

Note that the example in the timeit above simulates creation of a compiled regex object once at import time versus "on-the-fly" when required for a match.
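A function does not have to pick one parameter type. As a minimal sketch (the function name first_match and the constant HELLO are hypothetical), re.compile can be used to normalize the argument, because it hands back an already compiled pattern unchanged when no flags are passed:

```python
import re


def first_match(pattern, text):
    """Match text against pattern, given as a string or a compiled pattern."""
    # For a string this compiles (or fetches from the module's cache);
    # for an already compiled pattern it returns the same object.
    regex = re.compile(pattern)
    return regex.match(text)


# Callers that reuse a pattern can compile it once and pass the object in.
HELLO = re.compile("hello")

from_string = first_match("hello", "hello world")
from_object = first_match(HELLO, "hello world")
```

Callers who only match occasionally can keep passing strings, while hot paths can pass the precompiled object and skip the lookup inside re.compile for strings.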

User · Answer

Regular expressions are compiled before being used when using the second version. If you are going to be executing it many times, it is definitely better to compile it first. If not, compiling every time you match is fine for one-offs.

User · Answer

According to the Python documentation:

The sequence

    prog = re.compile(pattern)
    result = prog.match(string)

is equivalent to

    result = re.match(pattern, string)

but using re.compile() and saving the resulting regular expression object for reuse is more efficient when the expression will be used several times in a single program.

So my conclusion is: if you are going to match the same pattern against many different texts, you had better precompile it.

User · Answer

I've had a lot of experience running a compiled regex 1000s of times versus compiling on-the-fly, and have not noticed any perceivable difference. Obviously, this is anecdotal, and certainly not a great argument against compiling, but I've found the difference to be negligible.

EDIT: After a quick glance at the actual Python 2.5 library code, I see that Python internally compiles AND CACHES regexes whenever you use them anyway (including calls to re.match()), so you're really only changing WHEN the regex gets compiled, and shouldn't be saving much time at all: only the time it takes to check the cache (a key lookup on an internal dict type).

From module re.py (comments are mine):

    def match(pattern, string, flags=0):
        return _compile(pattern, flags).match(string)

    def _compile(*key):

        # Does cache check at top of function
        cachekey = (type(key[0]),) + key
        p = _cache.get(cachekey)
        if p is not None: return p

        # ...
        # Does actual compilation on cache miss
        # ...

        # Caches compiled regex
        if len(_cache) >= _MAXCACHE:
            _cache.clear()
        _cache[cachekey] = p
        return p

I still often pre-compile regular expressions, but only to bind them to a nice, reusable name, not for any expected performance gain.
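The caching described above can be observed from the outside. This is a small sketch that relies on CPython behaviour rather than a documented guarantee: compiling the same pattern string twice should return the very same object, and re.purge() (a public function that clears the cache) should force a fresh compilation. The variable names are illustrative only.

```python
import re

# Two compilations of the same pattern string and flags.
first = re.compile(r"\d+")
second = re.compile(r"\d+")

# In CPython the second call is answered from the module's internal
# cache, so both names refer to one pattern object.
same_object = first is second

# Clearing the cache means the next compile has to build a new object.
re.purge()
third = re.compile(r"\d+")
recompiled = third is not first
```

Because `first` is still referenced after the purge, the old pattern object stays alive and usable; only the cache entry is gone.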

User · Answer

For me, the biggest benefit to re.compile is being able to separate definition of the regex from its use.

Even a simple expression such as 0|[1-9][0-9]* (integer in base 10 without leading zeros) can be complex enough that you'd rather not have to retype it, check if you made any typos, and later have to recheck if there are typos when you start debugging. Plus, it's nicer to use a variable name such as num or num_b10 than 0|[1-9][0-9]*.

It's certainly possible to store strings and pass them to re.match; however, that's less readable:

    num = "..."
    # then, much later:
    m = re.match(num, input)

Versus compiling:

    num = re.compile("...")
    # then, much later:
    m = num.match(input)

Though it is fairly close, the last line of the second feels more natural and simpler when used repeatedly.
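A minimal sketch of that separation, using the base-10 integer expression from the answer. The constant name NUM_B10 and the helper is_base10_integer are illustrative choices, and fullmatch is used here (rather than match) so that trailing characters are rejected.

```python
import re

# Definition: the pattern is written, named and documented in one place.
# Integer in base 10 without leading zeros.
NUM_B10 = re.compile(r"0|[1-9][0-9]*")


def is_base10_integer(text):
    """Return True if the whole string is a base-10 integer literal."""
    # Use: the call site only mentions the name, not the expression.
    return NUM_B10.fullmatch(text) is not None


checks = {s: is_base10_integer(s) for s in ("0", "42", "007", "")}
```

If the expression ever needs fixing, only the definition changes; every call site keeps reading NUM_B10.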

User · Answer

I agree with Honest Abe that the match(...) calls in the given examples are different. They are not one-to-one comparisons, and thus the outcomes vary. To simplify my reply, I use A, B, C, D for the functions in question. Oh yes, we are dealing with 4 functions in re.py instead of 3.

Running this piece of code:

    h = re.compile('hello')                   # (A)
    h.match('hello world')                    # (B)

is the same as running this code:

    re.match('hello', 'hello world')          # (C)

Because, when you look into the source re.py, (A + B) means:

    h = re._compile('hello')                  # (D)
    h.match('hello world')

and (C) is actually:

    re._compile('hello').match('hello world')

So, (C) is not the same as (B). In fact, (C) calls (B) after calling (D), which is also called by (A). In other words, (C) = (A) + (B). Therefore, comparing (A + B) inside a loop has the same result as (C) inside a loop.

George's regexTest.py proved this for us.

    noncompiled took 4.555 seconds.           # (C) in a loop
    compiledInLoop took 4.620 seconds.        # (A + B) in a loop
    compiled took 2.323 seconds.              # (A) once + (B) in a loop

Everyone's interest is how to get the result of 2.323 seconds. To make sure compile(...) only gets called once, we need to store the compiled regex object in memory. If we are using a class, we can store the object and reuse it every time our function gets called.

    class Foo:
        regex = re.compile('hello')

        def my_function(self, text):
            return self.regex.match(text)

If we are not using a class (which is my situation today), then I have no comment. I'm still learning to use global variables in Python, and I know global variables are a bad thing.

One more point: I believe that the (A) + (B) approach has the upper hand. Here are some facts as I observed them (please correct me if I'm wrong):

1. Calling A once does one search in the _cache followed by one sre_compile.compile() to create a regex object. Calling A twice does two searches and one compile (because the regex object is cached).
2. If the _cache gets flushed in between, the regex object is released from memory and Python needs to compile again. (Someone suggested that Python won't recompile.)
3. If we keep the regex object by using (A), the regex object will still get into the _cache and get flushed at some point. But our code keeps a reference to it, so the regex object will not be released from memory, and Python need not compile it again.
4. The 2-second difference in George's test between compiledInLoop and compiled is mainly the time required to build the key and search the _cache. It is not the compile time of the regex.
5. George's reallycompile test shows what happens if the compile is really redone every time: it is 100x slower (he reduced the loop from 1,000,000 to 10,000).

Here are the only cases where (A + B) is better than (C):

1. We can cache a reference to the regex object inside a class.
2. We need to call (B) repeatedly (inside a loop or multiple times); then we must cache the reference to the regex object outside the loop.

Cases where (C) is good enough:

1. We cannot cache a reference.
2. We only use it once in a while.
3. Overall, we don't have too many regexes (assuming the compiled one never gets flushed).

Just a recap, here are the A, B, C:

    h = re.compile('hello')                   # (A)
    h.match('hello world')                    # (B)
    re.match('hello', 'hello world')          # (C)

Thanks for reading.
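For the case without a class, one common convention (offered here as a suggestion, not something the answer states) is a module-level constant. Binding a compiled pattern to an upper-case name at import time is read-only shared state, which avoids most of the problems usually meant by "global variables are bad". The names HELLO_RE and Greeter below are hypothetical.

```python
import re

# Compiled once when the module is imported; treated as a constant.
HELLO_RE = re.compile("hello")


def starts_with_hello(text):
    """(B) only: reuse the module-level pattern on every call."""
    return HELLO_RE.match(text) is not None


class Greeter:
    # Class attribute: compiled once when the class body runs,
    # shared by all instances.
    regex = re.compile("hello")

    def starts_with_hello(self, text):
        # Reach the pattern through self (or Greeter); a bare `regex`
        # would not be visible inside the method.
        return self.regex.match(text) is not None


results = [starts_with_hello(s) for s in ("hello world", "oh hello")]
```

Either way our own reference keeps the pattern object alive, so flushing of the module's internal cache does not matter for it.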

User · Answer

In general, I find it is easier to use flags (at least easier to remember how), like re.I, when compiling patterns than to use flags inline.

    >>> foo_pat = re.compile('foo', re.I)
    >>> foo_pat.findall('some string FoO bar')
    ['FoO']

vs.

    >>> re.findall('(?i)foo', 'some string FoO bar')
    ['FoO']
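As a small runnable check of the two spellings above, the sketch below also shows that the module-level functions accept a flags argument, so inline flags are not the only option without compiling. The variable names are illustrative.

```python
import re

TEXT = "some string FoO bar"

# Flag given once, at compile time.
FOO_PAT = re.compile("foo", re.I)
with_compile_flag = FOO_PAT.findall(TEXT)

# Flag embedded in the pattern itself.
with_inline_flag = re.findall("(?i)foo", TEXT)

# Flag passed to the module-level function on each call.
with_call_flag = re.findall("foo", TEXT, flags=re.I)
```

With a compiled pattern the flag is stated in one place and cannot be forgotten at an individual call site.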

User · Answer

Here is an example where using re.compile is over 50 times faster, as requested.

The point is just the same as what I made in the comment above, namely, using re.compile can be a significant advantage when your usage is such as to not benefit much from the compilation cache. This happens at least in one particular case (that I ran into in practice), namely when all of the following are true:

- You have a lot of regex patterns (more than re._MAXCACHE, whose default is currently 512), and
- you use these regexes a lot of times, and
- your consecutive usages of the same pattern are separated by more than re._MAXCACHE other regexes in between, so that each one gets flushed from the cache between consecutive usages.

    import re
    import time

    def setup(N=1000):
        # Patterns 'a.*a', 'a.*b', ..., 'z.*z'
        patterns = [chr(i) + '.*' + chr(j)
                        for i in range(ord('a'), ord('z') + 1)
                        for j in range(ord('a'), ord('z') + 1)]
        # If this assertion below fails, just add more (distinct) patterns.
        assert(re._MAXCACHE < len(patterns))
        # N strings. Increase N for larger effect.
        strings = ['abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz'] * N
        return (patterns, strings)

    def without_compile():
        print('Without re.compile:')
        patterns, strings = setup()
        print('searching')
        count = 0
        for s in strings:
            for pat in patterns:
                count += bool(re.search(pat, s))
        return count

    def without_compile_cache_friendly():
        print('Without re.compile, cache-friendly order:')
        patterns, strings = setup()
        print('searching')
        count = 0
        for pat in patterns:
            for s in strings:
                count += bool(re.search(pat, s))
        return count

    def with_compile():
        print('With re.compile:')
        patterns, strings = setup()
        print('compiling')
        compiled = [re.compile(pattern) for pattern in patterns]
        print('searching')
        count = 0
        for s in strings:
            for regex in compiled:
                count += bool(regex.search(s))
        return count

    start = time.time()
    print(with_compile())
    d1 = time.time() - start
    print(f'-- That took {d1:.2f} seconds.\n')

    start = time.time()
    print(without_compile_cache_friendly())
    d2 = time.time() - start
    print(f'-- That took {d2:.2f} seconds.\n')

    start = time.time()
    print(without_compile())
    d3 = time.time() - start
    print(f'-- That took {d3:.2f} seconds.\n')

    print(f'Ratio: {d3/d1:.2f}')

Example output I get on my laptop (Python 3.7.7):

    With re.compile:
    compiling
    searching
    676000
    -- That took 0.33 seconds.

    Without re.compile, cache-friendly order:
    searching
    676000
    -- That took 0.67 seconds.

    Without re.compile:
    searching
    676000
    -- That took 23.54 seconds.

    Ratio: 70.89

I didn't bother with timeit as the difference is so stark, but I get qualitatively similar numbers each time. Note that even without re.compile, using the same regex multiple times and moving on to the next one wasn't so bad (only about 2 times as slow as with re.compile), but in the other order (looping through many regexes), it is significantly worse, as expected. Also, increasing the cache size works too: simply setting re._MAXCACHE = len(patterns) in setup() above (of course I don't recommend doing such things in production, as names with underscores are conventionally "private") drops the ~23 seconds back down to ~0.7 seconds, which also matches our understanding.

User · Answer

As an alternative answer, as I see that it hasn't been mentioned before, I'll go ahead and quote the Python 3 docs:

> Should you use these module-level functions, or should you get the pattern and call its methods yourself? If you're accessing a regex within a loop, pre-compiling it will save a few function calls. Outside of loops, there's not much difference thanks to the internal cache.

User · Answer

I'd like to motivate that pre-compiling is both conceptually and 'literately' (as in 'literate programming') advantageous. Have a look at this code snippet:

    from re import compile as _Re

    class TYPO:

      def text_has_foobar( self, text ):
        return self._text_has_foobar_re_search( text ) is not None
      _text_has_foobar_re_search = _Re( r"""(?i)foobar""" ).search

    TYPO = TYPO()

In your application, you'd write:

    from TYPO import TYPO
    print( TYPO.text_has_foobar( 'FOObar' ) )

This is about as simple in terms of functionality as it can get. Because this example is so short, I conflated the way to get _text_has_foobar_re_search all in one line. The disadvantage of this code is that it occupies a little memory for whatever the lifetime of the TYPO library object is; the advantage is that when doing a foobar search, you'll get away with two function calls and two class dictionary lookups. How many regexes are cached by re and the overhead of that cache are irrelevant here.

Compare this with the more usual style, below:

    import re

    class Typo:

      def text_has_foobar( self, text ):
        return re.compile( r"""(?i)foobar""" ).search( text ) is not None

In the application:

    typo = Typo()
    print( typo.text_has_foobar( 'FOObar' ) )

I readily admit that my style is highly unusual for Python, maybe even debatable. However, in the example that more closely matches how Python is mostly used, in order to do a single match, we must instantiate an object, do three instance dictionary lookups, and perform three function calls; additionally, we might get into re caching troubles when using more than 100 regexes. Also, the regular expression gets hidden inside the method body, which most of the time is not such a good idea.

Be it said that every subset of measures (targeted, aliased import statements; aliased methods where applicable; reduction of function calls and object dictionary lookups) can help reduce computational and conceptual complexity.

User · Answer

(Months later) it's easy to add your own cache around re.match, or anything else for that matter:

    """ Re.py: Re.match = re.match + cache
        efficiency: re.py does this already (but what's _MAXCACHE ?)
        readability, inline / separate: matter of taste
    """

    import re

    cache = {}
    _re_type = type( re.compile( "" ))

    def match( pattern, str, *opt ):
        """ Re.match = re.match + cache re.compile( pattern )
        """
        if type(pattern) == _re_type:
            cpat = pattern
        elif pattern in cache:
            cpat = cache[pattern]
        else:
            cpat = cache[pattern] = re.compile( pattern, *opt )
        return cpat.match( str )

    # def search ...

A wibni (wouldn't it be nice if): cachehint( size= ), cacheinfo() -> size, hits, nclear ...
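Part of that wish list is available off the shelf if the cache is built on functools.lru_cache, which lets you choose the size and reports hits and misses. The following is a sketch of that alternative, not the answer's own code; cached_compile and the maxsize value are arbitrary illustrative choices. Unlike the dictionary version above, the flags are part of the cache key.

```python
import re
from functools import lru_cache


@lru_cache(maxsize=1024)
def cached_compile(pattern, flags=0):
    """Compile a pattern string, remembering up to maxsize results."""
    return re.compile(pattern, flags)


def match(pattern, string, flags=0):
    """Like re.match, but pattern strings go through our own cache."""
    if isinstance(pattern, (str, bytes)):
        pattern = cached_compile(pattern, flags)
    # Anything else is assumed to be an already compiled pattern.
    return pattern.match(string)


match("hello", "hello world")   # first use: a miss, pattern gets compiled
match("hello", "hello there")   # second use: served from the cache
info = cached_compile.cache_info()
```

cache_info() returns a named tuple with hits, misses, maxsize and currsize, and cached_compile.cache_clear() empties the cache.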

User · Answer

This is a good question. You often see people use re.compile without reason. It lessens readability. But sure, there are lots of times when pre-compiling the expression is called for, like when you use it repeatedly in a loop or some such.

It's like everything about programming (everything in life, actually): apply common sense.

User · Answer

Mostly, there is little difference whether you use re.compile or not. Internally, all of the functions are implemented in terms of a compile step:

    def match(pattern, string, flags=0):
        return _compile(pattern, flags).match(string)

    def fullmatch(pattern, string, flags=0):
        return _compile(pattern, flags).fullmatch(string)

    def search(pattern, string, flags=0):
        return _compile(pattern, flags).search(string)

    def sub(pattern, repl, string, count=0, flags=0):
        return _compile(pattern, flags).sub(repl, string, count)

    def subn(pattern, repl, string, count=0, flags=0):
        return _compile(pattern, flags).subn(repl, string, count)

    def split(pattern, string, maxsplit=0, flags=0):
        return _compile(pattern, flags).split(string, maxsplit)

    def findall(pattern, string, flags=0):
        return _compile(pattern, flags).findall(string)

    def finditer(pattern, string, flags=0):
        return _compile(pattern, flags).finditer(string)

In addition, calling a method on a pattern you already obtained from re.compile() bypasses the extra indirection and caching logic:

    _cache = {}

    _pattern_type = type(sre_compile.compile("", 0))

    _MAXCACHE = 512
    def _compile(pattern, flags):
        # internal: compile pattern
        try:
            p, loc = _cache[type(pattern), pattern, flags]
            if loc is None or loc == _locale.setlocale(_locale.LC_CTYPE):
                return p
        except KeyError:
            pass
        if isinstance(pattern, _pattern_type):
            if flags:
                raise ValueError(
                    "cannot process flags argument with a compiled pattern")
            return pattern
        if not sre_compile.isstring(pattern):
            raise TypeError("first argument must be string or compiled pattern")
        p = sre_compile.compile(pattern, flags)
        if not (flags & DEBUG):
            if len(_cache) >= _MAXCACHE:
                _cache.clear()
            if p.flags & LOCALE:
                if not _locale:
                    return p
                loc = _locale.setlocale(_locale.LC_CTYPE)
            else:
                loc = None
            _cache[type(pattern), pattern, flags] = p, loc
        return p

In addition to the small speed benefit from using re.compile, people also like the readability that comes from naming potentially complex pattern specifications and separating them from the business logic where they are applied:

    #### Patterns ############################################################
    number_pattern = re.compile(r'\d+(\.\d*)?')    # Integer or decimal number
    assign_pattern = re.compile(r':=')             # Assignment operator
    identifier_pattern = re.compile(r'[A-Za-z]+')  # Identifiers
    whitespace_pattern = re.compile(r'[\t ]+')     # Spaces and tabs

    #### Applications ########################################################

    if whitespace_pattern.match(s): business_logic_rule_1()
    if assign_pattern.match(s): business_logic_rule_2()

Note: one other respondent incorrectly believed that pyc files stored compiled patterns directly; however, in reality they are rebuilt each time the PYC is loaded:

    >>> from dis import dis
    >>> with open('tmp.pyc', 'rb') as f:
            f.read(8)
            dis(marshal.load(f))

      1           0 LOAD_CONST               0 (-1)
                  3 LOAD_CONST               1 (None)
                  6 IMPORT_NAME              0 (re)
                  9 STORE_NAME               0 (re)

      3          12 LOAD_NAME                0 (re)
                 15 LOAD_ATTR                1 (compile)
                 18 LOAD_CONST               2 ('[aeiou]{2,5}')
                 21 CALL_FUNCTION            1
                 24 STORE_NAME               2 (lc_vowels)
                 27 LOAD_CONST               1 (None)
                 30 RETURN_VALUE

The above disassembly comes from the PYC file for a tmp.py containing:

    import re
    lc_vowels = re.compile(r'[aeiou]{2,5}')
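To illustrate how named patterns read when they are applied, here is a small hypothetical tokenizer built on four patterns like those above (the ':=' assignment pattern and the token kind names are assumptions made for this sketch). It also uses something only compiled patterns offer: match() on a pattern object accepts a start position, which the module-level re.match does not.

```python
import re

# Patterns: named once, kept apart from the scanning logic.
NUMBER = re.compile(r"\d+(\.\d*)?")    # integer or decimal number
ASSIGN = re.compile(r":=")             # assignment operator (assumed)
IDENTIFIER = re.compile(r"[A-Za-z]+")  # identifiers
WHITESPACE = re.compile(r"[\t ]+")     # spaces and tabs

TOKEN_KINDS = [
    ("number", NUMBER),
    ("assign", ASSIGN),
    ("identifier", IDENTIFIER),
    ("whitespace", WHITESPACE),
]


def tokenize(text):
    """Split text into (kind, value) pairs, skipping whitespace."""
    tokens = []
    pos = 0
    while pos < len(text):
        for kind, pattern in TOKEN_KINDS:
            # Pattern.match takes an optional start position.
            m = pattern.match(text, pos)
            if m:
                if kind != "whitespace":
                    tokens.append((kind, m.group()))
                pos = m.end()
                break
        else:
            raise ValueError(f"unexpected character at position {pos}")
    return tokens


tokens = tokenize("x := 3.14")
```

Every pattern here matches at least one character, so the position always advances and the loop terminates.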

User · Answer

My understanding is that those two examples are effectively equivalent. The only difference is that in the first, you can reuse the compiled regular expression elsewhere without causing it to be compiled again.

Here's a reference for you: http://diveintopython3.ep.io/refactoring.html

> Calling the compiled pattern object's search function with the string 'M' accomplishes the same thing as calling re.search with both the regular expression and the string 'M'. Only much, much faster. (In fact, the re.search function simply compiles the regular expression and calls the resulting pattern object's search method for you.)

User · Answer

I just tried this myself. For the simple case of parsing a number out of a string and summing it, using a compiled regular expression object is about twice as fast as using the re methods.

As others have pointed out, the re methods (including re.compile) look up the regular expression string in a cache of previously compiled expressions. Therefore, in the normal case, the extra cost of using the re methods is simply the cost of the cache lookup.

However, examination of the code shows the cache is limited to 100 expressions. This raises the question: how painful is it to overflow the cache? The code contains an internal interface to the regular expression compiler, re.sre_compile.compile. If we call it, we bypass the cache. It turns out to be about two orders of magnitude slower for a basic regular expression, such as r'\w+\s+([0-9_]+)\s+\w*'.

Here's my test:

    #!/usr/bin/env python
    import re
    import time

    def timed(func):
        def wrapper(*args):
            t = time.time()
            result = func(*args)
            t = time.time() - t
            print '%s took %.3f seconds.' % (func.func_name, t)
            return result
        return wrapper

    regularExpression = r'\w+\s+([0-9_]+)\s+\w*'
    testString = "average    2 never"

    @timed
    def noncompiled():
        a = 0
        for x in xrange(1000000):
            m = re.match(regularExpression, testString)
            a += int(m.group(1))
        return a

    @timed
    def compiled():
        a = 0
        rgx = re.compile(regularExpression)
        for x in xrange(1000000):
            m = rgx.match(testString)
            a += int(m.group(1))
        return a

    @timed
    def reallyCompiled():
        a = 0
        rgx = re.sre_compile.compile(regularExpression)
        for x in xrange(1000000):
            m = rgx.match(testString)
            a += int(m.group(1))
        return a

    @timed
    def compiledInLoop():
        a = 0
        for x in xrange(1000000):
            rgx = re.compile(regularExpression)
            m = rgx.match(testString)
            a += int(m.group(1))
        return a

    @timed
    def reallyCompiledInLoop():
        a = 0
        for x in xrange(10000):
            rgx = re.sre_compile.compile(regularExpression)
            m = rgx.match(testString)
            a += int(m.group(1))
        return a

    r1 = noncompiled()
    r2 = compiled()
    r3 = reallyCompiled()
    r4 = compiledInLoop()
    r5 = reallyCompiledInLoop()
    print "r1 = ", r1
    print "r2 = ", r2
    print "r3 = ", r3
    print "r4 = ", r4
    print "r5 = ", r5

And here is the output on my machine:

    $ regexTest.py
    noncompiled took 4.555 seconds.
    compiled took 2.323 seconds.
    reallyCompiled took 2.325 seconds.
    compiledInLoop took 4.620 seconds.
    reallyCompiledInLoop took 4.074 seconds.
    r1 =  2000000
    r2 =  2000000
    r3 =  2000000
    r4 =  2000000
    r5 =  20000

The 'reallyCompiled' methods use the internal interface, which bypasses the cache. Note the one that compiles on each loop iteration is only iterated 10,000 times, not one million.
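The benchmark above is Python 2 code. As a rough sketch only, the first two cases could be ported to Python 3 with timeit along these lines. The function names `module_level` and `precompiled` are made up for illustration, and the timings will differ by machine and interpreter version, so no output is claimed here.

```python
import re
import timeit

PATTERN = r"\w+\s+([0-9_]+)\s+\w*"
TEXT = "average    2 never"
compiled = re.compile(PATTERN)

def module_level():
    # Goes through re.match, which looks the pattern up in the module cache.
    return int(re.match(PATTERN, TEXT).group(1))

def precompiled():
    # Calls the method on the pattern object directly, skipping the lookup.
    return int(compiled.match(TEXT).group(1))

if __name__ == "__main__":
    for fn in (module_level, precompiled):
        seconds = timeit.timeit(fn, number=100_000)
        print(f"{fn.__name__}: {seconds:.3f} s")
```

Both functions return the same value; only the path taken to reach the compiled pattern differs.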

User · Answer

Besides the performance.

Using compile helps me to distinguish the concepts of
1. module (re),
2. regex object,
3. match object.

When I started learning regex:

    # regex object
    regex_object = re.compile(r'[a-zA-Z]+')
    # match object
    match_object = regex_object.search('1.Hello')
    # matching content
    match_object.group()
    output:
    Out[60]: 'Hello'

V.S.

    re.search(r'[a-zA-Z]+', '1.Hello').group()
    Out[61]: 'Hello'

As a complement, I made an exhaustive cheatsheet of module re for your reference.

    regex = {
        'brackets': {'single_character': ['[]', '.', {'negate': '^'}],
                     'capturing_group': ['()', '(?:)', '(?!)', '|', '\\', 'backreferences and named group'],
                     'repetition': ['{}', '*?', '+?', '??', 'greedy v.s. lazy ?']},
        'lookaround': {'lookahead': ['(?=...)', '(?!...)'],
                       'lookbehind': ['(?<=...)', '(?<!...)'],
                       'capturing': ['(?P<name>...)', '(?P=name)', '(?:)']},
        'escapes': {'anchor': ['^', r'\b', '$'],
                    'non_printable': [r'\n', r'\t', r'\r', r'\f', r'\v'],
                    'shorthand': [r'\d', r'\w', r'\s']},
        'methods': [['search', 'match', 'findall', 'finditer'],
                    ['split', 'sub']],
        'match_object': ['group', 'groups', 'groupdict', 'start', 'end', 'span'],
    }

User · Answer

Performance difference aside, using re.compile and using the compiled regular expression object to do the match (or whatever regular-expression operation) makes the semantics clearer to the Python run-time.

I had a painful experience debugging some simple code:

    compare = lambda s, p: re.match(p, s)

and later I'd use compare in

    [x for x in data if compare(patternPhrases, x[columnIndex])]

where patternPhrases is supposed to be a variable containing a regular expression string, and x[columnIndex] is a variable containing a string.

My trouble was that patternPhrases did not match some expected string!

But if I had used the re.compile form:

    compare = lambda s, p: p.match(s)

then in

    [x for x in data if compare(patternPhrases, x[columnIndex])]

Python would have complained that "string does not have attribute of match", because by positional argument mapping in compare, x[columnIndex] is used as the regular expression, when I actually meant

    compare = lambda p, s: p.match(s)

In my case, using re.compile makes the purpose of the regular expression more explicit when its value is hidden from the naked eye, so I could get more help from Python's run-time checking.

So the moral of my lesson is that when the regular expression is not just a literal string, I should use re.compile to let Python help me assert my assumption.
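To make that failure mode concrete, here is a minimal sketch. The names `pattern`, `text`, `compare_str` and `compare_compiled` are hypothetical and chosen only for illustration. With a string pattern, swapped arguments quietly produce no match; with a compiled pattern, the same mistake raises immediately.

```python
import re

pattern = re.compile(r"\d+")
text = "42 apples"

# String-based version: parameters are (s, p), but the caller passes (pattern, string).
compare_str = lambda s, p: re.match(p, s)
# "42 apples" is silently used as the regex, so the result is None and nothing fails.
silent_result = compare_str(r"\d+", text)

# Compiled version with the same swapped parameter order.
compare_compiled = lambda s, p: p.match(s)
try:
    compare_compiled(pattern, text)
    raised = False
except AttributeError:
    # A str has no .match() method, so the mix-up surfaces right away.
    raised = True
```

The string version fails silently because any string is a syntactically plausible pattern, whereas a pattern object and a string are different types with different methods.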

User · Answer

I've had a lot of experience running a compiled regex 1000s of times versus compiling on-the-fly, and have not noticed any perceivable difference. Obviously, this is anecdotal, and certainly not a great argument against compiling, but I've found the difference to be negligible.

EDIT: After a quick glance at the actual Python 2.5 library code, I see that Python internally compiles AND CACHES regexes whenever you use them anyway (including calls to re.match()), so you're really only changing WHEN the regex gets compiled, and shouldn't be saving much time at all - only the time it takes to check the cache (a key lookup on an internal dict type).

From module re.py (comments are mine):

    def match(pattern, string, flags=0):
        return _compile(pattern, flags).match(string)

    def _compile(*key):

        # Does cache check at top of function
        cachekey = (type(key[0]),) + key
        p = _cache.get(cachekey)
        if p is not None: return p

        # ...
        # Does actual compilation on cache miss
        # ...

        # Caches compiled regex
        if len(_cache) >= _MAXCACHE:
            _cache.clear()
        _cache[cachekey] = p
        return p

I still often pre-compile regular expressions, but only to bind them to a nice, reusable name, not for any expected performance gain.
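The caching described above can be observed from the outside. The following is a small sketch that relies on a CPython implementation detail (the module-level cache is not a documented guarantee), so treat it as an illustration rather than something to depend on.

```python
import re

re.purge()  # documented call that clears the module-level pattern cache

first = re.compile(r"hello")
second = re.compile(r"hello")
# In CPython the second call is served from the cache, so both names
# refer to the very same pattern object.
same_object = first is second

# Module-level functions consult the same cache, so this does not recompile.
m = re.match(r"hello", "hello world")
```

If `same_object` is true, the second `re.compile` call cost only a dictionary lookup, which is the point the answer makes.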

User · Answer

Here's a simple test case:

    ~$ for x in 1 10 100 1000 10000 100000 1000000; do python -m timeit -n $x -s 'import re' 're.match("[0-9]{3}-[0-9]{3}-[0-9]{4}", "123-123-1234")'; done
    1 loops, best of 3: 3.1 usec per loop
    10 loops, best of 3: 2.41 usec per loop
    100 loops, best of 3: 2.24 usec per loop
    1000 loops, best of 3: 2.21 usec per loop
    10000 loops, best of 3: 2.23 usec per loop
    100000 loops, best of 3: 2.24 usec per loop
    1000000 loops, best of 3: 2.31 usec per loop

with re.compile:

    ~$ for x in 1 10 100 1000 10000 100000 1000000; do python -m timeit -n $x -s 'import re' 'r = re.compile("[0-9]{3}-[0-9]{3}-[0-9]{4}")' 'r.match("123-123-1234")'; done
    1 loops, best of 3: 1.91 usec per loop
    10 loops, best of 3: 0.691 usec per loop
    100 loops, best of 3: 0.701 usec per loop
    1000 loops, best of 3: 0.684 usec per loop
    10000 loops, best of 3: 0.682 usec per loop
    100000 loops, best of 3: 0.694 usec per loop
    1000000 loops, best of 3: 0.702 usec per loop

So, it would seem that compiling is faster with this simple case, even if you only match once.

User · Answer

There is one additional perk of using re.compile(): adding comments to my regex patterns using re.VERBOSE.

    pattern = '''
    hello[ ]world    # Some info on my pattern logic. [ ] to recognize space
    '''

    re.search(pattern, 'hello world', re.VERBOSE)

Although this does not affect the speed of running your code, I like to do it this way as it is part of my commenting habit. I thoroughly dislike spending time trying to remember the logic behind my code two months down the line when I want to make modifications.
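The snippet above passes the flag to re.search on each call. A minimal sketch of combining the same idea with re.compile, so that the flag and the comments travel with the pattern object, might look like this. The `phone` pattern and the sample text are invented for the example.

```python
import re

# Hypothetical pattern, documented inline thanks to re.VERBOSE.
phone = re.compile(r"""
    (\d{3})   # first group: three digits
    [ -]?     # optional separator: a space or a hyphen
    (\d{4})   # second group: four digits
""", re.VERBOSE)

m = phone.search("call 555-1234 now")
```

Whitespace inside a character class such as `[ -]` is kept in verbose mode, which is why the space has to be written that way.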

User · Answer

Regular expressions are compiled before being used when using the second version. If you are going to execute it many times, it is definitely better to compile it first. If not, compiling every time you match is fine for one-offs.

User · Answer

I really respect all the above answers. In my opinion, yes! It is certainly worth using re.compile instead of compiling the regex again and again, every time.

Using re.compile makes your code more dynamic, as you can call the already compiled regex instead of compiling it again and again. This benefits you in the following ways:

1. Processor effort
2. Time complexity
3. Makes the regex universal (it can be used in findall, search, match)
4. And makes your program look cool.

Example:

    example_string = "The room number of her room is 26A7B."
    find_alpha_numeric_string = re.compile(r"\b\w+\b")

Using in findall:

    find_alpha_numeric_string.findall(example_string)

Using in search:

    find_alpha_numeric_string.search(example_string)

Similarly you can use it for match and substitute.
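For the match and substitute cases mentioned at the end, a short sketch reusing the same example string could look like the following. The variable names `word`, `first_word` and `masked` are illustrative only.

```python
import re

example_string = "The room number of her room is 26A7B."
word = re.compile(r"\b\w+\b")

# The same pattern object serves every kind of operation.
all_words = word.findall(example_string)
first_word = word.match(example_string).group()   # match() anchors at the start
masked = word.sub("X", example_string)            # replace every word with "X"
```

One compiled object covers all four operations, so the pattern text is written exactly once.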

User · Answer

Using the given examples:

    h = re.compile('hello')
    h.match('hello world')

The match method in the example above is not the same as the one used below:

    re.match('hello', 'hello world')

re.compile() returns a regular expression object, which means h is a regex object.

The regex object has its own match method with the optional pos and endpos parameters:

regex.match(string[, pos[, endpos]])

pos
    The optional second parameter pos gives an index in the string where the search is to start; it defaults to 0. This is not completely equivalent to slicing the string; the '^' pattern character matches at the real beginning of the string and at positions just after a newline, but not necessarily at the index where the search is to start.

endpos
    The optional parameter endpos limits how far the string will be searched; it will be as if the string is endpos characters long, so only the characters from pos to endpos - 1 will be searched for a match. If endpos is less than pos, no match will be found; otherwise, if rx is a compiled regular expression object, rx.search(string, 0, 50) is equivalent to rx.search(string[:50], 0).

The regex object's search, findall, and finditer methods also support these parameters.

re.match(pattern, string, flags=0) does not support them, as you can see, nor do its search, findall, and finditer counterparts.

A match object has attributes that complement these parameters:

match.pos
    The value of pos which was passed to the search() or match() method of a regex object. This is the index into the string at which the RE engine started looking for a match.

match.endpos
    The value of endpos which was passed to the search() or match() method of a regex object. This is the index into the string beyond which the RE engine will not go.

A regex object has two unique, possibly useful, attributes:

regex.groups
    The number of capturing groups in the pattern.

regex.groupindex
    A dictionary mapping any symbolic group names defined by (?P<id>) to group numbers. The dictionary is empty if no symbolic groups were used in the pattern.

And finally, a match object has this attribute:

match.re
    The regular expression object whose match() or search() method produced this match instance.
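A brief sketch of how these parameters and attributes behave in practice. The sample text and pattern names are invented for the example.

```python
import re

text = "hello world"
world = re.compile(r"world")

# match() normally anchors at index 0; pos moves the starting point.
m = world.match(text, 6)

# pos is not the same as slicing: '^' still refers to the real start of the string.
caret = re.compile(r"^world")
no_match = caret.search(text, 6)     # '^' cannot match at index 6
sliced = caret.search(text[6:])      # the slice really does start with "world"

# endpos behaves as if the string ended there, so '$' can match at endpos.
hello_end = re.compile(r"hello$").search(text, 0, 5)

# groups and groupindex describe the pattern itself.
named = re.compile(r"(?P<word>\w+) (\d+)")
```

Here `m.pos` and `m.endpos` record the window that was searched, and `m.re` points back at `world`.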

User · Answer

For me, the biggest benefit to re.compile is being able to separate definition of the regex from its use.

Even a simple expression such as 0|[1-9][0-9]* (integer in base 10 without leading zeros) can be complex enough that you'd rather not have to retype it, check if you made any typos, and later have to recheck if there are typos when you start debugging. Plus, it's nicer to use a variable name such as num or num_b10 than 0|[1-9][0-9]*.

It's certainly possible to store strings and pass them to re.match; however, that's less readable:

    num = "..."
    # then, much later:
    m = re.match(num, input)

Versus compiling:

    num = re.compile("...")
    # then, much later:
    m = num.match(input)

Though it is fairly close, the last line of the second feels more natural and simpler when used repeatedly.
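As a sketch of that separation, the pattern can be defined once under a descriptive name and used elsewhere without repeating the expression. The constant `NUM_B10` and the helper `is_base10_integer` are hypothetical names, and `fullmatch` is used here on the assumption that the whole string should be a number.

```python
import re

# Definition: integer in base 10 without leading zeros (pattern from the text above).
NUM_B10 = re.compile(r"0|[1-9][0-9]*")

def is_base10_integer(text):
    """Return True if the whole string is a base-10 integer without leading zeros."""
    return NUM_B10.fullmatch(text) is not None
```

The call site reads as a sentence about numbers, and a typo in the expression only has to be fixed in one place.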

User · Answer

Here is an example where using re.compile is over 50 times faster, as requested.

The point is just the same as what I made in the comment above, namely, using re.compile can be a significant advantage when your usage is such as to not benefit much from the compilation cache. This happens at least in one particular case (that I ran into in practice), namely when all of the following are true:

- You have a lot of regex patterns (more than re._MAXCACHE, whose default is currently 512), and
- you use these regexes a lot of times, and
- your consecutive usages of the same pattern are separated by more than re._MAXCACHE other regexes in between, so that each one gets flushed from the cache between consecutive usages.

    import re
    import time

    def setup(N=1000):
        # Patterns 'a.*a', 'a.*b', ..., 'z.*z'
        patterns = [chr(i) + '.*' + chr(j)
                        for i in range(ord('a'), ord('z') + 1)
                        for j in range(ord('a'), ord('z') + 1)]
        # If this assertion below fails, just add more (distinct) patterns.
        # assert(re._MAXCACHE < len(patterns))
        # N strings. Increase N for larger effect.
        strings = ['abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz'] * N
        return (patterns, strings)

    def without_compile():
        print('Without re.compile:')
        patterns, strings = setup()
        print('searching')
        count = 0
        for s in strings:
            for pat in patterns:
                count += bool(re.search(pat, s))
        return count

    def without_compile_cache_friendly():
        print('Without re.compile, cache-friendly order:')
        patterns, strings = setup()
        print('searching')
        count = 0
        for pat in patterns:
            for s in strings:
                count += bool(re.search(pat, s))
        return count

    def with_compile():
        print('With re.compile:')
        patterns, strings = setup()
        print('compiling')
        compiled = [re.compile(pattern) for pattern in patterns]
        print('searching')
        count = 0
        for s in strings:
            for regex in compiled:
                count += bool(regex.search(s))
        return count

    start = time.time()
    print(with_compile())
    d1 = time.time() - start
    print(f'-- That took {d1:.2f} seconds.\n')

    start = time.time()
    print(without_compile_cache_friendly())
    d2 = time.time() - start
    print(f'-- That took {d2:.2f} seconds.\n')

    start = time.time()
    print(without_compile())
    d3 = time.time() - start
    print(f'-- That took {d3:.2f} seconds.\n')

    print(f'Ratio: {d3/d1:.2f}')

Example output I get on my laptop (Python 3.7.7):

    With re.compile:
    compiling
    searching
    676000
    -- That took 0.33 seconds.

    Without re.compile, cache-friendly order:
    searching
    676000
    -- That took 0.67 seconds.

    Without re.compile:
    searching
    676000
    -- That took 23.54 seconds.

    Ratio: 70.89

I didn't bother with timeit as the difference is so stark, but I get qualitatively similar numbers each time. Note that even without re.compile, using the same regex multiple times and then moving on to the next one wasn't so bad (only about 2 times as slow as with re.compile), but in the other order (looping through many regexes), it is significantly worse, as expected. Also, increasing the cache size works too: simply setting re._MAXCACHE = len(patterns) in setup() above (of course I don't recommend doing such things in production, as names with underscores are conventionally "private") drops the ~23 seconds back down to ~0.7 seconds, which also matches our understanding.

User · Answer

My understanding is that those two examples are effectively equivalent. The only difference is that in the first, you can reuse the compiled regular expression elsewhere without causing it to be compiled again.

Here's a reference for you: http://diveintopython3.ep.io/refactoring.html

> Calling the compiled pattern object's search function with the string 'M' accomplishes the same thing as calling re.search with both the regular expression and the string 'M'. Only much, much faster. (In fact, the re.search function simply compiles the regular expression and calls the resulting pattern object's search method for you.)

User · Answer

> I've had a lot of experience running a compiled regex 1000s of times versus compiling on-the-fly, and have not noticed any perceivable difference

The votes on the accepted answer lead to the assumption that what @Triptych says is true for all cases. This is not necessarily true. One big difference is when you have to decide whether to accept a regex string or a compiled regex object as a parameter to a function:

    >>> timeit.timeit(setup="""
    ... import re
    ... f=lambda x, y: x.match(y)       # accepts compiled regex as parameter
    ... h=re.compile('hello')
    ... """, stmt="f(h, 'hello world')")
    0.32881879806518555
    >>> timeit.timeit(setup="""
    ... import re
    ... f=lambda x, y: re.compile(x).match(y)   # compiles when called
    ... """, stmt="f('hello', 'hello world')")
    0.809190034866333

It is always better to compile your regexes in case you need to reuse them.

Note the example in the timeit above simulates creation of a compiled regex object once at import time versus "on-the-fly" when required for a match.
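One way to sidestep that decision is a function that accepts either form. This sketch relies on the fact that re.compile returns an already compiled pattern unchanged when no flags are passed; `find_first` is a hypothetical helper written for illustration.

```python
import re

def find_first(pattern, text):
    """Return the first match of pattern in text, or None (hypothetical helper).

    pattern may be a pattern string or an already compiled pattern object.
    """
    # For a compiled pattern this is a cheap pass-through; for a string it
    # compiles, or fetches the pattern from the module cache.
    regex = re.compile(pattern)
    m = regex.search(text)
    return m.group() if m else None

hello = re.compile("hello")
```

Callers in a hot loop can pass `hello` and pay no compilation or cache cost beyond the pass-through, while casual callers can still pass a plain string.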

User · Answer

This is a good question. You often see people use re.compile without reason. It lessens readability. But sure, there are lots of times when pre-compiling the expression is called for, like when you use it repeatedly in a loop or some such.

It's like everything about programming (everything in life, actually). Apply common sense.

User · Answer

This answer might be arriving late, but it is an interesting find. Using compile can really save you time if you are planning on using the regex multiple times (this is also mentioned in the docs). Below you can see that using a compiled regex is fastest when the match method is called directly on it. Passing a compiled regex to re.match makes it even slower, and passing the pattern string to re.match is somewhere in the middle.

    >>> ipr = r'\D+((([0-2][0-5]?[0-5]?)\.){3}([0-2][0-5]?[0-5]?))\D+'
    >>> average(*timeit.repeat("re.match(ipr, 'abcd100.10.255.255 ')", globals={'ipr': ipr, 're': re}))
    1.5077415757028423
    >>> ipr = re.compile(ipr)
    >>> average(*timeit.repeat("re.match(ipr, 'abcd100.10.255.255 ')", globals={'ipr': ipr, 're': re}))
    1.8324008992184038
    >>> average(*timeit.repeat("ipr.match('abcd100.10.255.255 ')", globals={'ipr': ipr, 're': re}))
    0.9187896518778871

User · Answer

I ran this test before stumbling upon the discussion here. However, having run it I thought I'd at least post my results.

I stole and bastardized the example in Jeff Friedl's "Mastering Regular Expressions". This is on a MacBook running OS X 10.6 (2 GHz Intel Core 2 Duo, 4 GB RAM). Python version is 2.6.1.

Run 1 - using re.compile

    import re
    import time
    import fpformat
    Regex1 = re.compile('^(a|b|c|d|e|f|g)+$')
    Regex2 = re.compile('^[a-g]+$')
    TimesToDo = 1000
    TestString = ""
    for i in range(1000):
        TestString += "abababdedfg"
    StartTime = time.time()
    for i in range(TimesToDo):
        Regex1.search(TestString)
    Seconds = time.time() - StartTime
    print "Alternation takes " + fpformat.fix(Seconds,3) + " seconds"

    StartTime = time.time()
    for i in range(TimesToDo):
        Regex2.search(TestString)
    Seconds = time.time() - StartTime
    print "Character Class takes " + fpformat.fix(Seconds,3) + " seconds"

    Alternation takes 2.299 seconds
    Character Class takes 0.107 seconds

Run 2 - Not using re.compile

    import re
    import time
    import fpformat

    TimesToDo = 1000
    TestString = ""
    for i in range(1000):
        TestString += "abababdedfg"
    StartTime = time.time()
    for i in range(TimesToDo):
        re.search('^(a|b|c|d|e|f|g)+$', TestString)
    Seconds = time.time() - StartTime
    print "Alternation takes " + fpformat.fix(Seconds,3) + " seconds"

    StartTime = time.time()
    for i in range(TimesToDo):
        re.search('^[a-g]+$', TestString)
    Seconds = time.time() - StartTime
    print "Character Class takes " + fpformat.fix(Seconds,3) + " seconds"

    Alternation takes 2.508 seconds
    Character Class takes 0.109 seconds

User · Answer

My understanding is that those two examples are effectively equivalent. The only difference is that in the first, you can reuse the compiled regular expression elsewhere without causing it to be compiled again.

Here's a reference for you: http://diveintopython3.ep.io/refactoring.html

    "Calling the compiled pattern object's search function with the string 'M' accomplishes the same thing as calling re.search with both the regular expression and the string 'M'. Only much, much faster. (In fact, the re.search function simply compiles the regular expression and calls the resulting pattern object's search method for you.)"
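The equivalence described above can be sketched as follows. This is only an illustration: the names PATTERN, TEXT and hello are hypothetical, and the pattern is the one from the question.

```python
import re

# Hypothetical pattern and input, chosen only for illustration.
PATTERN = r"hello"
TEXT = "hello world"

# Module-level call: re obtains the compiled pattern on our behalf.
m1 = re.match(PATTERN, TEXT)

# Explicit compile: the pattern object can be stored and reused elsewhere.
hello = re.compile(PATTERN)
m2 = hello.match(TEXT)

# Both routes find the same match at the same position.
print(m1.group(0), m1.span())
print(m2.group(0), m2.span())
```

Whether one route is measurably faster depends on the Python version and on how often the call is made, as the timings in the other answers suggest.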

User · Answer

As an alternative answer, as I see that it hasn't been mentioned before, I'll go ahead and quote the Python 3 docs:

    "Should you use these module-level functions, or should you get the pattern and call its methods yourself? If you're accessing a regex within a loop, pre-compiling it will save a few function calls. Outside of loops, there's not much difference thanks to the internal cache."
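A rough sketch of the loop case the docs describe, assuming made-up sample data (LINES) and hypothetical helper names (count_module_level, count_precompiled). It prints timings rather than asserting them, since the numbers vary by machine and Python version.

```python
import re
import timeit

# Hypothetical data: a phone-number-like pattern and some sample lines.
PHONE = re.compile(r"[0-9]{3}-[0-9]{3}-[0-9]{4}")
LINES = ["123-123-1234", "no number here", "555-000-9999"] * 1000


def count_module_level(lines):
    # Every call goes through re's internal cache lookup before matching.
    return sum(1 for line in lines
               if re.match(r"[0-9]{3}-[0-9]{3}-[0-9]{4}", line))


def count_precompiled(lines):
    # Bind the bound method once; the loop then calls it directly.
    match = PHONE.match
    return sum(1 for line in lines if match(line))


if __name__ == "__main__":
    for fn in (count_module_level, count_precompiled):
        seconds = timeit.timeit(lambda: fn(LINES), number=20)
        print(f"{fn.__name__}: {seconds:.3f} s")
```

Both functions return the same count; only the per-iteration overhead differs.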

User · Answer

I'd like to argue that pre-compiling is both conceptually and "literately" (as in "literate programming") advantageous. Have a look at this code snippet:

    from re import compile as _Re

    class TYPO:

      def text_has_foobar( self, text ):
        return self._text_has_foobar_re_search( text ) is not None
      _text_has_foobar_re_search = _Re( r"""(?i)foobar""" ).search

    TYPO = TYPO()

In your application, you'd write:

    from TYPO import TYPO
    print( TYPO.text_has_foobar( 'FOObar' ) )

This is about as simple in terms of functionality as it can get. Because this example is so short, I conflated the way to get _text_has_foobar_re_search all in one line. The disadvantage of this code is that it occupies a little memory for whatever the lifetime of the TYPO library object is; the advantage is that when doing a foobar search, you'll get away with two function calls and two class dictionary lookups. How many regexes are cached by re and the overhead of that cache are irrelevant here.

Compare this with the more usual style, below:

    import re

    class Typo:

      def text_has_foobar( self, text ):
        return re.compile( r"""(?i)foobar""" ).search( text ) is not None

In the application:

    typo = Typo()
    print( typo.text_has_foobar( 'FOObar' ) )

I readily admit that my style is highly unusual for Python, maybe even debatable. However, in the example that more closely matches how Python is mostly used, in order to do a single match, we must instantiate an object, do three instance dictionary lookups, and perform three function calls; additionally, we might get into re caching troubles when using more than 100 regexes. Also, the regular expression gets hidden inside the method body, which most of the time is not such a good idea.

Be it said that every subset of measures (targeted, aliased import statements; aliased methods where applicable; reduction of function calls and object dictionary lookups) can help reduce computational and conceptual complexity.

User · Answer

For me, the biggest benefit to re.compile is being able to separate definition of the regex from its use.

Even a simple expression such as 0|[1-9][0-9]* (integer in base 10 without leading zeros) can be complex enough that you'd rather not have to retype it, check if you made any typos, and later have to recheck if there are typos when you start debugging. Plus, it's nicer to use a variable name such as num or num_b10 than 0|[1-9][0-9]*.

It's certainly possible to store strings and pass them to re.match; however, that's less readable:

    num = "..."
    # then, much later:
    m = re.match(num, input)

Versus compiling:

    num = re.compile("...")
    # then, much later:
    m = num.match(input)

Though it is fairly close, the last line of the second feels more natural and simpler when used repeatedly.
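A small sketch of this separation, using the base-10 integer expression quoted in the answer. The names NUM_B10 and is_base10_integer are hypothetical, and the use of fullmatch (rather than match) is my own assumption about the intended check.

```python
import re

# Defined once, near the top of the module: a base-10 integer
# without leading zeros (the expression quoted in the answer).
NUM_B10 = re.compile(r"0|[1-9][0-9]*")


def is_base10_integer(text):
    # fullmatch requires the whole string to fit the pattern.
    return NUM_B10.fullmatch(text) is not None


print(is_base10_integer("0"))
print(is_base10_integer("1024"))
print(is_base10_integer("007"))
```

The call sites never repeat the expression, so a typo can only exist in one place.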

User · Answer

Performance difference aside, using re.compile and using the compiled regular expression object to do the match (or whatever regular-expression-related operation) makes the semantics clearer to the Python run-time.

I had a painful experience debugging some simple code:

    compare = lambda s, p: re.match(p, s)

and later I'd use compare in

    [x for x in data if compare(patternPhrases, x[columnIndex])]

where patternPhrases is supposed to be a variable containing a regular expression string, and x[columnIndex] is a variable containing a string.

My trouble was that patternPhrases did not match some expected string.

But if I had used the re.compile form:

    compare = lambda s, p: p.match(s)

then in

    [x for x in data if compare(patternPhrases, x[columnIndex])]

Python would have complained that "string does not have attribute of match", because by positional argument mapping in compare, x[columnIndex] is used as the regular expression, when I actually meant

    compare = lambda p, s: p.match(s)

In my case, using re.compile is more explicit about the purpose of the regular expression when its value is hidden from the naked eye, so I could get more help from Python's run-time checking.

So the moral of my lesson is that when the regular expression is not just a literal string, I should use re.compile to let Python help me assert my assumption.
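The mix-up described above can be reproduced with a short sketch. The values pattern_phrase and cell are hypothetical stand-ins for the answer's patternPhrases and x[columnIndex], and compare_str and compare_obj are names introduced here for the two helper styles.

```python
import re

# Hypothetical stand-ins for the answer's variables.
pattern_phrase = r"foo\d+"
cell = "foo42"

# String-based helper: declared as (s, p) but called as (pattern, string).
compare_str = lambda s, p: re.match(p, s)

# The swapped arguments are accepted silently: the cell text is used
# as the pattern, so the result is simply "no match".
print(compare_str(pattern_phrase, cell))

# Pattern-object helper with the same (s, p) signature.
compare_obj = lambda s, p: p.match(s)

try:
    compare_obj(re.compile(pattern_phrase), cell)
except AttributeError as exc:
    # The str that landed in `p` has no .match, so the error surfaces at once.
    print("caught:", exc)
```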

User · Answer

In general, I find it is easier to use flags (at least easier to remember how), like re.I, when compiling patterns than to use flags inline.

    >>> foo_pat = re.compile('foo', re.I)
    >>> foo_pat.findall('some string FoO bar')
    ['FoO']

vs

    >>> re.findall('(?i)foo', 'some string FoO bar')
    ['FoO']

User · Answer

This is a good question. You often see people use re.compile without reason, and it lessens readability. But there are certainly lots of times when pre-compiling the expression is called for, such as when you use it repeatedly in a loop.

It's like everything about programming (everything in life, actually): apply common sense.

User · Answer

Mostly, there is little difference whether you use re.compile or not. Internally, all of the functions are implemented in terms of a compile step:

    def match(pattern, string, flags=0):
        return _compile(pattern, flags).match(string)

    def fullmatch(pattern, string, flags=0):
        return _compile(pattern, flags).fullmatch(string)

    def search(pattern, string, flags=0):
        return _compile(pattern, flags).search(string)

    def sub(pattern, repl, string, count=0, flags=0):
        return _compile(pattern, flags).sub(repl, string, count)

    def subn(pattern, repl, string, count=0, flags=0):
        return _compile(pattern, flags).subn(repl, string, count)

    def split(pattern, string, maxsplit=0, flags=0):
        return _compile(pattern, flags).split(string, maxsplit)

    def findall(pattern, string, flags=0):
        return _compile(pattern, flags).findall(string)

    def finditer(pattern, string, flags=0):
        return _compile(pattern, flags).finditer(string)

In addition, calling methods directly on a pattern returned by re.compile() bypasses the extra indirection and caching logic:

    _cache = {}

    _pattern_type = type(sre_compile.compile("", 0))

    _MAXCACHE = 512
    def _compile(pattern, flags):
        # internal: compile pattern
        try:
            p, loc = _cache[type(pattern), pattern, flags]
            if loc is None or loc == _locale.setlocale(_locale.LC_CTYPE):
                return p
        except KeyError:
            pass
        if isinstance(pattern, _pattern_type):
            if flags:
                raise ValueError(
                    "cannot process flags argument with a compiled pattern")
            return pattern
        if not sre_compile.isstring(pattern):
            raise TypeError("first argument must be string or compiled pattern")
        p = sre_compile.compile(pattern, flags)
        if not (flags & DEBUG):
            if len(_cache) >= _MAXCACHE:
                _cache.clear()
            if p.flags & LOCALE:
                if not _locale:
                    return p
                loc = _locale.setlocale(_locale.LC_CTYPE)
            else:
                loc = None
            _cache[type(pattern), pattern, flags] = p, loc
        return p

In addition to the small speed benefit from using re.compile, people also like the readability that comes from naming potentially complex pattern specifications and separating them from the business logic where they are applied:

    #### Patterns ############################################################
    number_pattern = re.compile(r'\d+(\.\d*)?')    # Integer or decimal number
    assign_pattern = re.compile(r':=')             # Assignment operator
    identifier_pattern = re.compile(r'[A-Za-z]+')  # Identifiers
    whitespace_pattern = re.compile(r'[\t ]+')     # Spaces and tabs

    #### Applications ########################################################

    if whitespace_pattern.match(s): business_logic_rule_1()
    if assign_pattern.match(s): business_logic_rule_2()

Note: one other respondent incorrectly believed that pyc files stored compiled patterns directly; in reality, they are rebuilt each time the PYC is loaded:

    >>> import marshal
    >>> from dis import dis
    >>> with open('tmp.pyc', 'rb') as f:
            f.read(8)
            dis(marshal.load(f))

      1           0 LOAD_CONST               0 (-1)
                  3 LOAD_CONST               1 (None)
                  6 IMPORT_NAME              0 (re)
                  9 STORE_NAME               0 (re)

      3          12 LOAD_NAME                0 (re)
                 15 LOAD_ATTR                1 (compile)
                 18 LOAD_CONST               2 ('[aeiou]{2,5}')
                 21 CALL_FUNCTION            1
                 24 STORE_NAME               2 (lc_vowels)
                 27 LOAD_CONST               1 (None)
                 30 RETURN_VALUE

The above disassembly comes from the PYC file for a tmp.py containing:

    import re
    lc_vowels = re.compile(r'[aeiou]{2,5}')
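One branch of the _compile logic quoted above is easy to observe from user code: a compiled pattern passed to a module-level function is used as-is, but combining it with a flags argument is rejected. A minimal sketch, where the name vowels and the sample strings are chosen only for illustration:

```python
import re

vowels = re.compile(r"[aeiou]{2,5}")

# A compiled pattern handed to a module-level function is used directly.
print(re.search(vowels, "queue").group(0))

# Adding a flags argument to an already-compiled pattern raises ValueError;
# flags have to be given when the pattern is compiled.
try:
    re.search(vowels, "QUEUE", re.IGNORECASE)
except ValueError as exc:
    print("ValueError:", exc)
```

This is one reason to attach flags such as re.IGNORECASE at the point where the pattern is defined rather than at each call site.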

User · Answer

This answer might be arriving late, but it is an interesting find. Using compile can really save you time if you are planning on using the regex multiple times (this is also mentioned in the docs). Below you can see that using a compiled regex is fastest when the match method is called directly on it. Passing a compiled regex to re.match makes it even slower, and calling re.match with the pattern string is somewhere in the middle.

    >>> ipr = r'\D+((([0-2][0-5]?[0-5]?)\.){3}([0-2][0-5]?[0-5]?))\D+'
    >>> average(*timeit.repeat("re.match(ipr, 'abcd100.10.255.255 ')", globals={'ipr': ipr, 're': re}))
    1.5077415757028423
    >>> ipr = re.compile(ipr)
    >>> average(*timeit.repeat("re.match(ipr, 'abcd100.10.255.255 ')", globals={'ipr': ipr, 're': re}))
    1.8324008992184038
    >>> average(*timeit.repeat("ipr.match('abcd100.10.255.255 ')", globals={'ipr': ipr, 're': re}))
    0.9187896518778871

User · Answer

There is one additional perk of using re.compile(): adding comments to my regex patterns using re.VERBOSE.

    pattern = '''
    hello[ ]world    # Some info on my pattern logic. [ ] to recognize space
    '''

    re.search(pattern, 'hello world', re.VERBOSE)

Although this does not affect the speed of running your code, I like to do it this way as it is part of my commenting habit. I thoroughly dislike spending time trying to remember the logic behind my code two months down the line when I want to make modifications.
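The same commenting habit combines naturally with re.compile, so that the flag and the comments live together with the pattern definition. A minimal sketch, with GREETING as a hypothetical name:

```python
import re

# Hypothetical pattern; the comments live inside the pattern itself.
GREETING = re.compile(
    r"""
    hello    # literal word
    [ ]      # a single space; bare whitespace is ignored in VERBOSE mode
    world    # literal word
    """,
    re.VERBOSE,
)

print(GREETING.search("say hello world").group(0))
```

Because the flag is attached at compile time, call sites only need GREETING.search(...) and cannot forget to pass re.VERBOSE.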

User · Answer

Here's a simple test case:

    ~$ for x in 1 10 100 1000 10000 100000 1000000; do python -m timeit -n $x -s 'import re' 're.match("[0-9]{3}-[0-9]{3}-[0-9]{4}", "123-123-1234")'; done
    1 loops, best of 3: 3.1 usec per loop
    10 loops, best of 3: 2.41 usec per loop
    100 loops, best of 3: 2.24 usec per loop
    1000 loops, best of 3: 2.21 usec per loop
    10000 loops, best of 3: 2.23 usec per loop
    100000 loops, best of 3: 2.24 usec per loop
    1000000 loops, best of 3: 2.31 usec per loop

With re.compile:

    ~$ for x in 1 10 100 1000 10000 100000 1000000; do python -m timeit -n $x -s 'import re' 'r = re.compile("[0-9]{3}-[0-9]{3}-[0-9]{4}")' 'r.match("123-123-1234")'; done
    1 loops, best of 3: 1.91 usec per loop
    10 loops, best of 3: 0.691 usec per loop
    100 loops, best of 3: 0.701 usec per loop
    1000 loops, best of 3: 0.684 usec per loop
    10000 loops, best of 3: 0.682 usec per loop
    100000 loops, best of 3: 0.694 usec per loop
    1000000 loops, best of 3: 0.702 usec per loop

So it would seem that compiling is faster in this simple case, even if you only match once.

User · Answer

Regular expressions are compiled before being used when using the second version. If you are going to execute it many times, it is definitely better to compile it first. If not, compiling every time you match is fine for one-offs.
