Fastest way to check if a string matches a regexp in Ruby?

Question

What is the fastest way to check if a string matches a regular expression in Ruby?

My problem is that I have to "egrep" through a huge list of strings to find which ones match a regexp that is given at runtime. I only care about whether the string matches the regexp, not where it matches, nor what the content of the matching groups is. I hope this assumption can be used to reduce the amount of time my code spends matching regexps.

I load the regexp with:

pattern = Regexp.new(ptx).freeze

I have found that string =~ pattern is slightly faster than string.match(pattern).

Are there other tricks or shortcuts that can be used to make this test even faster?

User · Answer

This is the benchmark I have run after finding some articles around the net.

With 2.4.0 the winner is re.match?(str) (as suggested by @wiktor-stribizew). On previous versions, re =~ str seems to be the fastest, although str =~ re is almost as fast.

#!/usr/bin/env ruby
require 'benchmark'

str = "aacaabc"
re = Regexp.new('a+b').freeze

N = 4_000_000

Benchmark.bm do |b|
    b.report("str.match re\t") { N.times { str.match re } }
    b.report("str =~ re\t")    { N.times { str =~ re } }
    b.report("str[re]  \t")    { N.times { str[re] } }
    b.report("re =~ str\t")    { N.times { re =~ str } }
    b.report("re.match str\t") { N.times { re.match str } }
    if re.respond_to?(:match?)
        b.report("re.match? str\t") { N.times { re.match? str } }
    end
end

Results MRI 1.9.3-p551:

$ ./bench-re.rb  | sort -t $'\t' -k 2
       user     system      total        real
re =~ str         2.390000   0.000000   2.390000 (  2.397331)
str =~ re         2.450000   0.000000   2.450000 (  2.446893)
str[re]           2.940000   0.010000   2.950000 (  2.941666)
re.match str      3.620000   0.000000   3.620000 (  3.619922)
str.match re      4.180000   0.000000   4.180000 (  4.180083)

Results MRI 2.1.5:

$ ./bench-re.rb  | sort -t $'\t' -k 2
       user     system      total        real
re =~ str         1.150000   0.000000   1.150000 (  1.144880)
str =~ re         1.160000   0.000000   1.160000 (  1.150691)
str[re]           1.330000   0.000000   1.330000 (  1.337064)
re.match str      2.250000   0.000000   2.250000 (  2.255142)
str.match re      2.270000   0.000000   2.270000 (  2.270948)

Results MRI 2.3.3 (there is a regression in regex matching, it seems):

$ ./bench-re.rb  | sort -t $'\t' -k 2
       user     system      total        real
re =~ str         3.540000   0.000000   3.540000 (  3.535881)
str =~ re         3.560000   0.000000   3.560000 (  3.560657)
str[re]           4.300000   0.000000   4.300000 (  4.299403)
re.match str      5.210000   0.010000   5.220000 (  5.213041)
str.match re      6.000000   0.000000   6.000000 (  6.000465)

Results MRI 2.4.0:

$ ./bench-re.rb  | sort -t $'\t' -k 2
       user     system      total        real
re.match? str     0.690000   0.010000   0.700000 (  0.682934)
re =~ str         1.040000   0.000000   1.040000 (  1.035863)
str =~ re         1.040000   0.000000   1.040000 (  1.042963)
str[re]           1.340000   0.000000   1.340000 (  1.339704)
re.match str      2.040000   0.000000   2.040000 (  2.046464)
str.match re      2.180000   0.000000   2.180000 (  2.174691)
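
Applied to the use case in the question (filtering a large list of strings against a pattern supplied at runtime), the timings above suggest something like the following. This is a minimal sketch rather than part of the benchmark: `filter_matching` is a hypothetical helper name, the sample strings are made up, and the fallback to `=~` on Rubies older than 2.4 is an assumption based on the results above.

```ruby
# Hypothetical helper: return the strings that match a runtime-supplied pattern.
def filter_matching(strings, source)
  pattern = Regexp.new(source)

  if pattern.respond_to?(:match?)
    # Ruby >= 2.4: plain boolean result
    strings.select { |s| pattern.match?(s) }
  else
    # Older Rubies: =~ returns an index or nil, which select treats as truthy/falsy
    strings.select { |s| pattern =~ s }
  end
end

lines = ["aacaabc", "xyz", "ab", "ba"]
matches = filter_matching(lines, 'a+b')
p matches
```

The `respond_to?` check mirrors the one in the benchmark script, so the same code runs on both old and new interpreters.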

User · Answer

This is a simple benchmark:

require 'benchmark'

"test123" =~ /1/
=> 4
Benchmark.measure { 1000000.times { "test123" =~ /1/ } }
=>   0.610000   0.000000   0.610000 (  0.578133)

"test123"[/1/]
=> "1"
Benchmark.measure { 1000000.times { "test123"[/1/] } }
=>   0.718000   0.000000   0.718000 (  0.750010)

irb(main):019:0> "test123".match(/1/)
=> #<MatchData "1">
Benchmark.measure { 1000000.times { "test123".match(/1/) } }
=>   1.703000   0.000000   1.703000 (  1.578146)

So =~ is faster, but it depends on what you want as a return value. If you just want to check whether the text matches a regex, use =~.

User · Answer

To complete Wiktor Stribizew's and Dougui's answers, I would say that /regex/.match?("string") is about as fast as "string".match?(/regex/).

Ruby 2.4.0 (10 000 000 ~2 sec)

2.4.0 > require 'benchmark'
 => true
2.4.0 > Benchmark.measure { 10000000.times { /^CVE-[0-9]{4}-[0-9]{4,}$/.match?("CVE-2018-1589") } }
 => #<Benchmark::Tms:0x005563da1b1c80 @label="", @real=2.2060338060000504, @cstime=0.0, @cutime=0.0, @stime=0.04000000000000001, @utime=2.17, @total=2.21>
2.4.0 > Benchmark.measure { 10000000.times { "CVE-2018-1589".match?(/^CVE-[0-9]{4}-[0-9]{4,}$/) } }
 => #<Benchmark::Tms:0x005563da139eb0 @label="", @real=2.260814556000696, @cstime=0.0, @cutime=0.0, @stime=0.010000000000000009, @utime=2.2500000000000004, @total=2.2600000000000007>

Ruby 2.6.2 (100 000 000 ~20 sec)

irb(main):001:0> require 'benchmark'
=> true
irb(main):005:0> Benchmark.measure { 100000000.times { /^CVE-[0-9]{4}-[0-9]{4,}$/.match?("CVE-2018-1589") } }
=> #<Benchmark::Tms:0x0000562bc83e3768 @label="", @real=24.60139879199778, @cstime=0.0, @cutime=0.0, @stime=0.010000999999999996, @utime=24.565644999999996, @total=24.575645999999995>
irb(main):004:0> Benchmark.measure { 100000000.times { "CVE-2018-1589".match?(/^CVE-[0-9]{4}-[0-9]{4,}$/) } }
=> #<Benchmark::Tms:0x0000562bc846aee8 @label="", @real=24.634255946999474, @cstime=0.0, @cutime=0.0, @stime=0.010046, @utime=24.598276, @total=24.608321999999998>

Note: times vary. Sometimes /regex/.match?("string") is faster and sometimes "string".match?(/regex/); the differences may be due only to machine activity.

User · Answer

"What I am wondering is if there is any strange way to make this check even faster, maybe exploiting some strange method in Regexp or some weird construct."

Regexp engines vary in how they implement searches, but, in general, anchor your patterns for speed, and avoid greedy matches, especially when searching long strings.

The best thing to do, until you're familiar with how a particular engine works, is to do benchmarks and add/remove anchors, try limiting searches, use wildcards vs. explicit matches, etc.

The Fruity gem is very useful for quickly benchmarking things, because it's smart. Ruby's built-in Benchmark code is also useful, though you can write tests that fool you by not being careful.

I've used both in many answers here on Stack Overflow, so you can search through my answers and will see lots of little tricks and results to give you ideas of how to write faster code.

The biggest thing to remember is, it's bad to prematurely optimize your code before you know where the slowdowns occur.
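
A minimal sketch of the kind of anchoring experiment described above, using Ruby's built-in Benchmark. The haystack, both patterns, and the iteration count are assumptions chosen for illustration, and no particular outcome is claimed: results depend on the engine and the Ruby version. Note that the two patterns are not equivalent, so anchoring is only an option when a match at the start of the string is what you actually want.

```ruby
require 'benchmark'

# Assumed test data: a long string that matches neither pattern.
haystack = ('a' * 10_000).freeze

unanchored = /foo/      # allowed to match anywhere in the string
anchored   = /\Afoo/    # allowed to match only at the very start

n = 1_000

Benchmark.bm(12) do |b|
  b.report('unanchored') { n.times { unanchored =~ haystack } }
  b.report('anchored')   { n.times { anchored =~ haystack } }
end
```

Swapping in your own patterns and a sample of your real input is the point of the exercise; the numbers from this toy input say little about other patterns.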

User · Answer

What about re === str (case compare)?

Since it evaluates to true or false and has no need for storing matches, returning the match index and that stuff, I wonder if it would be an even faster way of matching than =~.

Ok, I tested this. =~ is still faster, even if you have multiple capture groups; however, === is faster than the other options.

BTW, what good is freeze? I couldn't measure any performance boost from it.
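
For reference, a short sketch of what `Regexp#===` (the case-compare operator this answer refers to) returns and where Ruby calls it implicitly. The `classify` method and the sample strings are hypothetical and only there for illustration; this shows behavior, not timing.

```ruby
re = Regexp.new('a+b').freeze

# Regexp#=== returns true or false rather than an index or a MatchData
hit  = re === "aacaabc"
miss = re === "xyz"
p hit, miss

# case/when uses === to test each branch (hypothetical helper)
def classify(str, re)
  case str
  when re then :match
  else :no_match
  end
end

p classify("aacaabc", re)

# Enumerable#grep also selects elements with pattern === element
found = ["aacaabc", "xyz", "ab"].grep(re)
p found
```

`grep` is worth knowing for the list-filtering case in the question, since it expresses "keep the strings that match" in a single call.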

User · Answer

Depending on how complicated your regular expression is, you could possibly just use simple string slicing. I'm not sure about the practicality of this for your application or whether or not it would actually offer any speed improvements.

"testsentence"["stsen"]
=> "stsen" # evaluates to true
"testsentence"["koala"]
=> nil # evaluates to false

User · Answer

Starting with Ruby 2.4.0, you may use Regexp#match?:

pattern.match?(string)

Regexp#match? is explicitly listed as a performance enhancement in the release notes for 2.4.0, as it avoids object allocations performed by other methods such as Regexp#match and =~:

"Regexp#match?
Added Regexp#match?, which executes a regexp match without creating a back reference object and changing $~ to reduce object allocation."
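
The difference described in the release note can be observed directly. The following is a small sketch, assuming Ruby 2.4 or later; the pattern and the strings are made up for illustration. It first clears `$~` with a failed match, then shows that `match?` leaves it untouched while `=~` fills it in.

```ruby
pattern = Regexp.new('a+b')
str = "aacaabc"

"reset" =~ /nomatch/           # a failed match sets $~ to nil

result = pattern.match?(str)   # boolean only
after_match_p = $~             # still nil: match? did not set $~

pattern =~ str                 # =~ returns an index and sets $~
after_op = $~                  # MatchData for the matched text

p result
p after_match_p
p after_op[0]
```

Code that later reads `$~`, `$1`, or `Regexp.last_match` still needs `=~` or `match`; `match?` fits only when the yes/no answer is all that matters, which is the situation in the question.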
