[regex] Negative regex for Perl string pattern match

I have this regex:

if($string =~ m/^(Clinton|[^Bush]|Reagan)/i)
  {print "$string\n"};

I want to match with Clinton and Reagan, but not Bush.

It's not working.

This question is related to regex perl

The answer is


If my understanding is correct then you want to match any line which has Clinton and Reagan, in any order, but not Bush. As suggested by Stuck, here is a version with lookahead assertions:

#!/usr/bin/perl

use strict;
use warnings;

my $regex = qr/
    (?=.*clinton)  
    (?!.*bush) 
    .*reagan       
    /ix;

while (<DATA>) {
    chomp;
    next unless (/$regex/);
    print $_, "\n";
}


__DATA__
shouldn't match - reagan came first, then clinton, finally bush
first match - first two: reagan and clinton
second match - first two reverse: clinton and reagan
shouldn't match - last two: clinton and bush
shouldn't match - reverse: bush and clinton
shouldn't match - and then came obama, along comes mary
shouldn't match - to clinton with perl

Results

first match - first two: reagan and clinton
second match - first two reverse: clinton and reagan

as desired it matches any line which has Reagan and Clinton in any order.

You may want to try reading how lookahead assertions work with examples at http://www252.pair.com/comdog/mastering_perl/Chapters/02.advanced_regular_expressions.html

they are very tasty :)


Your regex does not work because [] defines a character class, but what you want is a lookahead:

(?=) - Positive look ahead assertion foo(?=bar) matches foo when followed by bar
(?!) - Negative look ahead assertion foo(?!bar) matches foo when not followed by bar
(?<=) - Positive look behind assertion (?<=foo)bar matches bar when preceded by foo
(?<!) - Negative look behind assertion (?<!foo)bar matches bar when NOT preceded by foo
(?>) - Once-only subpatterns (?>\d+)bar Performance enhancing when bar not present
(?(x)) - Conditional subpatterns
(?(3)foo|fu)bar - Matches foo if 3rd subpattern has matched, fu if not
(?#) - Comment (?# Pattern does x y or z)

So try: (?!bush)


What's wrong with using two regexs (or three)? This makes your intentions more clear and may even improve your performance:

if ($string =~ /^(Clinton|Reagan)/i && $string !~ /Bush/i) { ... }

if (($string =~ /^Clinton/i || $string =~ /^Reagan/i)
        && $string !~ /Bush/i) {
    print "$string\n"
}

Your regex says the following:

/^         - if the line starts with
(          - start a capture group
Clinton|   - "Clinton" 
|          - or
[^Bush]    - Any single character except "B", "u", "s" or "h"
|          - or
Reagan)   - "Reagan". End capture group.
/i         - Make matches case-insensitive 

So, in other words, your middle part of the regex is screwing you up. As it is a "catch-all" kind of group, it will allow any line that does not begin with any of the upper or lower case letters in "Bush". For example, these lines would match your regex:

Our president, George Bush
In the news today, pigs can fly
012-3123 33

You either make a negative look-ahead, as suggested earlier, or you simply make two regexes:

if( ($string =~ m/^(Clinton|Reagan)/i) and
    ($string !~ m/^Bush/i) ) {
   print "$string\n";
}

As mirod has pointed out in the comments, the second check is quite unnecessary when using the caret (^) to match only beginning of lines, as lines that begin with "Clinton" or "Reagan" could never begin with "Bush".

However, it would be valid without the carets.