[regex] How to "inverse match" with regex?

I'm using RegexBuddy, but I'm still having trouble with this :\

I'm processing a file line by line. I built a "line model" to match what I want.

Now I'd like to do an inverse match, i.e. I want to match lines that contain a string of six letters, but only if those six letters are not Andrea. How should I do that?


EDIT: I'll write the program that uses this regex; I don't know yet whether it will be in Python or PHP. I'm doing this first to learn some regex :) There are different types of lines, and I wanted to use a regex to select the type I'm interested in. Once I have these lines, I need to apply another filter so that a known value is not matched: I need all the others, not that one. The (?!not-wanted) is working pretty well, thank you. :-)

I hope this clarifies the question :)



If you want to do this in RegexBuddy, there are two ways to get a list of all lines not matching a regex.

On the toolbar on the Test panel, set the test scope to "Line by line". When you do that, an item List All Lines without Matches will appear under the List All button on the same toolbar. (If you don't see the List All button, click the Match button in the main toolbar.)

On the GREP panel, you can turn on the "line-based" and the "invert results" checkboxes to get a list of non-matching lines in the files you're grepping through.


Negative lookahead assertion

(?!Andrea)

This is not exactly an inverted match, but it's the closest you can get directly with a regex. Not all regex flavors support lookahead, though.


What language are you using? The capabilities and syntax of the regex implementation matter for this.

You could use look-ahead. Using python as an example

import re

not_andrea = re.compile(r'(?!Andrea)\w{6}', re.IGNORECASE)

To break that down:

(?!Andrea) means 'match if the next 6 characters are not "Andrea"'; if so then

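\w means a "word character": a letter, digit, or underscore. For ASCII text this is equivalent to the class [a-zA-Z0-9_]; in Python 3, \w in a str pattern also matches non-ASCII letters and digits unless you pass re.ASCII. If you want letters only, use [a-zA-Z] instead.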

\w{6} means exactly 6 word characters.

re.IGNORECASE means that you will exclude "Andrea", "andrea", "ANDREA" ...
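One thing worth checking with the pattern above is that it is unanchored, so it can match six word characters inside a longer word (for example the tail of "Andreas"). The sketch below compares it with a word-bounded variant; six_letter_word and first_match are hypothetical names introduced only for this illustration, and the variant assumes "six letters" means a whole word of ASCII letters.

```python
import re

# The pattern from the answer, written as a raw string.
not_andrea = re.compile(r'(?!Andrea)\w{6}', re.IGNORECASE)

# Hypothetical variant (an assumption, not part of the original answer):
# \b on both sides forces a whole six-letter word, and the lookahead
# rejects that word when it is exactly "Andrea" in any letter case.
six_letter_word = re.compile(r'\b(?!Andrea\b)[A-Za-z]{6}\b', re.IGNORECASE)


def first_match(pattern, line):
    """Return the first matched text in line, or None if nothing matches."""
    m = pattern.search(line)
    return m.group(0) if m else None


print(first_match(not_andrea, "Robert"))        # a plain six-letter name
print(first_match(not_andrea, "Andreas"))       # matches inside the longer word
print(first_match(six_letter_word, "Andreas"))  # no whole six-letter word here
```

Whether the unanchored or the word-bounded behaviour is the right one depends on what the lines actually look like.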

Another way is to use your program logic - use all lines not matching Andrea and put them through a second regex to check for 6 characters. Or first check for at least 6 word characters, and then check that it does not match Andrea.
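A minimal sketch of that two-step approach, assuming "6 characters" means a whole word of six ASCII letters. The names six_letters, unwanted and wanted_lines are made up for this example.

```python
import re

# Step 1: the line must contain some six-letter word.
six_letters = re.compile(r'\b[A-Za-z]{6}\b')
# Step 2: the line must not contain the unwanted value.
unwanted = re.compile(r'\bAndrea\b', re.IGNORECASE)


def wanted_lines(lines):
    """Keep lines that have a six-letter word and do not mention Andrea."""
    return [
        line for line in lines
        if six_letters.search(line) and not unwanted.search(line)
    ]


sample = ["id Robert", "id Andrea", "id Bob", "Robert and Andrea"]
print(wanted_lines(sample))
```

Two simple patterns are usually easier to read and debug than one pattern with the exclusion built in.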



Updated with feedback from Alan Moore

In PCRE and similar variants, you can actually create a regex that matches any line not containing a value:

^(?:(?!Andrea).)*$

This is called a tempered greedy token. The downside is that it doesn't perform well, because the lookahead is re-checked at every character position in the line.
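Python's re module also accepts this pattern. Here is a small sketch that applies it line by line; lines_without_andrea is a hypothetical helper name, and the input is assumed to be one string with newline-separated lines.

```python
import re

# Matches a whole line only if "Andrea" does not start at any position in it.
no_andrea_line = re.compile(r'^(?:(?!Andrea).)*$')


def lines_without_andrea(text):
    """Return the lines of text that do not contain the literal 'Andrea'."""
    return [line for line in text.splitlines() if no_andrea_line.match(line)]


text = "Robert\nAndrea\nsaw Andrea today\nMartin"
print(lines_without_andrea(text))
```

Note that this version is case-sensitive; add re.IGNORECASE if "andrea" should be excluded as well.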


In Perl you can use the negated binding operator !~:

process($line) if $line !~ /Andrea/;


I just came up with this method, which may be resource-intensive, but it works:

You can replace every part of the string that matches the regex with an empty string, which leaves only the parts that did not match.

This is a one-liner:

notMatched = re.sub(regex, "", string)

I used this because I was forced to use a very complex regex and couldn't figure out how to invert every part of it within a reasonable amount of time.

This will only return you the string result, not any match objects!
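For illustration, here is a runnable version of that idea with a made-up pattern and input. remove_matches and unmatched_pieces are hypothetical helper names; the second one uses re.split to get the non-matching parts as separate strings and assumes the pattern has no capturing groups (re.split would otherwise include the captured text in its result).

```python
import re


def remove_matches(pattern, text):
    """Delete every match of pattern and return what is left as one string."""
    return re.sub(pattern, "", text)


def unmatched_pieces(pattern, text):
    """Return the non-matching parts as a list, dropping empty pieces."""
    return [piece for piece in re.split(pattern, text) if piece]


print(remove_matches(r'\d+', "abc123def456"))
print(unmatched_pieces(r'\d+', "abc123def456"))
```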


(?!...) is useful in practice, although strictly speaking lookahead is not part of regular expressions as defined mathematically.

Because regular languages are closed under complement, you can also write an inverted regular expression by hand, without lookahead.

Here is a program to calculate the result automatically. Its result is machine-generated, which usually makes it much more complex than a hand-written one. But the result works.