Regular expression to match a line that doesn't contain a word

Question

I know it's possible to match a word and then reverse the matches using other tools (e.g. grep -v). However, is it possible to match lines that do not contain a specific word, e.g. hede, using a regular expression?

Input:

hoho
hihi
haha
hede

Code:

grep "<Regex for 'doesn't contain hede'>" input

Desired output:

hoho
hihi
haha

User · Accepted Answer

The notion that regex doesn't support inverse matching is not entirely true. You can mimic this behavior by using negative look-arounds:

^((?!hede).)*$

The regex above will match any string, or line without a line break, that does not contain the (sub)string 'hede'. As mentioned, this is not something regex is "good" at (or should do), but it is still possible.

If you need to match line break chars as well, use the DOT-ALL modifier (the trailing s in the following pattern):

/^((?!hede).)*$/s

or use it inline:

/(?s)^((?!hede).)*$/

(where the /.../ are the regex delimiters, i.e., not part of the pattern)

If the DOT-ALL modifier is not available, you can mimic the same behavior with the character class [\s\S]:

/^((?!hede)[\s\S])*$/

Explanation

A string is just a list of n characters. Before and after each character, there's an empty string. So a list of n characters will have n+1 empty strings. Consider the string "ABhedeCD":

    +----+---+----+---+----+---+----+---+----+---+----+---+----+---+----+---+----+
S = | e1 | A | e2 | B | e3 | h | e4 | e | e5 | d | e6 | e | e7 | C | e8 | D | e9 |
    +----+---+----+---+----+---+----+---+----+---+----+---+----+---+----+---+----+

index      0        1        2        3        4        5        6        7

where the e's are the empty strings. The regex (?!hede). looks ahead to check that there's no substring "hede" to be seen. If that is the case (so something else is seen), then the . (dot) will match any character except a line break. Look-arounds are also called zero-width assertions because they don't consume any characters. They only assert/validate something.

So, in my example, every empty string is first checked to see that there's no "hede" up ahead, before a character is consumed by the . (dot). The regex (?!hede). does that only once, so it is wrapped in a group and repeated zero or more times: ((?!hede).)*. Finally, the start and end of the input are anchored to make sure the entire input is consumed: ^((?!hede).)*$

As you can see, the input "ABhedeCD" will fail because on e3, the regex (?!hede) fails (there is "hede" up ahead!).
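
As a quick check, here is a minimal Python sketch. Python's re module is my choice here; the answer itself is engine-neutral. The sketch runs the pattern over the question's input, uses re.DOTALL in place of the s modifier, and lists the position in "ABhedeCD" where (?!hede) fails. Positions are 0-based, so position 2 is e3 in the diagram:

```python
import re

# The accepted answer's pattern: before each character is consumed,
# (?!hede) checks that "hede" does not start at that position.
NO_HEDE = re.compile(r"^((?!hede).)*$")

lines = ["hoho", "hihi", "haha", "hede"]
kept = [line for line in lines if NO_HEDE.match(line)]
print(kept)  # ['hoho', 'hihi', 'haha']

# re.DOTALL plays the role of the /s modifier, so . also matches "\n".
NO_HEDE_DOTALL = re.compile(r"^((?!hede).)*$", re.DOTALL)
print(bool(NO_HEDE_DOTALL.match("ho\nho")))    # True
print(bool(NO_HEDE_DOTALL.match("ho\nhede")))  # False

# 0-based positions in "ABhedeCD" where (?!hede) fails; 2 is e3 in the diagram.
s = "ABhedeCD"
failing = [i for i in range(len(s) + 1) if s.startswith("hede", i)]
print(failing)  # [2]
```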

User · Answer

Answer:

^((?!hede).)*$

Explanation:

^         the beginning of the string
(         group and capture to \1 (0 or more times (matching the most amount possible))
(?!       look ahead to see if there is not:
hede      your string
)         end of look-ahead
.         any character except \n
)*        end of \1 (Note: because you are using a quantifier on this capture, only the LAST repetition of the captured pattern will be stored in \1)
$         before an optional \n, and the end of the string

User · Answer

The TXR Language supports regex negation.

$ txr -c '@(repeat)
@{nothede /~hede/}
@(do (put-line nothede))
@(end)'  Input

A more complicated example: match all lines that start with a and end with z, but do not contain the substring hede:

$ txr -c '@(repeat)
@{nothede /a.*z&~.*hede.*/}
@(do (put-line nothede))
@(end)' -
az         <- echoed
az
abcz       <- echoed
abcz
abhederz   <- not echoed; contains hede
ahedez     <- not echoed; contains hede
ace        <- not echoed; does not end in z
ahedz      <- echoed
ahedz

Regex negation is not particularly useful on its own but when you also have intersection, things get interesting, since you have a full set of boolean set operations: you can express "the set which matches this, except for things which match that".
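
For comparison, engines without & and ~ can often express this particular intersection with a negative lookahead. Here is a minimal sketch in Python's re (not TXR syntax), run on the same sample lines:

```python
import re

# "a.*z" intersected with the complement of ".*hede.*", written as a
# negative lookahead because Python's re has no & or ~ operators.
A_TO_Z_WITHOUT_HEDE = re.compile(r"(?!.*hede)a.*z")

def echoed(lines):
    # fullmatch requires the whole line to match, like the TXR example
    return [line for line in lines if A_TO_Z_WITHOUT_HEDE.fullmatch(line)]

sample = ["az", "abcz", "abhederz", "ahedez", "ace", "ahedz"]
print(echoed(sample))  # ['az', 'abcz', 'ahedz']
```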

User · Answer

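The function below removes the listed words (here "for" and "the", as whole words, case-insensitively) from a text:

<?php
function removePrepositions($text) {
    // Whole-word, case-insensitive patterns for the words to remove.
    $propositions = array('/\bfor\b/i', '/\bthe\b/i');

    foreach ($propositions as $exceptionPhrase) {
        $text = preg_replace($exceptionPhrase, '', trim($text));
    }

    // Set outside the loop, so it is always defined before being returned.
    $retval = trim($text);
    return $retval;
}
?>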

User · Answer

FWIW, since regular languages (aka rational languages) are closed under complementation, it's always possible to find a regular expression (aka rational expression) that negates another expression. But not many tools implement this.

Vcsn supports this operator (which it denotes {c}, postfix).

You first define the type of your expressions. Labels are letters (lal_char) picked from a to z, for instance. Defining the alphabet is, of course, very important when working with complementation. The "value" computed for each word is just a Boolean: true means the word is accepted, false means it is rejected.

In Python:

In [5]: import vcsn
        c = vcsn.context('lal_char(a-z), b')
        c
Out[5]: {a,b,c,d,e,f,g,h,i,j,k,l,m,n,o,p,q,r,s,t,u,v,w,x,y,z} → 𝔹

then you enter your expression:

In [6]: e = c.expression('(hede){c}'); e
Out[6]: (hede)^c

convert this expression to an automaton:

In [7]: a = e.automaton(); a

finally, convert this automaton back to a simple expression:

In [8]: print(a.expression())
        \e+h(\e+e(\e+d))+([^h]+h([^e]+e([^d]+d([^e]+e[^]))))[^]*

where + is usually denoted |, \e denotes the empty word, and [^] is usually written . (any character). So, with a bit of rewriting: ()|h(ed?)?|([^h]|h([^e]|e([^d]|d([^e]|e.)))).*

You can see this example here, and try Vcsn online there.

Note that (hede){c} is the complement of the single word hede. It accepts every word except exactly hede, so words such as ahede or hedehede are still accepted. To reject every line that contains hede anywhere, complement an expression for "any word containing hede" (.*hede.* in common notation) instead.
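
A quick Python check of the hand-rewritten expression ()|h(ed?)?|([^h]|h([^e]|e([^d]|d([^e]|e.)))).* shows what it accepts. Python's re is only a test harness here, not Vcsn. The expression accepts every word except exactly hede, so ahede is still accepted. The exhaustive loop over a small alphabet, where x stands for any other letter, is an illustration I chose:

```python
import re
from itertools import product

# Vcsn's result, rewritten in common regex syntax as in the answer.
COMPLEMENT_OF_HEDE = re.compile(r"()|h(ed?)?|([^h]|h([^e]|e([^d]|d([^e]|e.)))).*")

def accepted(word: str) -> bool:
    # fullmatch: the whole word must be described by the expression
    return COMPLEMENT_OF_HEDE.fullmatch(word) is not None

# Compare with "every word except hede" for all words up to length 6.
words = ["".join(p) for n in range(7) for p in product("hedx", repeat=n)]
assert all(accepted(w) == (w != "hede") for w in words)
print(accepted("hede"), accepted("ahede"))  # False True
```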

User · Answer

I wanted to add another example for when you are trying to match an entire line that contains string X but does not also contain string Y.

For example, let's say we want to check whether our URL / string contains "tasty-treats", as long as it does not also contain "chocolate" anywhere.

This regex pattern would work (it works in JavaScript too):

^(?=.*?tasty-treats)((?!chocolate).)*$

(The example uses the global and multiline flags.)

Interactive Example: https://regexr.com/53gv4

Matches

(These URLs contain "tasty-treats" and do not contain "chocolate")

example.com/tasty-treats/strawberry-ice-cream
example.com/desserts/tasty-treats/banana-pudding
example.com/tasty-treats-overview

Does Not Match

(These URLs won't match. Four of them contain "chocolate" somewhere, even though they also contain "tasty-treats". The oven-roasted-chicken URL does not contain "tasty-treats" at all.)

example.com/tasty-treats/chocolate-cake
example.com/home-cooking/oven-roasted-chicken
example.com/tasty-treats/banana-chocolate-fudge
example.com/desserts/chocolate/tasty-treats
example.com/chocolate/tasty-treats/desserts
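
The same pattern can be tried in Python's re (my choice of engine for illustration). Here re.MULTILINE plus finditer over the whole text stand in for the multiline and global flags:

```python
import re

# Line must contain "tasty-treats" (lookahead), and no position may start "chocolate".
PATTERN = re.compile(r"^(?=.*?tasty-treats)((?!chocolate).)*$", re.MULTILINE)

urls = "\n".join([
    "example.com/tasty-treats/strawberry-ice-cream",
    "example.com/desserts/tasty-treats/banana-pudding",
    "example.com/tasty-treats-overview",
    "example.com/tasty-treats/chocolate-cake",
    "example.com/home-cooking/oven-roasted-chicken",
    "example.com/tasty-treats/banana-chocolate-fudge",
    "example.com/desserts/chocolate/tasty-treats",
    "example.com/chocolate/tasty-treats/desserts",
])

# finditer over the whole text plays the role of the global flag.
matches = [m.group(0) for m in PATTERN.finditer(urls)]
print(matches)
```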

User · Answer

Through PCRE verbs:

^hede$(*SKIP)(*F)|^.*$

This completely skips the line that consists of exactly the string hede and matches all the remaining lines. The pattern is meant for multiline mode, so ^ and $ match at the start and end of each line. (To skip every line that merely contains hede, use ^.*hede.*$(*SKIP)(*F)|^.*$ instead.)

Execution of the parts:

Let us consider the above regex by splitting it into two parts.

Part before the | symbol (the part that shouldn't be matched):

^hede$(*SKIP)(*F)

Part after the | symbol (the part that should be matched):

^.*$

PART 1

The regex engine starts its execution with the first part:

^hede$(*SKIP)(*F)

Explanation:

^ asserts that we are at the start of a line.
hede matches the string hede.
$ asserts that we are at the end of the line.

So the line that contains only the string hede would be matched. When the engine then reaches (*SKIP)(*F), it skips past that line and makes the match fail there. (You could write (*F) as (*FAIL).) The | (alternation, or logical OR) operator after the PCRE verbs lets the second part try at every remaining position, that is, on every line except the one consisting of exactly hede.

PART 2

^.*$

Explanation:

^ asserts that we are at the start of a line, i.e. it matches all line starts except the one on the hede line.
.* By default, . matches any character except a line break, and * repeats the previous token zero or more times, so .* matches the whole line.

Hey, why did you use .* instead of .+?

Because .* matches a blank line but .+ doesn't (.+ repeats the previous token one or more times). We want to match all the lines except hede, and the input may also contain blank lines, so you must use .* instead of .+.

$ The end-of-line anchor is not strictly necessary here.

User · Answer

If you want to match any characters except a whole word, similar to how a negated character class excludes single characters:

For example, a string:

<?
$str="aaa        bbb4      aaa     bbb7";
?>

Do not use:

<?
preg_match('/aaa[^bbb]+?bbb7/s', $str, $matches);
?>

Use:

<?
preg_match('/aaa(?:(?!bbb).)+?bbb7/s', $str, $matches);
?>

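Notice that (?!bbb). is a lookahead that starts at the current position, i.e. it includes the very character the dot is about to consume, so you could call it "lookcurrent". For example: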

"(?=abc)abcde", "(?!abc)abcde"

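A Python translation of the two PHP patterns shows why [^bbb] is the wrong tool. The test string "aaa ab bbb7" is made up for illustration. [^bbb] is just [^b], so it refuses any single b, whereas (?!bbb). only refuses a character that starts bbb:

```python
import re

# A lone "b" sits between "aaa" and "bbb7".
s = "aaa ab bbb7"

# [^bbb] is the character class [^b]: it cannot step over the lone "b".
negated_class = re.search(r"aaa[^bbb]+?bbb7", s, re.S)
print(negated_class)  # None

# (?:(?!bbb).) only refuses a character where "bbb" begins.
tempered = re.search(r"aaa(?:(?!bbb).)+?bbb7", s, re.S)
print(tempered.group(0))  # aaa ab bbb7
```
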
User · Answer

Maybe you'll find this on Google while trying to write a regex that matches segments of a line (as opposed to entire lines) that do not contain a substring. It took me a while to figure out, so I'll share.

Given a string:

<span class="good">bar</span><span class="bad">foo</span><span class="ugly">baz</span>

I want to match <span> tags which do not contain the substring "bad".

/<span(?:(?!bad).)*?>/ will match <span class=\"good\"> and <span class=\"ugly\">.

Notice that there are two sets (layers) of parentheses:

The innermost one is for the negative lookahead (it is not a capture group).
The outermost one was interpreted by Ruby as a capture group, but we don't want it to be one, so I added ?: at its beginning and it is no longer interpreted as a capture group.

Demo in Ruby:

s = '<span class="good">bar</span><span class="bad">foo</span><span class="ugly">baz</span>'
s.scan(/<span(?:(?!bad).)*?>/)
# => ["<span class=\"good\">", "<span class=\"ugly\">"]
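
The same pattern in Python's re, as a hedged cross-check of the Ruby demo. re.findall, like Ruby's scan, returns whole matches when the pattern has no capturing groups:

```python
import re

s = '<span class="good">bar</span><span class="bad">foo</span><span class="ugly">baz</span>'

# Lazy (?:(?!bad).)*? stops at the first ">" and never steps onto "bad".
tags = re.findall(r"<span(?:(?!bad).)*?>", s)
print(tags)  # ['<span class="good">', '<span class="ugly">']
```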

User · Answer

The OP did not specify or tag the post to indicate the context (programming language, editor, tool) the regex will be used within.

For me, I sometimes need to do this while editing a file using Textpad.

Textpad supports some regex, but does not support lookahead or lookbehind, so it takes a few steps.

If I am looking to retain all lines that Do NOT contain the string hede, I would do it like this:

1. Search/replace the entire file to add a unique "Tag" to the beginning of each line containing any text.

    Search string: ^(.)
    Replace string: <@#-unique-#@>\1
    Replace-all

2. Delete all lines that contain the string hede (replacement string is empty):

    Search string: <@#-unique-#@>.*hede.*\n
    Replace string: <nothing>
    Replace-all

3. At this point, all remaining lines Do NOT contain the string hede. Remove the unique "Tag" from all lines (replacement string is empty):

    Search string: <@#-unique-#@>
    Replace string: <nothing>
    Replace-all

Now you have the original text with all lines containing the string hede removed.

If I am looking to Do Something Else to only lines that Do NOT contain the string hede, I would do it like this:

1. Search/replace the entire file to add a unique "Tag" to the beginning of each line containing any text.

    Search string: ^(.)
    Replace string: <@#-unique-#@>\1
    Replace-all

2. For all lines that contain the string hede, remove the unique "Tag":

    Search string: <@#-unique-#@>(.*hede)
    Replace string: \1
    Replace-all

3. At this point, all lines that begin with the unique "Tag" Do NOT contain the string hede. I can now do my Something Else to only those lines.

4. When I am done, I remove the unique "Tag" from all lines (replacement string is empty):

    Search string: <@#-unique-#@>
    Replace string: <nothing>
    Replace-all
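
The first procedure above can be sketched in Python with re.sub for anyone who wants to script it. This is not Textpad itself, and the tag string and the helper name drop_lines_containing are made up for illustration. Like the deletion pattern in step 2, this sketch only removes a matching line that ends with a newline:

```python
import re

# Hypothetical tag: any string that never occurs in the text works.
TAG = "<@#-unique-#@>"

def drop_lines_containing(text: str, word: str) -> str:
    # Step 1: prefix every line that has at least one character with the tag.
    tagged = re.sub(r"^(.)", lambda m: TAG + m.group(1), text, flags=re.MULTILINE)
    # Step 2: delete tagged lines containing the word, including their newline.
    pruned = re.sub(re.escape(TAG) + r".*" + re.escape(word) + r".*\n", "", tagged)
    # Step 3: strip the tag from the surviving lines.
    return pruned.replace(TAG, "")

print(drop_lines_containing("hoho\nhihi\nhaha\nhede\n", "hede"))
```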

User · Answer

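The given answers are perfectly fine; this is just an academic point:

Regular expressions in the sense of theoretical computer science ARE NOT ABLE to do it like this (they have no lookaround). For them, it would have to look something like this:

^(|[^h].*|h([^e].*)?|he([^d].*)?|hed([^e].*)?|hede.+)$

This only does a FULL match: it rejects a line only when the whole line is exactly hede. Doing it for sub-matches (rejecting hede anywhere in the line) would be even more awkward.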
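
A lookaround-free full-match pattern for "anything except exactly hede" can be checked exhaustively in Python. The test harness, the alphabet h, e, d, x (with x standing for any other character) and the length limit are my choices:

```python
import re
from itertools import product

# Each alternative rules out one more prefix of "hede":
# empty | not h... | h not e... | he not d... | hed not e... | hede plus more.
NOT_EXACTLY_HEDE = re.compile(r"^(|[^h].*|h([^e].*)?|he([^d].*)?|hed([^e].*)?|hede.+)$")

words = ["".join(p) for n in range(7) for p in product("hedx", repeat=n)]
assert all((NOT_EXACTLY_HEDE.match(w) is not None) == (w != "hede") for w in words)
print(bool(NOT_EXACTLY_HEDE.match("hede")), bool(NOT_EXACTLY_HEDE.match("ahede")))  # False True
```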

User · Answer

With this, you avoid testing a lookahead at every position:

/^(?:[^h]+|h++(?!ede))*+$/

equivalent to (for .NET):

^(?>(?:[^h]+|h+(?!ede))*)$

Old answer:

/^(?>[^h]+|h+(?!ede))*$/
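
This "lookahead only after a run of h" idea can be checked in Python. Possessive *+ and ++ need Python 3.11 or later. Older versions get the plain quantifiers, which accept the same strings but may backtrack more. The sketch compares the pattern with a plain substring test on every string up to length 6 over a small alphabet, where x stands for any other character:

```python
import re
import sys
from itertools import product

if sys.version_info >= (3, 11):
    # Possessive form from the answer (no backtracking into h-runs or the loop).
    NO_HEDE = re.compile(r"^(?:[^h]+|h++(?!ede))*+$")
else:
    # Same language with ordinary quantifiers for older Pythons.
    NO_HEDE = re.compile(r"^(?:[^h]+|h+(?!ede))*$")

# The lookahead runs only after a run of h's, not at every position.
words = ["".join(p) for n in range(7) for p in product("hedx", repeat=n)]
assert all((NO_HEDE.match(w) is not None) == ("hede" not in w) for w in words)
print(bool(NO_HEDE.match("hhx")), bool(NO_HEDE.match("hhede")))  # True False
```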

User · Answer

Here's a good explanation of why it's not easy to negate an arbitrary regex. I have to agree with the other answers, though: if this is anything other than a hypothetical question, then a regex is not the right choice here.

User · Answer

If you want the regex test to fail only when the entire string matches, the following will work:

^(?!hede$).*

E.g., if you want to allow all values except "foo" (i.e. "foofoo", "barfoo", and "foobar" will pass, but "foo" will fail), use: ^(?!foo$).*

Of course, if you're checking for exact equality, a better general solution in this case is to check for string equality, i.e.

myStr !== 'foo'

You could even put the negation outside the test if you need any regex features (here, case insensitivity and range matching):

!/^[a-f]oo$/i.test(myStr)

The regex solution at the top of this answer may still be helpful in situations where a positive regex test is required (perhaps by an API).
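
The same checks in Python are given here as a sketch. The JavaScript test !/^[a-f]oo$/i.test(myStr) becomes re.fullmatch(...) is None:

```python
import re

# Fails only when the entire string is exactly "foo".
ALL_BUT_FOO = re.compile(r"^(?!foo$).*")

results = {v: ALL_BUT_FOO.match(v) is not None for v in ["foo", "foofoo", "barfoo", "foobar"]}
print(results)  # {'foo': False, 'foofoo': True, 'barfoo': True, 'foobar': True}

# Without a regex requirement, a plain comparison is clearer.
my_str = "Foo"
print(my_str != "foo")                                          # True
print(re.fullmatch(r"[a-f]oo", my_str, re.IGNORECASE) is None)  # False
```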

User · Answer

With ConyEdit, you can use the command line cc.gl !/hede/ to get the lines that do not contain a regex match, or the command line cc.dl /hede/ to delete the lines that do contain one. Both give the same result.

User · Answer

An, in my opinion, more readable variant of the top answer:

^(?!.*hede)

Basically, "match at the beginning of the line if and only if it does not have 'hede' in it", so the requirement translates almost directly into regex.

Of course, it's possible to have multiple failure requirements:

^(?!.*(hede|hodo|hada))

Details: The ^ anchor ensures the regex engine doesn't retry the match at every location in the string, which would match every string.

The ^ anchor at the beginning is meant to represent the beginning of the line. The grep tool matches each line one at a time; in contexts where you're working with a multiline string, you can use the "m" flag:

/^(?!.*hede)/m # JavaScript syntax

or

(?m)^(?!.*hede) # Inline flag
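
A Python sketch of the same idea (the engine is my choice). When the text is searched as one multiline string, .*$ is appended so that findall returns the lines themselves rather than empty zero-width matches:

```python
import re

# Per-line check, as grep does it: succeed at the start only if no "hede" follows.
lines = ["hoho", "hihi", "haha", "hede"]
per_line = [line for line in lines if re.match(r"^(?!.*hede)", line)]
print(per_line)  # ['hoho', 'hihi', 'haha']

# Several forbidden words at once.
several = [w for w in ["hodo", "hada", "hihi"] if re.match(r"^(?!.*(hede|hodo|hada))", w)]
print(several)  # ['hihi']

# Multiline text: (?m) is the inline flag; . never crosses a newline.
text = "hoho\nhihi\nhaha\nhede"
whole = re.findall(r"(?m)^(?!.*hede).*$", text)
print(whole)  # ['hoho', 'hihi', 'haha']
```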

User · Answer

The aforementioned (?:(?!hede).)* is great because it can be anchored:

^(?:(?!hede).)*$               # A line without hede
foo(?:(?!hede).)*bar           # foo followed by bar, without hede between them

But the following would suffice in this case:

^(?!.*hede)                    # A line without hede

This simplification is ready to have "AND" clauses added:

^(?!.*hede)(?=.*foo)(?=.*bar)  # A line with foo and bar, but without hede
^(?!.*hede)(?=.*foo).*bar      # Same
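
The five patterns can be compared in Python with re.search, which, like grep, looks for a match anywhere in each sample. The sample strings and the helper name hits are made up for illustration:

```python
import re

def hits(pattern, samples):
    # Keep the samples in which the pattern finds a match.
    return [s for s in samples if re.search(pattern, s)]

samples = ["foo and bar", "bar and foo", "foo hede bar", "hede", "plain"]

no_hede = hits(r"^(?:(?!hede).)*$", samples)
between = hits(r"foo(?:(?!hede).)*bar", samples)
simple = hits(r"^(?!.*hede)", samples)
and_1 = hits(r"^(?!.*hede)(?=.*foo)(?=.*bar)", samples)
and_2 = hits(r"^(?!.*hede)(?=.*foo).*bar", samples)
print(no_hede, between, simple, and_1, and_2, sep="\n")
```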

User · Answer

Here's how I'd do it:

^[^h]*(h(?!ede)[^h]*)*$

Accurate and more efficient than the other answers. It implements Friedl's "unrolling-the-loop" efficiency technique and requires much less backtracking.
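
A Python check of the unrolled pattern against a plain substring test is shown below. The exhaustive alphabet h, e, d, x is my choice, and the sketch checks correctness only, not the efficiency claim:

```python
import re
from itertools import product

# Unrolled loop: runs of non-h characters; each h is checked once by (?!ede).
UNROLLED = re.compile(r"^[^h]*(h(?!ede)[^h]*)*$")

words = ["".join(p) for n in range(7) for p in product("hedx", repeat=n)]
assert all((UNROLLED.match(w) is not None) == ("hede" not in w) for w in words)
print(bool(UNROLLED.match("hhx")), bool(UNROLLED.match("xhede")))  # True False
```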

User · Answer

Note that the solution to "does not start with hede":

^(?!hede).*$

is generally much more efficient than the solution to "does not contain hede":

^((?!hede).)*$

The former checks for "hede" only at the input string's first position, rather than at every position.
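
The two patterns answer different questions, as this short Python comparison shows. The sample strings are made up; "ahede" is the case where they disagree:

```python
import re

starts_with_hede = re.compile(r"^(?!hede).*$")  # lookahead at position 0 only
contains_hede = re.compile(r"^((?!hede).)*$")   # lookahead before every character

table = {s: (bool(starts_with_hede.match(s)), bool(contains_hede.match(s)))
         for s in ["hedecided", "ahead", "ahede"]}
print(table)  # {'hedecided': (False, False), 'ahead': (True, True), 'ahede': (True, False)}
```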

User · Answer

If you're just using it for grep, you can use grep -v hede to get all lines that do not contain hede.

ETA: Oh, rereading the question, grep -v is probably what you meant by "tools options".

User · Answer

^((?!hede).)*$ is an elegant solution, except that, since it consumes characters, you won't be able to combine it with other criteria. For instance, say you wanted to check for the non-presence of "hede" and the presence of "haha". This solution would work because it won't consume characters:

^(?!.*\bhede\b)(?=.*\bhaha\b)
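
In Python the stacked lookaheads behave as described. The sample lines are made up. Because of the \b word boundaries, "hedes" does not count as hede:

```python
import re

# Zero-width lookaheads consume nothing, so both conditions apply at position 0.
PATTERN = re.compile(r"^(?!.*\bhede\b)(?=.*\bhaha\b)")

verdicts = {line: bool(PATTERN.match(line))
            for line in ["hoho haha", "haha hede", "hoho", "hedes haha"]}
print(verdicts)
# {'hoho haha': True, 'haha hede': False, 'hoho': False, 'hedes haha': True}
```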

User · Answer

How to use PCRE's backtracking control verbs to match a line not containing a word

Here's a method that I haven't seen used before:

/.*hede(*COMMIT)^|/

How it works

First, it tries to find "hede" somewhere in the line. If successful, (*COMMIT) tells the engine at this point not only not to backtrack in the event of a failure, but also not to attempt any further matching in that case. Then we try to match something that cannot possibly match (in this case, ^).

If a line does not contain "hede", then the second alternative, an empty subpattern, successfully matches the subject string.

This method is no more efficient than a negative lookahead, but I figured I'd just throw it on here in case someone finds it nifty and finds a use for it in other, more interesting applications.

User · Answer

With a negative lookahead, a regular expression can match something that does not contain a specific pattern. This is answered and explained by Bart Kiers. Great explanation!

However, with Bart Kiers' answer, the lookahead part tests 1 to 4 characters ahead while matching any single character. We can avoid this and let the lookahead part check the whole text once, ensuring there is no 'hede'. Then the normal part (.*) can consume the whole text in one go.

Here is the improved regex:

/^(?!.*?hede).*$/

Note that the lazy quantifier (*?) in the negative lookahead is optional; you can use the greedy quantifier (*) instead, depending on your data. If 'hede' is present and in the first half of the text, the lazy quantifier can be faster; otherwise, the greedy quantifier is faster. However, if 'hede' is not present, both are equally slow.

For more information about lookahead, please check out the great article "Mastering Lookahead and Lookbehind".

Also, please check out RegexGen.js, a JavaScript Regular Expression Generator that helps to construct complex regular expressions. With RegexGen.js, you can construct the regex in a more readable way:

var _ = regexGen;

var regex = _(
    _.startOfLine(),
    _.anything().notContains(       // match anything that does not contain:
        _.anything().lazy(), 'hede' //   zero or more chars followed by 'hede',
                                    //   i.e., anything that contains 'hede'
    ),
    _.endOfLine()
);
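
Both variants of the single-lookahead regex can be compared in Python's re (my choice of engine). They accept exactly the same lines; only their speed can differ, depending on where hede occurs:

```python
import re

# One lookahead scans the rest of the line once; .* then consumes it in one go.
lazy = re.compile(r"^(?!.*?hede).*$")
greedy = re.compile(r"^(?!.*hede).*$")

lines = ["hoho", "hihi", "haha", "hede", "ahedecided"]
kept_lazy = [l for l in lines if lazy.match(l)]
kept_greedy = [l for l in lines if greedy.match(l)]
print(kept_lazy)    # ['hoho', 'hihi', 'haha']
print(kept_greedy)  # ['hoho', 'hihi', 'haha']
```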

User · Answer

Not regex, but I've found it logical and useful to chain greps with a pipe to eliminate noise.

E.g., search an Apache config file without all the comments:

grep -v '#' /opt/lampp/etc/httpd.conf      # this gives all the non-comment lines

and

grep -v '#' /opt/lampp/etc/httpd.conf | grep -i dir

The logic of the chained greps is (not a comment) and (matches dir).

User · Answer

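Another option is to add a positive lookahead that checks whether hede appears anywhere in the input line, and then negate it, with an expression similar to:

^(?!(?=.*\bhede\b)).*$

The word boundaries mean hede only counts as a whole word. Negating a positive lookahead is the same as a single negative lookahead, so ^(?!.*\bhede\b).*$ behaves identically.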
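
A Python comparison of the nested form and the flat negative lookahead follows. The sample lines are made up. Note that "ahede" passes, because hede inside a longer word has no \b before it:

```python
import re

nested = re.compile(r"^(?!(?=.*\bhede\b)).*$")
flat = re.compile(r"^(?!.*\bhede\b).*$")  # same language: the inner (?=...) adds nothing

rows = {s: (bool(nested.match(s)), bool(flat.match(s)))
        for s in ["hoho", "hede", "ahede", "x hede y"]}
print(rows)
# {'hoho': (True, True), 'hede': (False, False), 'ahede': (True, True), 'x hede y': (False, False)}
```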

User · Answer

It may be more maintainable to use two regexes in your code. Use one to do the first match, and then, if it matches, run a second regex to check for outlier cases you wish to block (for example ^.*(hede).*). Then handle the results with appropriate logic in your code.

OK, I admit this is not really an answer to the question as posted, and it may also use slightly more processing than a single regex. But for developers who came here looking for a fast emergency fix for an outlier case, this solution should not be overlooked.
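
A minimal Python sketch of the two-regex approach follows. The first regex, FIRST, is a hypothetical stand-in for whatever your main match is. The second is the outlier check ^.*(hede).* from the answer:

```python
import re

# Hypothetical first check: stands in for your real main regex.
FIRST = re.compile(r"^h")
# Outlier check from the answer: lines containing "hede".
OUTLIER = re.compile(r"^.*(hede).*")

def accept(line: str) -> bool:
    # Plain boolean logic replaces a single lookahead-based regex.
    return FIRST.search(line) is not None and OUTLIER.search(line) is None

accepted_lines = [l for l in ["hoho", "hihi", "haha", "hede", "xyz"] if accept(l)]
print(accepted_lines)  # ['hoho', 'hihi', 'haha']
```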

User · Answer

Since no one else has given a direct answer to the question that was asked, I'll do it.

The answer is that with POSIX grep, it's impossible to literally satisfy this request:

grep "<Regex for 'doesn't contain hede'>" input

The reason is that POSIX grep is only required to work with Basic Regular Expressions, which are simply not powerful enough to accomplish that task (they cannot describe all regular languages, because they lack alternation).

However, GNU grep implements extensions that allow it. In particular, \| is the alternation operator in GNU's implementation of BREs. If your regular expression engine supports alternation, parentheses and the Kleene star, and is able to anchor to the beginning and end of the string, that's all you need for this approach. Note however that negative sets [^ ... ] are very convenient in addition to those, because otherwise, you need to replace them with an expression of the form (a|b|c| ... ) that lists every character that is not in the set, which is extremely tedious and overly long, even more so if the whole character set is Unicode.

Thanks to formal language theory, we get to see what such an expression looks like. With GNU grep, the answer would be something like:

grep "^\([^h]\|h\(h\|eh\|edh\)*\([^eh]\|e[^dh]\|ed[^eh]\)\)*\(\|h\(h\|eh\|edh\)*\(\|e\|ed\)\)$" input

(found with Grail and some further optimizations made by hand).

You can also use a tool that implements Extended Regular Expressions, like egrep, to get rid of the backslashes:

egrep "^([^h]|h(h|eh|edh)*([^eh]|e[^dh]|ed[^eh]))*(|h(h|eh|edh)*(|e|ed))$" input

Here's a script to test it (note it generates a file testinput.txt in the current directory). Several of the expressions presented fail this test.

#!/bin/bash
REGEX="^\([^h]\|h\(h\|eh\|edh\)*\([^eh]\|e[^dh]\|ed[^eh]\)\)*\(\|h\(h\|eh\|edh\)*\(\|e\|ed\)\)$"

# First four lines as in OP's testcase.
cat > testinput.txt <<EOF
hoho
hihi
haha
hede

h
he
ah
head
ahead
ahed
aheda
ahede
hhede
hehede
hedhede
hehehehehehedehehe
hedecidedthat
EOF
diff -s -u <(grep -v hede testinput.txt) <(grep "$REGEX" testinput.txt)

In my system it prints:

Files /dev/fd/63 and /dev/fd/62 are identical

as expected.

For those interested in the details, the technique employed is to convert the regular expression that matches the word into a finite automaton, then invert the automaton by changing every acceptance state to non-acceptance and vice versa, and then converting the resulting FA back to a regular expression.
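The inversion step can be sketched in a few lines of Python. This is a minimal illustration under my own assumptions, not the Grail or FormalTheory code: instead of deriving the automaton from a regex, it builds a KMP-style DFA for "contains hede" directly, flips accepting and non-accepting states, and cross-checks the result against a plain substring test. The helper names `build_contains_dfa`, `complement_states` and `accepts` are hypothetical, and the last step (turning the complemented DFA back into a regex by state elimination) is not shown.

```python
import string
from itertools import product


def build_contains_dfa(word, alphabet):
    """DFA that accepts strings containing `word`.

    State i means: the longest suffix of the input read so far that is also a
    prefix of `word` has length i. State len(word) means `word` has been seen
    and is absorbing. Returns (transitions, accepting_states)."""
    n = len(word)
    delta = {}
    for state in range(n + 1):
        for ch in alphabet:
            if state == n:
                delta[state, ch] = n  # once found, always found
                continue
            candidate = word[:state] + ch
            k = len(candidate)
            # Fall back to the longest suffix of `candidate` that is a prefix of `word`.
            while k > 0 and candidate[-k:] != word[:k]:
                k -= 1
            delta[state, ch] = k
    return delta, {n}


def complement_states(num_states, accepting):
    """Invert the automaton: accepting states become non-accepting and vice versa."""
    return set(range(num_states)) - accepting


def accepts(delta, accepting, text):
    state = 0
    for ch in text:
        state = delta[state, ch]
    return state in accepting


WORD = "hede"
DELTA, CONTAINS = build_contains_dfa(WORD, string.ascii_lowercase)
NOT_CONTAINS = complement_states(len(WORD) + 1, CONTAINS)

lines = ["hoho", "hihi", "haha", "hede"]
print([line for line in lines if accepts(DELTA, NOT_CONTAINS, line)])
# ['hoho', 'hihi', 'haha']

# Cross-check the complemented DFA against a plain substring test.
for n in range(7):
    for chars in product("hedx", repeat=n):
        s = "".join(chars)
        assert accepts(DELTA, NOT_CONTAINS, s) == (WORD not in s)
```

The DFA only knows the letters in its alphabet (here a to z), which mirrors the point made in the Vcsn answer: complementation is only meaningful relative to a fixed alphabet.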

As everyone has noted, if your regular expression engine supports negative lookahead, the regular expression is much simpler. For example, with GNU grep:

grep -P '^((?!hede).)*$' input

However, this approach has the disadvantage that it requires a backtracking regular expression engine. This makes it unsuitable in installations that are using secure regular expression engines like RE2, which is one reason to prefer the generated approach in some circumstances.

Using Kendall Hopkins' excellent FormalTheory library, written in PHP, which provides functionality similar to Grail's, and a simplifier I wrote myself, I've been able to write an online generator of negative regular expressions for a given input phrase (currently only alphanumeric and space characters are supported): http://www.formauri.es/personal/pgimeno/misc/non-match-regex/

For hede it outputs:

^([^h]|h(h|e(h|dh))*([^eh]|e([^dh]|d[^eh])))*(h(h|e(h|dh))*(ed?)?)?$

which is equivalent to the above.
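To double-check the equivalence claims without Grail, here is a small Python sketch (my own addition, not part of the test script above) that compares the egrep expression, the generator's output and the lookahead version against a plain substring test for every string of up to 7 characters over the letters h, e, d and x. Python's `re` module accepts all three patterns as written, including the empty alternatives.

```python
import re
from itertools import product

PATTERNS = {
    "grail_ere": r"^([^h]|h(h|eh|edh)*([^eh]|e[^dh]|ed[^eh]))*(|h(h|eh|edh)*(|e|ed))$",
    "generator": r"^([^h]|h(h|e(h|dh))*([^eh]|e([^dh]|d[^eh])))*(h(h|e(h|dh))*(ed?)?)?$",
    "lookahead": r"^((?!hede).)*$",
}
COMPILED = {name: re.compile(p) for name, p in PATTERNS.items()}


def disagreements(max_len=7, alphabet="hedx"):
    """Return (pattern name, string) pairs where a pattern disagrees with 'hede' not in s."""
    bad = []
    for n in range(max_len + 1):
        for chars in product(alphabet, repeat=n):
            s = "".join(chars)
            expected = "hede" not in s
            for name, rx in COMPILED.items():
                if (rx.match(s) is not None) != expected:
                    bad.append((name, s))
    return bad


print(disagreements())  # []
```

The letter x stands in for "any other character"; since none of the expressions treats letters other than h, e and d specially, one extra letter is enough to exercise the [^...] branches.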

User · Answer

Here's how I'd do it:

^[^h]*(h(?!ede)[^h]*)*$

Accurate and more efficient than the other answers. It implements Friedl's "unrolling-the-loop" efficiency technique and requires much less backtracking.
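A quick check of the unrolled pattern, assuming Python's `re` as the engine: it compares ^[^h]*(h(?!ede)[^h]*)*$ with the accepted answer's ^((?!hede).)*$ on every short string over a small alphabet. Because every loop iteration must start with an h and [^h]* can never consume one, there is only one way to split a line among the iterations, which is where the reduced backtracking comes from.

```python
import re
from itertools import product

UNROLLED = re.compile(r"^[^h]*(h(?!ede)[^h]*)*$")
ACCEPTED = re.compile(r"^((?!hede).)*$")

lines = ["hoho", "hihi", "haha", "hede"]
print([line for line in lines if UNROLLED.match(line)])  # ['hoho', 'hihi', 'haha']

# Both patterns should accept exactly the strings that do not contain "hede".
for n in range(8):
    for chars in product("hedx", repeat=n):
        s = "".join(chars)
        assert bool(UNROLLED.match(s)) == bool(ACCEPTED.match(s)) == ("hede" not in s)
```

The lookahead runs only after an h, instead of before every character as in the accepted answer.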

User · Answer

If you're just using it for grep, you can use grep -v hede to get all lines which do not contain hede.

ETA: Oh, rereading the question, grep -v is probably what you meant by "tools options".

User · Answer

FWIW, since regular languages (aka rational languages) are closed under complementation, it's always possible to find a regular expression (aka rational expression) that negates another expression. But not many tools implement this.

Vcsn supports this operator (which it denotes {c}, postfix).

You first define the type of your expressions: labels are letters (lal_char) to pick from a to z, for instance (defining the alphabet when working with complementation is, of course, very important), and the "value" computed for each word is just a Boolean: true, the word is accepted; false, it is rejected.

In Python:

In [5]: import vcsn
        c = vcsn.context('lal_char(a-z), b')
        c
Out[5]: {a,b,c,d,e,f,g,h,i,j,k,l,m,n,o,p,q,r,s,t,u,v,w,x,y,z} → 𝔹

then you enter your expression:

In [6]: e = c.expression('(hede){c}'); e
Out[6]: (hede){c}

convert this expression to an automaton:

In [7]: a = e.automaton(); a

finally, convert this automaton back to a simple expression:

In [8]: print(a.expression())
        \e+h(\e+e(\e+d))+([^h]+h([^e]+e([^d]+d([^e]+e[^]))))[^]*

where + is usually denoted |, \e denotes the empty word, and [^] is usually written . (any character). So, with a bit of rewriting: ()|h(ed?)?|([^h]|h([^e]|e([^d]|d([^e]|e.)))).*
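One detail worth checking: (hede){c} is the complement of the single word hede, so the resulting expression accepts every string except exactly "hede". That is enough for the question's sample input, where hede is a whole line, but a line such as xhede is still accepted. The Python sketch below, using Python's `re` as a stand-in for Vcsn (an assumption, with . playing the role of Vcsn's [^]), checks the hand-rewritten expression against s != "hede". `re.fullmatch` is used because the expression itself carries no anchors.

```python
import re
from itertools import product

# Hand-rewritten Vcsn output for (hede){c}.
NOT_EXACTLY_HEDE = re.compile(r"()|h(ed?)?|([^h]|h([^e]|e([^d]|d([^e]|e.)))).*")

lines = ["hoho", "hihi", "haha", "hede"]
print([line for line in lines if NOT_EXACTLY_HEDE.fullmatch(line)])  # ['hoho', 'hihi', 'haha']

# The expression accepts every string except exactly "hede"...
for n in range(7):
    for chars in product("hedx", repeat=n):
        s = "".join(chars)
        assert (NOT_EXACTLY_HEDE.fullmatch(s) is not None) == (s != "hede")

# ...so a line that merely contains "hede" is still accepted.
print(NOT_EXACTLY_HEDE.fullmatch("xhede") is not None)  # True
```

To exclude lines that contain hede anywhere, the expression to complement would be the one for "any characters, then hede, then any characters" rather than hede alone.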

User · Answer

^((?!hede).)*$ is an elegant solution, except that, since it consumes characters, you won't be able to combine it with other criteria. For instance, say you wanted to check for the non-presence of "hede" and the presence of "haha". This solution would work because it won't consume characters:

^(?!.*\bhede\b)(?=.*\bhaha\b)
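Because lookaheads are zero-width, several conditions can be stacked at the same position. A small Python illustration, assuming Python's `re` (which treats \b and lookaheads the same way here); note that \b restricts the test to whole words, so hedehaha does not count as containing the word hede.

```python
import re

# No standalone word "hede" anywhere, and a standalone word "haha" somewhere.
NO_HEDE_BUT_HAHA = re.compile(r"^(?!.*\bhede\b)(?=.*\bhaha\b)")

samples = ["haha", "hoho haha", "haha hede", "hoho", "hedehaha haha"]
print([s for s in samples if NO_HEDE_BUT_HAHA.match(s)])
# ['haha', 'hoho haha', 'hedehaha haha']
```

Appending .* to the pattern would make it consume the whole line once both conditions hold.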

User · Answer

Note that the solution to "does not start with hede":

^(?!hede).*$

is generally much more efficient than the solution to "does not contain hede":

^((?!hede).)*$

The former checks for "hede" only at the input string's first position, rather than at every position.
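The two patterns answer different questions, which is easy to see in Python (assumed here as the engine): they agree when hede is absent or sits at the very start of the line, and differ when it appears later.

```python
import re

NOT_START = re.compile(r"^(?!hede).*$")      # one lookahead test, at position 0
NOT_CONTAIN = re.compile(r"^((?!hede).)*$")  # one lookahead test per character

for line in ["hoho", "hede", "ahede", "hedecided"]:
    print(line, bool(NOT_START.match(line)), bool(NOT_CONTAIN.match(line)))
# hoho True True
# hede False False
# ahede True False
# hedecided False False
```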

User · Answer

Through the PCRE verbs (*SKIP)(*F):

^hede$(*SKIP)(*F)|^.*$

With the multiline flag on (so that ^ and $ match at every line), this completely skips the line that consists of exactly the string hede and matches all the remaining lines.

Execution of the parts:

Let us consider the above regex by splitting it into two parts.

Part before the | symbol (the part that shouldn't be matched):

^hede$(*SKIP)(*F)

Part after the | symbol (the part that should be matched):

^.*$

PART 1

The regex engine starts its execution with the first part:

^hede$(*SKIP)(*F)

Explanation:

^ asserts that we are at the start of a line.
hede matches the string hede.
$ asserts that we are at the end of the line.

So the line which contains exactly the string hede would be matched. Once the regex engine sees the following (*SKIP)(*F) verbs (note: you could write (*F) as (*FAIL)), it skips that text and makes the match fail. The | (alternation, the logical OR operator) next to the PCRE verbs lets the engine fall through to the second part at every other position, i.e. on every line except the one that contains exactly hede. Now the regex in the second part is executed.

PART 2

^.*$

Explanation:

^ asserts that we are at the start of a line, i.e. it matches all the line starts except the one in the hede line.
.* Without the DOTALL modifier, . matches any character except a line break, and * repeats the previous token zero or more times. So .* matches the whole line.

Hey, why did you use .* instead of .+?

Because .* matches a blank line but .+ won't. We want to match all the lines except hede, and there may be blank lines in the input too, so you must use .* instead of .+ (.+ repeats the previous token one or more times).

$ The end-of-line anchor is not strictly necessary here.

User · Answer

Since the introduction of Ruby 2.4.1, we can use the new Absent Operator in Ruby's regular expressions.

From the official doc:

(?~abc) matches: "", "ab", "aab", "cccc", etc.
It doesn't match: "abc", "aabc", "ccccabc", etc.

Thus, in your case ^(?~hede)$ does the job for you:

2.4.1 :016 > ["hoho", "hihi", "haha", "hede"].select{|s| /^(?~hede)$/.match(s)}
 => ["hoho", "hihi", "haha"]

User · Answer

Not regex, but I've found it logical and useful to use serial greps with a pipe to eliminate noise.

E.g., search an Apache config file without all the comments:

grep -v '#' /opt/lampp/etc/httpd.conf      # this gives all the non-comment lines

and

grep -v '#' /opt/lampp/etc/httpd.conf | grep -i dir

The logic of serial greps is (not a comment) and (matches dir).

User · Answer

Maybe you'll find this on Google while trying to write a regex that is able to match segments of a line (as opposed to entire lines) which do not contain a substring. It took me a while to figure out, so I'll share.

Given a string:

<span class="good">bar</span><span class="bad">foo</span><span class="ugly">baz</span>

I want to match <span> tags which do not contain the substring "bad".

/<span(?:(?!bad).)*?>/ will match <span class="good"> and <span class="ugly">.

Notice that there are two sets (layers) of parentheses:

The innermost one is for the negative lookahead (it is not a capture group).
The outermost was interpreted by Ruby as a capture group, but we don't want it to be one, so I added ?: at its beginning and it is no longer interpreted as a capture group.

Demo in Ruby:

s = '<span class="good">bar</span><span class="bad">foo</span><span class="ugly">baz</span>'
s.scan(/<span(?:(?!bad).)*?>/)
# => ["<span class=\"good\">", "<span class=\"ugly\">"]
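The same pattern works with Python's `re.findall`, shown here as a sketch assuming Python instead of Ruby. The non-capturing group matters for the same reason as in Ruby's scan: with a plain capturing group, `re.findall` would return the group's last captured character instead of the whole tag.

```python
import re

s = '<span class="good">bar</span><span class="bad">foo</span><span class="ugly">baz</span>'

# Opening <span ...> tags whose text up to the first ">" never contains "bad".
pattern = re.compile(r"<span(?:(?!bad).)*?>")
print(pattern.findall(s))
# ['<span class="good">', '<span class="ugly">']
```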

User · Answer

As long as you are dealing with lines, simply mark the negative matches and target the rest.

In fact, I use this trick with sed because ^((?!hede).)*$ does not seem to be supported by it.

For the desired output

Mark the negative match (e.g. lines with hede) using a character not included in the whole text at all. An emoji could probably be a good choice for this purpose (here 🔒):

s/(.*hede)/🔒\1/g

Target the rest (the unmarked strings, e.g. lines without hede). Suppose you want to keep only the target and delete the rest (as you want):

s/^🔒.*//g

For a better understanding

Suppose you want to delete the target instead.

Mark the negative match as before:

s/(.*hede)/🔒\1/g

Target the rest (the unmarked strings, e.g. lines without hede) and delete it:

s/^[^🔒].*//g

Remove the mark:

s/🔒//g

These are extended-regex substitutions (run them with sed -E); with sed's default basic regular expressions the group would be written \(.*hede\).
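The mark-then-target idea translates directly to Python's `re.sub`. This is a sketch under my own assumptions: the marker (U+1F512, a padlock emoji) and the helper names `mark_lines`, `keep_unmarked` and `blank_unmarked` are hypothetical, and `keep_unmarked` drops the marked lines instead of blanking them as the sed version does.

```python
import re

MARK = "\U0001F512"  # hypothetical marker; must not occur anywhere in the text


def mark_lines(lines, word):
    # Same as s/(.*hede)/MARK\1/ applied to every line.
    return [re.sub(rf"(.*{re.escape(word)})", MARK + r"\1", line) for line in lines]


def keep_unmarked(lines, word):
    # Keep only the target: drop the lines that start with the mark.
    return [line for line in mark_lines(lines, word) if not line.startswith(MARK)]


def blank_unmarked(lines, word):
    # Delete the target: s/^[^MARK].*// and then s/MARK//g.
    blanked = [re.sub(rf"^[^{MARK}].*", "", line) for line in mark_lines(lines, word)]
    return [line.replace(MARK, "") for line in blanked]


lines = ["hoho", "hihi", "haha", "hede"]
print(keep_unmarked(lines, "hede"))   # ['hoho', 'hihi', 'haha']
print(blank_unmarked(lines, "hede"))  # ['', '', '', 'hede']
```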

User · Answer

It may be more maintainable to use two regexes in your code: one to do the first match, and then, if it matches, a second regex to check for outlier cases you wish to block (for example ^.*(hede).*), with the appropriate logic in your code.

OK, I admit this is not really an answer to the question as posted, and it may also use slightly more processing than a single regex. But for developers who came here looking for a fast emergency fix for an outlier case, this solution should not be overlooked.
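A minimal sketch of the two-regex approach in Python. The first-pass pattern (lines made only of lowercase letters) is a hypothetical stand-in for whatever the real first match is; the outlier check is the ^.*(hede).* example from the answer.

```python
import re

# Hypothetical first-pass pattern: lines made only of lowercase letters.
FIRST_PASS = re.compile(r"^[a-z]+$")
# Second pass: the outlier case to block.
OUTLIER = re.compile(r"^.*(hede).*")


def accept(line):
    """Accept a line only if it passes the first regex and fails the outlier check."""
    return FIRST_PASS.match(line) is not None and OUTLIER.match(line) is None


lines = ["hoho", "hihi", "haha", "hede", "Hoho"]
print([line for line in lines if accept(line)])  # ['hoho', 'hihi', 'haha']
```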

User · Answer

Benchmarks

I decided to evaluate some of the presented options, compare their performance, and try some new features. Benchmarking on the .NET regex engine: http://regexhero.net/tester/

Benchmark Text:

The first 7 lines should not match, since they contain the searched expression, while the lower 7 lines should match!

Regex Hero is a real-time online Silverlight Regular Expression Tester.
XRegex Hero is a real-time online Silverlight Regular Expression Tester.
Regex HeroRegex HeroRegex HeroRegex HeroRegex Hero is a real-time online Silverlight Regular Expression Tester.
Regex Her Regex Her Regex Her Regex Her Regex Her Regex Her Regex Hero is a real-time online Silverlight Regular Expression Tester.
Regex Her is a real-time online Silverlight Regular Expression Tester.Regex Hero
egex Hero egex Hero egex Hero egex Hero egex Hero egex Hero Regex Hero is a real-time online Silverlight Regular Expression Tester.
RRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRegex Hero is a real-time online Silverlight Regular Expression Tester.

Regex Her
egex Hero
egex Hero is a real-time online Silverlight Regular Expression Tester.
Regex Her is a real-time online Silverlight Regular Expression Tester.
Regex Her Regex Her Regex Her Regex Her Regex Her Regex Her is a real-time online Silverlight Regular Expression Tester.
Nobody is a real-time online Silverlight Regular Expression Tester.
Regex Her o egex Hero Regex  Hero Reg ex Hero is a real-time online Silverlight Regular Expression Tester.

Results:

Results are iterations per second, as the median of 3 runs (bigger number = better):

01: ^((?!Regex Hero).)*$                    3.914   // Accepted Answer
02: ^(?:(?!Regex Hero).)*$                  5.034   // With Non-Capturing group
03: ^(?>[^R]+|R(?!egex Hero))*$             6.137   // Lookahead only on the right first letter
04: ^(?>(?:.*?Regex Hero)?)^.*$             7.426   // Match the word and check if you're still at linestart
05: ^(?(?=.*?Regex Hero)(?#fail)|.*)$       7.371   // Logic Branch: Find Regex Hero? match nothing, else anything

P1: ^(?(?=.*?Regex Hero)(*FAIL)|(*ACCEPT))  ?????   // Logic Branch in Perl - Quick FAIL
P2: .*?Regex Hero(*COMMIT)(*FAIL)|(*ACCEPT) ?????   // Direct COMMIT & FAIL in Perl

Since .NET doesn't support backtracking control verbs ((*FAIL), etc.), I couldn't test solutions P1 and P2.

Summary:

I tried to test most of the proposed solutions; some optimizations are possible for certain words. For example, if the first two letters of the search string are not the same, answer 03 can be expanded to ^(?>[^R]+|R+(?!egex Hero))*$, resulting in a small performance gain.

But the overall most readable and performance-wise fastest solution seems to be 05, using a conditional statement, or 04, using an atomic group. I think the Perl solutions should be even faster and more easily readable.
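For readers who want to reproduce the correctness side of this comparison outside .NET, here is a hedged Python sketch. It rebuilds the 14 benchmark lines (the number of leading R's in the seventh line is approximated, which does not affect the outcome), asserts that each pattern rejects the first seven lines and accepts the last seven, and times them with `timeit`. Python's `re` differs from the .NET engine: 03 and 04 need Python 3.11 or later for atomic groups, and 05 is left out because Python only supports group-reference conditionals, not lookaround conditions. Timings from this script are not comparable to the numbers above.

```python
import re
import sys
import timeit

TAIL = " is a real-time online Silverlight Regular Expression Tester."

SHOULD_FAIL = [
    "Regex Hero" + TAIL,
    "XRegex Hero" + TAIL,
    "Regex HeroRegex HeroRegex HeroRegex HeroRegex Hero" + TAIL,
    "Regex Her " * 6 + "Regex Hero" + TAIL,
    "Regex Her" + TAIL + "Regex Hero",
    "egex Hero " * 6 + "Regex Hero" + TAIL,
    "R" * 66 + "egex Hero" + TAIL,  # long run of R's; exact count does not matter
]
SHOULD_MATCH = [
    "Regex Her",
    "egex Hero",
    "egex Hero" + TAIL,
    "Regex Her" + TAIL,
    "Regex Her " * 5 + "Regex Her" + TAIL,
    "Nobody" + TAIL,
    "Regex Her o egex Hero Regex  Hero Reg ex Hero" + TAIL,
]

PATTERNS = {
    "01": r"^((?!Regex Hero).)*$",
    "02": r"^(?:(?!Regex Hero).)*$",
}
if sys.version_info >= (3, 11):  # atomic groups (?>...) need Python 3.11+
    PATTERNS["03"] = r"^(?>[^R]+|R(?!egex Hero))*$"
    PATTERNS["04"] = r"^(?>(?:.*?Regex Hero)?)^.*$"

COMPILED = {name: re.compile(p) for name, p in PATTERNS.items()}
ALL_LINES = SHOULD_FAIL + SHOULD_MATCH

for name, rx in COMPILED.items():
    assert not any(rx.match(line) for line in SHOULD_FAIL), name
    assert all(rx.match(line) for line in SHOULD_MATCH), name
    seconds = timeit.timeit(lambda: [rx.match(line) for line in ALL_LINES], number=500)
    print(f"{name}: {seconds:.4f} s for 500 passes")
```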

User · Answer

The aforementioned (?:(?!hede).)* is great because it can be anchored:

^(?:(?!hede).)*$        # A line without hede

foo(?:(?!hede).)*bar    # foo followed by bar, without hede between them

But the following would suffice in this case:

^(?!.*hede)             # A line without hede

This simplification is ready to have "AND" clauses added:

^(?!.*hede)(?=.*foo)(?=.*bar)   # A line with foo and bar, but without hede
^(?!.*hede)(?=.*foo).*bar       # Same
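A quick Python check of the four forms, assuming Python's `re` as the engine. `search` is used throughout: for the ^-anchored patterns it behaves like `match` (without the multiline flag, ^ only matches at position 0), and it lets the unanchored foo...bar pattern be found anywhere in the line.

```python
import re

NO_HEDE_LINE = re.compile(r"^(?:(?!hede).)*$")              # a line without hede
FOO_THEN_BAR = re.compile(r"foo(?:(?!hede).)*bar")          # foo ... bar, no hede between
NO_HEDE = re.compile(r"^(?!.*hede)")                        # a line without hede (zero-width)
FOO_AND_BAR = re.compile(r"^(?!.*hede)(?=.*foo)(?=.*bar)")  # foo and bar, no hede
FOO_AND_BAR_2 = re.compile(r"^(?!.*hede)(?=.*foo).*bar")    # same condition


def found(rx, text):
    return rx.search(text) is not None


print(found(NO_HEDE_LINE, "hoho"), found(NO_HEDE_LINE, "ahede"))                # True False
print(found(FOO_THEN_BAR, "hede foo x bar"), found(FOO_THEN_BAR, "foo hede bar"))  # True False
print(found(NO_HEDE, "hoho"), found(NO_HEDE, "ahede"))                          # True False
for rx in (FOO_AND_BAR, FOO_AND_BAR_2):
    print(found(rx, "bar then foo"), found(rx, "foo bar hede"), found(rx, "foo only"))
    # True False False
```

The stacked lookaheads do not care about the order of foo and bar, while foo(?:(?!hede).)*bar only looks at the text between them.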

User · Answer

A simpler solution is to use the not operator (!).

Your if statement will need to match "contains" and not match "excludes":

var contains = /abc/;
var excludes = /hede/;

if (string.match(contains) && !string.match(excludes)) {
  // proceed...
}

I believe the designers of RegEx anticipated the use of not operators.

User · Answer

Another option is to add a positive lookahead that checks whether hede is anywhere in the input line, and then negate it, with an expression similar to:

^(?!(?=.*\bhede\b)).*$

with word boundaries.

The expression is explained on the top right panel of regex101.com, if you wish to explore, simplify, or modify it, and you can also watch there how it matches against some sample inputs.
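Wrapping the positive lookahead inside a negative one is equivalent to a plain negative lookahead: (?!(?=X)) succeeds exactly when X cannot match at that position, which is what (?!X) already says. A small Python check, assuming Python's `re`; it also shows that with \b the test concerns the whole word hede, so lines like ahede or hedecided still match.

```python
import re

NESTED = re.compile(r"^(?!(?=.*\bhede\b)).*$")  # as in the answer
FLAT = re.compile(r"^(?!.*\bhede\b).*$")        # same test without the inner lookahead

samples = ["hoho", "hede", "a hede b", "ahede", "hedecided"]
print([s for s in samples if NESTED.match(s)])  # ['hoho', 'ahede', 'hedecided']
assert all(bool(NESTED.match(s)) == bool(FLAT.match(s)) for s in samples)
```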


User · Answer

With ConyEdit, you can use the command line cc.gl !/hede/ to get lines that do not contain the regex match, or use the command line cc.dl /hede/ to delete lines that contain the regex match. They have the same result.
