Regular expression to match a line that doesn't contain a word

Maybe you'll find this on Google while trying to write a regex that matches segments of a line (as opposed to entire lines) that do not contain a substring. It took me a while to figure out, so I'll share:

Given a string: <span class="good">bar</span><span class="bad">foo</span><span class="ugly">baz</span>

I want to match the opening <span> tags that do not contain the substring "bad".

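/<span(?:(?!bad).)*?>/ will match <span class="good"> and <span class="ugly">. After <span it consumes one character at a time, but only at positions where "bad" does not start, and the lazy *? stops each match at the first >.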
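
To see why the quantifier is lazy (*?) rather than greedy (*), here is a quick comparison on the same string. The greedy form keeps consuming past the first > and only backs off to the last > it passed before running into "bad" (or the end of the string), so it swallows the tag contents and closing tags too.

```ruby
s = '<span class="good">bar</span><span class="bad">foo</span><span class="ugly">baz</span>'

# Lazy (*?): each match stops at the first ">" it reaches.
lazy = s.scan(/<span(?:(?!bad).)*?>/)
p lazy
# => ["<span class=\"good\">", "<span class=\"ugly\">"]

# Greedy (*): each match runs as far as the lookahead allows,
# then backtracks to the last ">" it passed.
greedy = s.scan(/<span(?:(?!bad).)*>/)
p greedy
# => ["<span class=\"good\">bar</span>", "<span class=\"ugly\">baz</span>"]
```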

Notice that there are two sets (layers) of parentheses:

  • The inner one is the negative lookahead (?!bad); it is not a capture group.
  • The outer one would be a capture group by default, and we don't want that: when a pattern contains groups, Ruby's String#scan returns the captured groups instead of the whole match. Adding ?: at its beginning makes it a non-capturing group.
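
To see what ?: changes in practice, compare both forms with String#scan. With a plain capturing group, scan returns one array of captures per match; a repeated group only keeps its last iteration, which here is the closing quote before >, so the tags themselves are lost.

```ruby
s = '<span class="good">bar</span><span class="bad">foo</span><span class="ugly">baz</span>'

# Capturing group: scan returns the capture (last repetition only), not the tag.
with_capture = s.scan(/<span((?!bad).)*?>/)
p with_capture
# => [["\""], ["\""]]

# Non-capturing group: scan returns each whole match.
without_capture = s.scan(/<span(?:(?!bad).)*?>/)
p without_capture
# => ["<span class=\"good\">", "<span class=\"ugly\">"]
```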

Demo in Ruby:

s = '<span class="good">bar</span><span class="bad">foo</span><span class="ugly">baz</span>'
s.scan(/<span(?:(?!bad).)*?>/)
# => ["<span class=\"good\">", "<span class=\"ugly\">"]
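
For the title's original problem, matching entire lines that don't contain the word, the same construct can be anchored at both ends so that every character of the line has to pass the lookahead. A minimal sketch, using made-up sample lines:

```ruby
lines = ["a good line", "a bad line", "an ugly line"]

# \A and \z anchor to the start and end of each string, so a line is kept
# only if "bad" starts at none of its positions.
kept = lines.grep(/\A(?:(?!bad).)*\z/)
p kept
# => ["a good line", "an ugly line"]
```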