[regex] Regular Expressions: Is there an AND operator?

Obviously, you can use the | (pipe?) to represent OR, but is there a way to represent AND as well?

Specifically, I'd like to match paragraphs of text that contain ALL of a certain set of phrases, in no particular order.

This question is related to regex lookahead

The answer is


The AND operator is implicit in RegExp syntax: writing two elements one after the other means "this, then that".
The OR operator, by contrast, has to be written explicitly with a pipe.
The following RegExp:

var re = /ab/;

means the letter a AND then the letter b, in that order.
It also works with groups:

var re = /(co)(de)/;

This means the group co AND then the group de, again in that order.
Replacing the (implicit) AND with an OR would require the following lines:

var re = /a|b/;
var re = /(co)|(de)/;
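
A small sketch of the difference, using made-up sample strings: the implicit AND is sensitive to order, so on its own it does not cover the "in no particular order" case from the question.

```javascript
// Concatenation is an ordered AND: both parts must appear, in this order.
const sequence = /(co)(de)/;
// Alternation is an OR: either part alone is enough.
const either = /(co)|(de)/;

// Illustrative inputs (not from the original answer).
const inputs = ["code", "deco", "co", "xyz"];
const results = inputs.map((s) => ({
  input: s,
  sequence: sequence.test(s),
  either: either.test(s),
}));
console.log(results);
```

Here "deco" contains both groups but fails the concatenated pattern, because the groups appear in the opposite order.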

Look at this example:

We have 2 regexps A and B and we want to match both of them, so in pseudo-code it looks like this:

pattern = "/A AND B/"

It can be written without using the AND operator like this:

pattern = "/NOT (NOT A OR NOT B)/"

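In PCRE there is no NOT operator (^ outside a character class is an anchor), so each negation is written as a negative lookahead:

"/^(?!(?!.*A)|(?!.*B))/"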

regexp_match(pattern,data)
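
As a rough sketch of the NOT (NOT A OR NOT B) idea, negation can be expressed with negative lookaheads. The sub-patterns "cat" and "dog" below are hypothetical stand-ins for A and B, and the s flag is an assumption so that the dot can cross newlines.

```javascript
// Hypothetical stand-ins for the two sub-patterns A and B.
const A = "cat";
const B = "dog";

// NOT (NOT A OR NOT B), written with negative lookaheads.
const deMorgan = new RegExp(`^(?!(?!.*${A})|(?!.*${B}))`, "s");
// The equivalent direct form: A AND B as two positive lookaheads.
const direct = new RegExp(`^(?=.*${A})(?=.*${B})`, "s");

const samples = ["dog chases cat", "cat only", "dog only", "neither"];
const outcome = samples.map((s) => [s, deMorgan.test(s), direct.test(s)]);
console.log(outcome);
```

Both forms should agree on every input; the direct form with positive lookaheads is usually easier to read.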

((yes).*(no))|((no).*(yes))

This will match a sentence containing both yes and no, regardless of the order in which they appear:

Do i like cookies? **Yes**, i do. But milk - **no**, definitely no.

**No**, you may not have my phone. **Yes**, you may go f yourself.

Both will match, provided the match is case-insensitive (for example, with the i flag).
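
A minimal sketch of this pattern in JavaScript, assuming the i flag for case-insensitivity. The sample sentences are illustrative; the last one shows a caveat, since the pattern matches substrings as well as whole words.

```javascript
// Either "yes ... no" or "no ... yes", ignoring case.
const bothWords = /((yes).*(no))|((no).*(yes))/i;

const sentences = [
  "Do I like cookies? Yes, I do. But milk - no, definitely no.",
  "No, you may not have my phone. Yes, you may leave.",
  "Yes, absolutely.",
  "Yes, I know.", // caveat: "know" contains "no"
];
const verdicts = sentences.map((s) => bothWords.test(s));
console.log(verdicts);
```

Adding word boundaries (\b) around each word would avoid the false positive in the last sentence.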


If you use Perl regular expressions, you can use positive lookahead:

For example

(?=[1-9][0-9]{2})[0-9]*[05]\b

would match numbers that are at least 100 (three or more digits, no leading zero) and divisible by 5.
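
To see what the lookahead contributes, here is a short sketch that runs the pattern over a made-up list of numbers. The g flag is an assumption, added only to collect every match.

```javascript
// The lookahead demands three digits with no leading zero;
// the rest of the pattern demands a last digit of 0 or 5.
const bigMultipleOfFive = /(?=[1-9][0-9]{2})[0-9]*[05]\b/g;

const numbers = "5 95 100 102 105 1000 12345 99995";
const found = numbers.match(bigMultipleOfFive);
console.log(found);
```

The two conditions are checked at the same starting position, which is what makes the lookahead behave like an AND.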


Use a non-consuming regular expression.

The typical (e.g. Perl/Java) notation is:

(?=expr)

This means "match expr but after that continue matching at the original match-point."

You can do as many of these as you want, and this will be an "and." Example:

(?=match this expression)(?=match this too)(?=oh, and this)

You can even add capture groups inside the non-consuming expressions if you need to save some of the data therein.
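
A small sketch of capture groups inside lookaheads. The key=value input format and the names user and id are assumptions made up for this example.

```javascript
// Two lookaheads from the same start position, each with its own capture group.
const userAndId = /^(?=.*\buser=(\w+))(?=.*\bid=(\d+))/;

const hit = "id=42 user=alice".match(userAndId);
const miss = "user=bob".match(userAndId);

// hit[0] is the empty string: the lookaheads consume nothing.
console.log(hit[0] === "", hit[1], hit[2]);
console.log(miss);
```

The overall match is zero-width, but the captured groups remain available after the lookaheads succeed.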


Is it not possible in your case to do the AND on several matching results? In pseudocode:

regexp_match(pattern1, data) && regexp_match(pattern2, data) && ...
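
The pseudocode could be sketched in JavaScript roughly as follows. The helper name matchesAll and the fruit patterns are hypothetical.

```javascript
// Returns true only if every pattern matches the data (a logical AND).
function matchesAll(patterns, data) {
  return patterns.every((pattern) => pattern.test(data));
}

// Illustrative patterns; avoid the g flag here, since test() on a
// global regex keeps state between calls.
const fruit = [/\bapple\b/i, /\bpear\b/i, /\bplum\b/i];

const allPresent = matchesAll(fruit, "A plum, a pear and an apple.");
const oneMissing = matchesAll(fruit, "Only an apple and a pear.");
console.log(allPresent, oneMissing);
```

This keeps each pattern simple and works in any regex flavour, at the cost of scanning the input once per pattern.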

Use AND outside the regular expression. In PHP the lookahead operator did not seem to work for me, so instead I used this:

function isValidPassword($pass1) {
    // Length of at least 3 AND no whitespace character anywhere.
    return preg_match('/^.{3,}$/', $pass1) && !preg_match('/\s/', $pass1);
}

The function returns true if the password is 3 characters or longer and contains no whitespace.
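
For comparison, here is a sketch of the same check in JavaScript, once as two separate tests and once as a single pattern built from lookaheads. The function names are hypothetical, and the single-pattern form is one possible translation rather than the original author's code.

```javascript
// Two tests combined with && outside the regex.
function isValidTwoTests(password) {
  return /^.{3,}$/.test(password) && !/\s/.test(password);
}

// One pattern: a positive lookahead for the length and a
// negative lookahead for "contains whitespace".
const lengthAndNoSpace = /^(?=.{3,}$)(?!.*\s)/;
function isValidOnePattern(password) {
  return lengthAndNoSpace.test(password);
}

for (const candidate of ["abc", "ab", "ab c", "long-enough"]) {
  console.log(candidate, isValidTwoTests(candidate), isValidOnePattern(candidate));
}
```

The two-test version is arguably easier to read and to extend with further rules.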


You need to use lookahead as some of the other responders have said, but the lookahead has to account for other characters between its target word and the current match position. For example:

(?=.*word1)(?=.*word2)(?=.*word3)

The .* in the first lookahead lets it match however many characters it needs to before it gets to "word1". Then the match position is reset and the second lookahead seeks out "word2". Reset again, and the final part matches "word3"; since it's the last word you're checking for, it isn't necessary that it be in a lookahead, but it doesn't hurt.

In order to match a whole paragraph, you need to anchor the regex at both ends and add a final .* to consume the remaining characters. Using Perl-style notation, that would be:

/^(?=.*word1)(?=.*word2)(?=.*word3).*$/m

The 'm' modifier is for multiline mode; it lets the ^ and $ match at paragraph boundaries ("line boundaries" in regex-speak). It's essential in this case that you not use the 's' modifier, which lets the dot metacharacter match newlines as well as all other characters.

Finally, you want to make sure you're matching whole words and not just fragments of longer words, so you need to add word boundaries:

/^(?=.*\bword1\b)(?=.*\bword2\b)(?=.*\bword3\b).*$/m
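
A sketch of the final pattern applied to a few made-up paragraphs, one per line. The g flag is an assumption added here so that every matching paragraph is returned.

```javascript
// All three whole words must occur somewhere in the same line, in any order.
const allThreeWords = /^(?=.*\bword1\b)(?=.*\bword2\b)(?=.*\bword3\b).*$/gm;

// Illustrative paragraphs, separated by newlines.
const paragraphs = [
  "word3 comes first, then word1 and finally word2.",
  "This line has word1 and word2 only.",
  "word2 word3 word1",
  "sword1 word2 word3",
].join("\n");

const matchedParagraphs = paragraphs.match(allThreeWords);
console.log(matchedParagraphs);
```

The second line is rejected because one word is missing, and the fourth because "sword1" does not satisfy the word boundary before word1.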

The order is always implied in the structure of the regular expression. To accomplish what you want, you'll have to match the input string multiple times against different expressions.

In a regexp flavour without lookahead, what you want to do is not practical with a single regexp.


You can do that with a regular expression, but you'll probably want to do something else. For example, use several regexps and combine them in an if clause.

You can enumerate all possible permutations with a standard regexp, like this (matches a, b and c in any order):

(abc)|(bca)|(acb)|(bac)|(cab)|(cba)

However, this makes for a very long and probably inefficient regexp if you have more than a couple of terms, since n terms need n! alternatives.

If you are using an extended regexp flavour, like Perl's or Java's, there are better ways to do this. Other answers have suggested using positive lookahead.
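
To illustrate how quickly the enumeration grows, here is a sketch that builds the alternation programmatically. The helper names permutations, escapeRegExp and anyOrderPattern are hypothetical.

```javascript
// All orderings of the given items, built recursively.
function permutations(items) {
  if (items.length <= 1) return [items];
  return items.flatMap((item, i) =>
    permutations([...items.slice(0, i), ...items.slice(i + 1)]).map((rest) => [item, ...rest])
  );
}

// Escape regex metacharacters so each term is matched literally.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// One parenthesised alternative per ordering, joined with |.
function anyOrderPattern(terms) {
  const alternatives = permutations(terms.map(escapeRegExp)).map((p) => `(${p.join("")})`);
  return new RegExp(alternatives.join("|"));
}

const threeTerms = anyOrderPattern(["a", "b", "c"]);
const fourTerms = anyOrderPattern(["a", "b", "c", "d"]);
console.log(threeTerms.source);
console.log(fourTerms.source.split("|").length);
```

Three terms give 6 alternatives and four give 24, which is why this approach is only reasonable for very short lists.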


Why not use awk?
With awk, combining regexes with AND and OR is simple:

awk '/WORD1/ && /WORD2/ && /WORD3/' myfile

In addition to the accepted answer, here are some practical examples that may make things clearer. Say we have these three lines of text:

[12/Oct/2015:00:37:29 +0200] // only this + will get selected
[12/Oct/2015:00:37:x9 +0200]
[12/Oct/2015:00:37:29 +020x]

What we want to do here is select the + sign, but only if it comes after two digits and a space, and only if it comes before four digits. Those are the only constraints. We would use this regular expression to achieve it:

'~(?<=\d{2} )\+(?=\d{4})~g'

Note that both conditions must hold at once: using the lookbehind or the lookahead on its own gives different results.
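
A sketch of that difference in JavaScript, using the three log lines above. It assumes an engine with lookbehind support (for example a recent Node.js), and the variable names are illustrative.

```javascript
const logLines = [
  "[12/Oct/2015:00:37:29 +0200]",
  "[12/Oct/2015:00:37:x9 +0200]",
  "[12/Oct/2015:00:37:29 +020x]",
];

// Lookbehind AND lookahead together, then each condition on its own.
const bothConditions = /(?<=\d{2} )\+(?=\d{4})/;
const lookbehindOnly = /(?<=\d{2} )\+/;
const lookaheadOnly = /\+(?=\d{4})/;

const report = logLines.map((line) => ({
  line,
  both: bothConditions.test(line),
  lookbehindOnly: lookbehindOnly.test(line),
  lookaheadOnly: lookaheadOnly.test(line),
}));
console.log(report);
```

Only the first line satisfies both conditions; each of the other two lines passes exactly one of the separated patterns.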

Or perhaps you want to select some text between tags... but not the tags! Then you could use:

'~(?<=<p>).*?(?=<\/p>)~g'

for this text:

<p>Hello !</p> <p>I wont select tags! Only text with in</p> 


You could pipe your output to another regex. Using grep, you could do this:

grep A | grep B
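
The same chaining idea can be sketched in JavaScript by filtering a list of lines twice. The sample lines are made up, and A and B are matched as literal letters, as in the grep command.

```javascript
const inputLines = ["A then B", "only A", "B before A", "neither"];

// Each filter keeps what the previous one let through, like a pipe.
const survivors = inputLines
  .filter((line) => /A/.test(line))
  .filter((line) => /B/.test(line));
console.log(survivors);
```

As with the grep pipeline, the order of the two filters does not change the result.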

