[python] Python match a string with regex

I need a Python regular expression to check whether a word is present in a string. The string may be separated by commas.

So for example,

line = 'This,is,a,sample,string'

I want to search for "sample", and this should return true. I am not good with regex, so when I looked at the Python docs, I saw something like

import re
re.match(r'sample', line)

But I don't know why there was an 'r' before the text to be matched. Can someone help me with the regular expression?

This question is related to python regex

The answer is


The r makes the string a raw string, which doesn't process escape characters (however, since there are none in the string, it is actually not needed here).

Also, re.match matches only at the beginning of the string. In other words, the pattern has to match starting at the first character, although it does not have to cover the whole string. To match something that could be anywhere in the string, use re.search. See a demonstration below:

>>> import re
>>> line = 'This,is,a,sample,string'
>>> re.match("sample", line)
>>> re.search("sample", line)
<_sre.SRE_Match object at 0x021D32C0>
>>>
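
To make the anchoring behaviour concrete, here is a minimal sketch on the same input. The variable names are made up for illustration, and re.fullmatch (Python 3.4+) is not mentioned in this answer; it is included only as a contrast, since it is the call that requires the whole string to match.

```python
import re

line = 'This,is,a,sample,string'

# re.match anchors only at the start; the pattern need not cover the whole string.
starts_with_this = re.match('This', line) is not None
starts_with_sample = re.match('sample', line) is not None

# re.search scans the string and stops at the first occurrence.
contains_sample = re.search('sample', line) is not None

# re.fullmatch succeeds only if the pattern covers the entire string.
is_exactly_this = re.fullmatch('This', line) is not None

print(starts_with_this, starts_with_sample, contains_sample, is_exactly_this)
```

All three calls return None when they fail, so testing the result with `is not None` (or plain truthiness) is enough for a yes/no answer.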

As everyone else has mentioned, it is better to use the "in" operator. It also works with lists:

line = "This,is,a,sample,string"
lst = ['This', 'sample']
for i in lst:
    print(i in line)  # is this word a substring of line?

>> True
>> True

One-liner implementation:

a=[1,3]
b=[1,2,3,4]
all(i in b for i in a)

You do not need regular expressions to check if a substring exists in a string.

line = 'This,is,a,sample,string'
result = 'sample' in line # True; "in" already returns a bool
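
Note that a plain substring test also succeeds on part of a word. Since the question asks about a word in a comma-separated string, one possible sketch is to split on the commas first. The helper name has_word is hypothetical, and the sketch assumes the fields carry no surrounding whitespace.

```python
def has_word(line, word):
    """Return True if word is one of the comma-separated fields of line."""
    return word in line.split(',')

line = 'This,is,a,sample,string'

print('sam' in line)             # substring test: True, even though 'sam' is not a field
print(has_word(line, 'sample'))  # whole-field test
print(has_word(line, 'sam'))     # whole-field test rejects the partial word
```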

If you want to know whether a string contains a pattern, use re.search:

import re
line = 'This,is,a,sample,string'
result = re.search(r'sample', line) # match object for 'sample', or None if absent

This is best used with pattern matching, for example:

line = 'my name is bob'
result = re.search(r'my name is (\S+)', line) # result.group(1) is 'bob'
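
Because re.search returns None when nothing matches, the captured group should only be read after checking the result. A small sketch, with extract_name as a hypothetical helper name:

```python
import re

def extract_name(line):
    """Return the word following 'my name is', or None if the pattern is absent."""
    result = re.search(r'my name is (\S+)', line)
    # group(1) is the text captured by the first pair of parentheses.
    return result.group(1) if result else None

print(extract_name('my name is bob'))
print(extract_name('hello world'))
```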

The r prefix marks a raw string, so a backslash is kept as a literal character instead of starting a Python escape sequence.

Normally, if you wanted your pattern to include something like a backslash, you'd need to escape it with another backslash. Raw strings eliminate this problem.

Short explanation:

In your case it does not matter much, but it's a good habit to get into early. Otherwise something like \b will bite you if you are not careful: in a normal string it is interpreted as a backspace character instead of a word boundary.
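
The \b pitfall can be seen directly by comparing the two string literals. This is a minimal sketch with made-up variable names, reusing the sample line from the question:

```python
import re

line = 'This,is,a,sample,string'

plain = '\bsample\b'   # Python turns each \b into a backspace character (\x08)
raw = r'\bsample\b'    # backslashes survive, so the regex engine sees \b

print(len(plain), len(raw))  # the raw string is two characters longer

plain_found = re.search(plain, line) is not None   # looks for literal backspaces
raw_found = re.search(raw, line) is not None       # looks for 'sample' as a whole word
partial_found = re.search(r'\bsam\b', line) is not None

print(plain_found, raw_found, partial_found)
```

With the raw string, \b also gives a whole-word test for the original question, because the commas count as word boundaries.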

As for re.match vs re.search, here's an example that should clarify it for you:

>>> import re
>>> testString = 'hello world'
>>> re.match('hello', testString)
<_sre.SRE_Match object at 0x015920C8>
>>> re.search('hello', testString)
<_sre.SRE_Match object at 0x02405560>
>>> re.match('world', testString)
>>> re.search('world', testString)
<_sre.SRE_Match object at 0x015920C8>

So search will find a match anywhere in the string, while match will only match at the beginning.