Possible Duplicate:
Check if multiple strings exist in another string
I am trying to find out if there is a nice and clean way to test for 3 different strings.
Basically I am looping through a file using a for loop; then I have to check whether each line contains one of the 3 strings that I have set in a list.
So far I have found the multiple-condition if check, but it does not feel really elegant or efficient:
for line in file:
    if "string1" in line or "string2" in line or "string3" in line:
        print("found the string")
I was thinking of creating a list that contains string1, string2 and string3, and checking whether any of these is contained in the line, but it doesn't seem that I can compare against the list without explicitly looping through it, and in that case I am basically in the same situation as with the multiple if statement that I wrote above.
Is there any smart way to check against multiple strings without writing long if statements or looping through the elements of a list?
This question is related to: python
One approach is to combine the search strings into a regex pattern as in this answer.
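As a minimal sketch of that regex idea (not taken from the linked answer), the search strings can be escaped with re.escape and joined with "|" into one compiled pattern. The string values are the placeholders from the question, and line_matches and sample_lines are hypothetical names used only for illustration.

```python
import re

# Placeholder search strings from the question.
search_strings = ["string1", "string2", "string3"]

# Escape each string so regex metacharacters are treated literally,
# then join them into a single alternation and compile it once.
pattern = re.compile("|".join(re.escape(s) for s in search_strings))

def line_matches(line):
    """Return True if the line contains at least one search string."""
    return pattern.search(line) is not None

# Hypothetical sample data standing in for the lines of a file.
sample_lines = ["nothing here", "has string2 inside"]
matches = [line for line in sample_lines if line_matches(line)]
print(matches)
```

Compiling the pattern once outside the loop avoids rebuilding it for every line.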
Another option is a generator expression passed to any. It still loops through the cartesian product of the lines and the search strings, but it does so in one line:
>>> lines1 = ['soup', 'butter', 'venison']
>>> lines2 = ['prune', 'rye', 'turkey']
>>> search_strings = ['a', 'b', 'c']
>>> any(s in l for l in lines1 for s in search_strings)
True
>>> any(s in l for l in lines2 for s in search_strings)
False
This also has the advantage that any short-circuits, so the looping stops as soon as a match is found. Note that this only tells you whether some string from search_strings occurs in linesX; it does not report which string matched or in which line. If you want to find all the matches you could do something like this:
>>> lines3 = ['corn', 'butter', 'apples']
>>> [(s, l) for l in lines3 for s in search_strings if s in l]
[('c', 'corn'), ('b', 'butter'), ('a', 'apples')]
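Applied to the per-line loop in the question, the any approach might look like the sketch below. The function name lines_containing_any is hypothetical, and io.StringIO stands in for an open file so that the example runs on its own.

```python
import io

# Placeholder search strings from the question.
search_strings = ["string1", "string2", "string3"]

def lines_containing_any(lines, needles):
    """Yield each line that contains at least one of the needles."""
    for line in lines:
        # any() stops checking needles as soon as one is found in the line.
        if any(needle in line for needle in needles):
            yield line

# io.StringIO is a stand-in for a real file object opened with open().
fake_file = io.StringIO("first line\nhas string2 here\nlast line\n")
found = list(lines_containing_any(fake_file, search_strings))
print(found)
```

With a real file, the same function can be called as lines_containing_any(open_file, search_strings), since file objects iterate line by line.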
If you feel like coding something more complex, the Aho-Corasick algorithm can test for the presence of multiple substrings in a given input string in a single pass. (Thanks to Niklas B. for pointing that out.) You would still have to run it once per line, but the automaton is built only once from the search strings, and each scan takes time roughly linear in the length of the line plus the number of matches. That should beat the approach above, which checks every search string against every line.
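For illustration only, here is a compact pure-Python sketch of an Aho-Corasick automaton. It assumes non-empty search strings. The names build_automaton, iter_matches and contains_any are hypothetical and do not come from any library.

```python
from collections import deque

def build_automaton(patterns):
    """Build trie transitions, failure links and output lists."""
    goto, fail, out = [{}], [0], [[]]
    for pat in patterns:
        node = 0
        for ch in pat:
            if ch not in goto[node]:
                goto.append({})
                fail.append(0)
                out.append([])
                goto[node][ch] = len(goto) - 1
            node = goto[node][ch]
        out[node].append(pat)
    # Breadth-first pass: shallower nodes are finished before deeper ones.
    queue = deque(goto[0].values())
    while queue:
        node = queue.popleft()
        for ch, child in goto[node].items():
            queue.append(child)
            f = fail[node]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[child] = goto[f].get(ch, 0)
            # Inherit patterns that end at the failure target.
            out[child] = out[child] + out[fail[child]]
    return goto, fail, out

def iter_matches(text, automaton):
    """Yield (start_index, pattern) for every occurrence in text."""
    goto, fail, out = automaton
    node = 0
    for i, ch in enumerate(text):
        while node and ch not in goto[node]:
            node = fail[node]
        node = goto[node].get(ch, 0)
        for pat in out[node]:
            yield (i - len(pat) + 1, pat)

def contains_any(text, automaton):
    """Return True as soon as any pattern is found in text."""
    return next(iter_matches(text, automaton), None) is not None

# Build once, then reuse for every line.
automaton = build_automaton(['a', 'b', 'c'])
lines1 = ['soup', 'butter', 'venison']
lines2 = ['prune', 'rye', 'turkey']
print(any(contains_any(l, automaton) for l in lines1))
print(any(contains_any(l, automaton) for l in lines2))
```

For a handful of short search strings, the plain any expression is likely simpler and fast enough; an automaton like this tends to pay off only when the list of search strings is large.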
Source: Stackoverflow.com