I have something like this:
extensionsToCheck = ['.pdf', '.doc', '.xls']
for extension in extensionsToCheck:
if extension in url_string:
print(url_string)
I am wondering what would be the more elegant way to do this in Python (without using the for loop)? I was thinking of something like this (like from C/C++), but it didn't work:
if ('.pdf' or '.doc' or '.xls') in url_string:
print(url_string)
Edit: I'm kinda forced to explain how this is different to the question below which is marked as potential duplicate (so it doesn't get closed I guess).
The difference is, I wanted to check if a string is part of some list of strings whereas the other question is checking whether a string from a list of strings is a substring of another string. Similar, but not quite the same and semantics matter when you're looking for an answer online IMHO. These two questions are actually looking to solve the opposite problem of one another. The solution for both turns out to be the same though.
This question is related to
python
string
if-statement
This is a variant of the list comprehension answer given by @psun.
By switching the output value, you can actually extract the matching pattern from the list comprehension (something not possible with the any()
approach by @Lauritz-v-Thaulow)
extensionsToCheck = ['.pdf', '.doc', '.xls']
url_string = 'http://.../foo.doc'
print [extension for extension in extensionsToCheck if(extension in url_string)]
['.doc']`
You can furthermore insert a regular expression if you want to collect additional information once the matched pattern is known (this could be useful when the list of allowed patterns is too long to write into a single regex pattern)
print [re.search(r'(\w+)'+extension, url_string).group(0) for extension in extensionsToCheck if(extension in url_string)]
['foo.doc']
extensionsToCheck = ('.pdf', '.doc', '.xls')
'test.doc'.endswith(extensionsToCheck) # returns True
'test.jpg'.endswith(extensionsToCheck) # returns False
Use list comprehensions if you want a single line solution. The following code returns a list containing the url_string when it has the extensions .doc, .pdf and .xls or returns empty list when it doesn't contain the extension.
print [url_string for extension in extensionsToCheck if(extension in url_string)]
NOTE: This is only to check if it contains or not and is not useful when one wants to extract the exact word matching the extensions.
Check if it matches this regex:
'(\.pdf$|\.doc$|\.xls$)'
Note: if you extensions are not at the end of the url, remove the $
characters, but it does weaken it slightly
It is better to parse the URL properly - this way you can handle http://.../file.doc?foo
and http://.../foo.doc/file.exe
correctly.
from urlparse import urlparse
import os
path = urlparse(url_string).path
ext = os.path.splitext(path)[1]
if ext in extensionsToCheck:
print(url_string)
Source: Stackoverflow.com