I'd like to use a variable inside a regex. How can I do this in Python?
TEXTO = sys.argv[1]
if re.search(r"\b(?=\w)TEXTO\b(?!\w)", subject, re.IGNORECASE):
    # Successful match
    pass
else:
    # Match attempt failed
    pass
I find it very convenient to build a regular expression pattern by stringing together multiple smaller patterns.
import re
string = "begin:id1:tag:middl:id2:tag:id3:end"
re_str1 = r'(?<=(\S{5})):'
re_str2 = r'(id\d+):(?=tag:)'
re_pattern = re.compile(re_str1 + re_str2)
match = re_pattern.findall(string)
print(match)
Output:
[('begin', 'id1'), ('middl', 'id2')]
You can also use format as syntactic sugar:
re_genre = r'{}'.format(your_variable)
regex_pattern = re.compile(re_genre)
You can also use the format method for this. It replaces the {} placeholder with the variable passed to it as an argument.
if re.search(r"\b(?=\w){}\b(?!\w)".format(TEXTO), subject, re.IGNORECASE):
    # Successful match
    pass
else:
    # Match attempt failed
    pass
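One thing to watch with format is that literal braces in the pattern, such as a {2} quantifier, have to be doubled so they are not read as placeholders. A minimal sketch, assuming a made-up pattern and a hypothetical helper named build_pattern (neither is from the answers here):

```python
import re

def build_pattern(word):
    # {{2}} is emitted as the literal quantifier {2};
    # the single {} is filled with the escaped word.
    return r"\b{}-\d{{2}}\b".format(re.escape(word))

pattern = build_pattern("item")
print(pattern)
print(bool(re.search(pattern, "order ITEM-42 shipped", re.IGNORECASE)))
```

The re.escape call is optional here, but it keeps any special characters in the word from being read as regex syntax.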
I needed to search for usernames that are similar to each other, and what Ned Batchelder said was incredibly helpful. However, I found I had cleaner output when I used re.compile to create my re search term:
pattern = re.compile(r"(" + username + r".*):(.*?):(.*?):(.*?):(.*)")
matches = re.findall(pattern, lines)
Output can be printed using the following:
print(matches[0])     # prints the captured groups of the first matching line, as a tuple
print(matches[0][3])  # prints the fourth captured group (set by the parentheses in the regex) of the first matching line
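To make the shape of the result concrete, here is a small sketch run against made-up, colon-separated records. The sample data and the username are assumptions, not from the original post, and re.escape is added so a username with special characters is matched literally.

```python
import re

# Hypothetical colon-separated records, one per line.
lines = (
    "alice:x:1000:1000:Alice\n"
    "alice2:x:1001:1001:Alice Two\n"
    "bob:x:1002:1002:Bob\n"
)
username = "alice"

pattern = re.compile(r"(" + re.escape(username) + r".*):(.*?):(.*?):(.*?):(.*)")
matches = re.findall(pattern, lines)

# With several groups, findall returns one tuple of groups per match.
print(matches[0])     # groups captured from the first matching line
print(matches[0][3])  # fourth captured group of that line
```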
Another example: I have a configus.yml file with flow definitions:
"pattern":
- _(\d{14})_
"datetime_string":
- "%m%d%Y%H%M%f"
In the Python code I use:
data_time_real_file = re.findall(flows[flow]["pattern"][0], latest_file)
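A self-contained sketch of the same idea, with a plain dict standing in for the parsed YAML. The flow name "daily" and the file name are invented for illustration.

```python
import re

# Stand-in for the parsed configus.yml; "daily" is a made-up flow name.
flows = {
    "daily": {
        "pattern": [r"_(\d{14})_"],
        "datetime_string": ["%m%d%Y%H%M%f"],
    }
}
latest_file = "export_01312024093015_v2.csv"  # hypothetical file name

# The pattern is already a string, so it can be passed to re directly.
pattern = re.compile(flows["daily"]["pattern"][0])
data_time_real_file = pattern.findall(latest_file)
print(data_time_real_file)
```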
You have to build the regex as a string:
TEXTO = sys.argv[1]
my_regex = r"\b(?=\w)" + re.escape(TEXTO) + r"\b(?!\w)"
if re.search(my_regex, subject, re.IGNORECASE):
etc.
Note the use of re.escape so that if your text has special characters, they won't be interpreted as such.
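As a quick illustration of what re.escape buys you, here is a minimal sketch wrapping the pattern in a hypothetical helper (the function name and sample strings are assumptions):

```python
import re

def contains_word(text, subject):
    """Return True if `text` occurs in `subject` as a whole word, ignoring case."""
    # re.escape turns metacharacters such as '.' into literals.
    pattern = r"\b(?=\w)" + re.escape(text) + r"\b(?!\w)"
    return re.search(pattern, subject, re.IGNORECASE) is not None

print(contains_word("a.b", "value of A.B here"))  # the '.' matches a literal dot
print(contains_word("a.b", "value of axb here"))  # 'x' is not a dot, so no match
```

Without re.escape, the dot would match any character and the second call would also succeed. Note that the \b anchors assume the text starts and ends with a word character.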
if re.search(r"\b(?=\w)%s\b(?!\w)" % TEXTO, subject, re.IGNORECASE):
This will insert what is in TEXTO into the regex as a string, without escaping it.
rx = r'\b(?=\w){0}\b(?!\w)'.format(TEXTO)
I agree with all of the above, unless sys.argv[1] is itself meant to be a regex, something like Chicken\d{2}-\d{2}An\s*important\s*anchor:
sys.argv[1] = r"Chicken\d{2}-\d{2}An\s*important\s*anchor"
In that case you would not want to use re.escape, because you want the input to behave like a regex:
TEXTO = sys.argv[1]
if re.search(r"\b(?=\w)" + TEXTO + r"\b(?!\w)", subject, re.IGNORECASE):
    # Successful match
    pass
else:
    # Match attempt failed
    pass
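The difference between the two approaches can be seen side by side in a small sketch. The find helper and its literal switch are hypothetical names used only for this illustration, and the sample strings are made up.

```python
import re

def find(text, subject, literal=True):
    # `literal` is a hypothetical switch: escape the input,
    # or pass it through so it behaves like a regex.
    core = re.escape(text) if literal else text
    return re.search(r"\b(?=\w)" + core + r"\b(?!\w)", subject, re.IGNORECASE)

user_input = r"Chicken\d{2}-\d{2}"
subject = "order: chicken12-34 today"

print(bool(find(user_input, subject, literal=False)))  # input treated as a regex
print(bool(find(user_input, subject, literal=True)))   # input treated as plain text
```

If the input may contain alternation (|), wrapping it in a non-capturing group (?:...) keeps the boundary anchors applied to the whole expression.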
Source: Stackoverflow.com