Any time I want to replace a piece of text that is part of a larger piece of text, I always have to do something like:
"(?P<start>some_pattern)(?P<replace>foo)(?P<end>end)"
And then concatenate the start group with the new data for replace, and then the end group.
Is there a better method for this?
Look in the Python re documentation for lookaheads (?=...) and lookbehinds (?<=...) -- I'm pretty sure they're what you want. They match strings, but do not "consume" the bits of the strings they match.
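As a small illustration of that idea, the sketch below replaces only the middle word and leaves the surrounding text untouched. The sample string and the replacement word are made up for the example.

```python
import re

text = "start foo end"

# (?<=start ) asserts that "start " comes right before the match,
# (?= end) asserts that " end" comes right after it.
# Neither assertion is part of the match, so only "foo" is replaced.
result = re.sub(r"(?<=start )foo(?= end)", "bar", text)
print(result)  # start bar end
```

Because the lookarounds consume nothing, there is no need to capture the start and end and glue them back on.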
The short version is that you cannot use variable-width patterns in lookbehinds using Python's re module. There is no way to change this:
>>> import re
>>> re.sub("(?<=foo)bar(?=baz)", "quux", "foobarbaz")
'fooquuxbaz'
>>> re.sub("(?<=fo+)bar(?=baz)", "quux", "foobarbaz")
Traceback (most recent call last):
  File "<pyshell#2>", line 1, in <module>
    re.sub("(?<=fo+)bar(?=baz)", "quux", "foobarbaz")
  File "C:\Development\Python25\lib\re.py", line 150, in sub
    return _compile(pattern, 0).sub(repl, string, count)
  File "C:\Development\Python25\lib\re.py", line 241, in _compile
    raise error, v # invalid expression
error: look-behind requires fixed-width pattern
This means that you'll need to work around it, the simplest solution being very similar to what you're doing now:
>>> re.sub("(fo+)bar(?=baz)", "\\1quux", "foobarbaz")
'fooquuxbaz'
>>>
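>>> # If you need to turn this into a callable function:
>>> def replace(start, target, end, replacement, search):
...     # start, target and end are literal strings, so escape them for the pattern
...     pattern = "(" + re.escape(start) + ")" + re.escape(target) + "(?=" + re.escape(end) + ")"
...     # A callable keeps the replacement literal (no backslash interpretation)
...     return re.sub(pattern, lambda m: m.group(1) + replacement, search)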
This doesn't have the elegance of the lookbehind solution, but it's still a very clear, straightforward one-liner. And if you look at what an expert has to say on the matter (he's talking about JavaScript, which at the time lacked lookbehinds entirely, but many of the principles are the same), you'll see that his simplest solution looks a lot like this one.
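If the surrounding context is itself a pattern rather than a literal string, the same workaround can be sketched as below. The helper name replace_between and its parameters are hypothetical, chosen only for this example; the prefix is captured and re-emitted because re will not accept it inside a lookbehind when its width varies.

```python
import re

def replace_between(prefix_pattern, target_pattern, suffix_pattern, replacement, text):
    """Replace target only where it sits between prefix and suffix.

    Hypothetical helper: the three *_pattern arguments are regular
    expressions; replacement is inserted as literal text.
    """
    # Group 1 captures the (possibly variable-width) prefix so it can be put back.
    # The target is wrapped in a non-capturing group so alternations stay contained.
    pattern = "(%s)(?:%s)(?=%s)" % (prefix_pattern, target_pattern, suffix_pattern)
    # Using a callable means backslashes in replacement are not interpreted.
    return re.sub(pattern, lambda m: m.group(1) + replacement, text)

print(replace_between(r"fo+", r"bar", r"baz", "quux", "foooobarbaz"))  # fooooquuxbaz
```

The suffix can stay in a lookahead, since lookaheads have no fixed-width restriction in re.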
>>> import re
>>> s = "start foo end"
>>> s = re.sub("foo", "replaced", s)
>>> s
'start replaced end'
>>> s = re.sub("(?<= )(.+)(?= )", lambda m: "can use a callable for the %s text too" % m.group(1), s)
>>> s
'start can use a callable for the replaced text too end'
>>> help(re.sub)
Help on function sub in module re:

sub(pattern, repl, string, count=0)
    Return the string obtained by replacing the leftmost
    non-overlapping occurrences of the pattern in string by the
    replacement repl. repl can be either a string or a callable;
    if a callable, it's passed the match object and must return
    a replacement string to be used.
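Applied to the named-group pattern from the question, the two forms of repl described in that help text might look like the sketch below. The bracket pattern and the sample string are assumptions made up for illustration.

```python
import re

# Same shape as the pattern in the question: start, replace and end groups.
pattern = re.compile(r"(?P<start>\[)(?P<replace>[^\]]*)(?P<end>\])")
text = "a [b] c [de] f"

# String repl: \g<name> re-inserts a named group, so no manual concatenation.
fixed = pattern.sub(r"\g<start>X\g<end>", text)
print(fixed)  # a [X] c [X] f

def upper_inside(match):
    # Callable repl: keep start and end, transform only the middle group.
    return match.group("start") + match.group("replace").upper() + match.group("end")

transformed = pattern.sub(upper_inside, text)
print(transformed)  # a [B] c [DE] f
```

The string form is enough when the new text is fixed; the callable is useful when the new text depends on what was matched.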
I believe that the best idea is just to capture in a group whatever you want to replace, and then replace it by using the start() and end() methods of the match object for that group.
regards
Adrián
# The pattern contains the expression we want to replace as the first group
import re

pat = r"word1\s(.*)\sword2"
test = "word1 will never be a word2"
repl = "replace"

m = re.search(pat, test)
if m and len(m.groups()) > 0:
    # Splice the replacement in between the text before and after group 1
    line = test[:m.start(1)] + repl + test[m.end(1):]
    print(line)
else:
    print("the pattern didn't capture any text")
This will print: word1 replace word2
The group to be replaced could be located in any position of the string.
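The splice above handles the first match only. A minimal sketch of the same idea applied to every match follows; replace_group is a hypothetical helper name, and the non-greedy pattern in the usage line is an assumption chosen so that two matches are found.

```python
import re

def replace_group(pattern, repl, text, group=1):
    """Replace the span of one capture group in every match of pattern."""
    pieces = []
    last = 0  # index in text up to which output has already been emitted
    for m in re.finditer(pattern, text):
        if m.group(group) is None:
            continue  # the group did not take part in this match
        pieces.append(text[last:m.start(group)])  # text before the group
        pieces.append(repl)                       # literal replacement
        last = m.end(group)
    pieces.append(text[last:])  # whatever follows the last replaced group
    return "".join(pieces)

print(replace_group(r"word1\s(.*?)\sword2", "X", "word1 a b word2, word1 c word2"))
# word1 X word2, word1 X word2
```

Everything outside the chosen group is copied through unchanged, so the group can sit anywhere in the pattern.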
Source: Stackoverflow.com