In some cases it's desirable to replace consecutive occurrences of every whitespace character with a single instance of that character. You'd use a regular expression with backreferences to do that.
(\s)\1{1,}
matches any whitespace character, followed by one or more occurrences of that character. Now, all you need to do is specify the first group (\1
) as the replacement for the match.
Wrapping this in a function:
import re
def normalize_whitespace(string):
return re.sub(r'(\s)\1{1,}', r'\1', string)
>>> normalize_whitespace('The fox jumped over the log.')
'The fox jumped over the log.'
>>> normalize_whitespace('First line\t\t\t \n\n\nSecond line')
'First line\t \nSecond line'