[regex] Non greedy (reluctant) regex matching in sed?

This is how to robustly do non-greedy matching of multi-character strings using sed. Let's say you want to change every foo...bar to <foo...bar>, so that, for example, this input:

$ cat file
ABC foo DEF bar GHI foo KLM bar NOP foo QRS bar TUV

should become this output:

ABC <foo DEF bar> GHI <foo KLM bar> NOP <foo QRS bar> TUV

To do that, you convert foo and bar each to a single character and then match a negated bracket expression of those two characters between them:

$ sed 's/@/@A/g; s/{/@B/g; s/}/@C/g; s/foo/{/g; s/bar/}/g; s/{[^{}]*}/<&>/g; s/}/bar/g; s/{/foo/g; s/@C/}/g; s/@B/{/g; s/@A/@/g' file
ABC <foo DEF bar> GHI <foo KLM bar> NOP <foo QRS bar> TUV

In the above:

  1. s/@/@A/g; s/{/@B/g; s/}/@C/g first escapes every @ as @A and then converts { and } to the placeholder strings @B and @C. Once every original @ is followed by A, those placeholders cannot already exist in the input, so the { and } characters are free to stand in for foo and bar.
  2. s/foo/{/g; s/bar/}/g is converting foo and bar to { and } respectively.
  3. s/{[^{}]*}/<&>/g is performing the operation we want: converting foo...bar to <foo...bar>.
  4. s/}/bar/g; s/{/foo/g is converting { and } back to foo and bar.
  5. s/@C/}/g; s/@B/{/g; s/@A/@/g is converting the placeholder strings back to their original characters.
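
The five steps above can be sketched as a small shell function with one commented line per step. The function name wrap_foo_bar and the test input are assumptions made for illustration; the sed commands are the same ones as in the one-liner above. The input deliberately contains literal {, } and @ characters to show that they survive the round trip:

```shell
# wrap_foo_bar: hypothetical helper name; reads lines on stdin.
wrap_foo_bar() {
  sed '
    # 1. escape @, then turn literal { and } into placeholders
    s/@/@A/g; s/{/@B/g; s/}/@C/g
    # 2. collapse the multi-character delimiters to single characters
    s/foo/{/g; s/bar/}/g
    # 3. the actual edit: shortest {...} span, wrapped in <>
    s/{[^{}]*}/<&>/g
    # 4. restore the delimiters
    s/}/bar/g; s/{/foo/g
    # 5. restore the original }, { and @ characters
    s/@C/}/g; s/@B/{/g; s/@A/@/g
  '
}

printf '%s\n' 'a{b} foo x@y bar' | wrap_foo_bar
# prints: a{b} <foo x@y bar>
```

Writing the script across several lines is only a readability choice; it should behave the same as the semicolon-separated one-liner.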

Note that the above does not rely on any particular string being absent from the input, since it manufactures such strings in the first step. Nor does it care which occurrence of any particular regexp you want to match: you can use {[^{}]*} as many times as necessary in the expression to isolate the actual match you want, and/or combine it with sed's numeric occurrence flag, e.g. to only replace the 2nd occurrence:

$ sed 's/@/@A/g; s/{/@B/g; s/}/@C/g; s/foo/{/g; s/bar/}/g; s/{[^{}]*}/<&>/2; s/}/bar/g; s/{/foo/g; s/@C/}/g; s/@B/{/g; s/@A/@/g' file
ABC foo DEF bar GHI <foo KLM bar> NOP foo QRS bar TUV
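
For contrast, here is a minimal sketch of what a plain greedy .* does on the same sample line, which is the behavior the technique above works around. The variable names input and greedy are assumptions for illustration; the line is fed in with printf rather than read from file so the snippet is self-contained:

```shell
# Same sample line as in `file` above.
input='ABC foo DEF bar GHI foo KLM bar NOP foo QRS bar TUV'

# Greedy: .* runs from the first "foo" to the LAST "bar" on the line,
# so the whole span is wrapped once instead of three times.
greedy=$(printf '%s\n' "$input" | sed 's/foo.*bar/<&>/g')
printf '%s\n' "$greedy"
# prints: ABC <foo DEF bar GHI foo KLM bar NOP foo QRS bar> TUV
```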
