[python] How can I remove text within parentheses with a regex?

I'm trying to handle a bunch of files, and I need to alter then to remove extraneous information in the filenames; notably, I'm trying to remove text inside parentheses. For example:

filename = "Example_file_(extra_descriptor).ext"

and I want to regex a whole bunch of files where the parenthetical expression might be in the middle or at the end, and of variable length.

What would the regex look like? Perl or Python syntax would be preferred.

This question is related to python regex perl

The answer is


>>> import re
>>> filename = "Example_file_(extra_descriptor).ext"
>>> p = re.compile(r'\([^)]*\)')
>>> re.sub(p, '', filename)
'Example_file_.ext'

If you can stand to use sed (possibly execute from within your program, it'd be as simple as:

sed 's/(.*)//g'

I would use:

\([^)]*\)

Java code:

Pattern pattern1 = Pattern.compile("(\\_\\(.*?\\))");
System.out.println(fileName.replace(matcher1.group(1), ""));

If a path may contain parentheses then the r'\(.*?\)' regex is not enough:

import os, re

def remove_parenthesized_chunks(path, safeext=True, safedir=True):
    dirpath, basename = os.path.split(path) if safedir else ('', path)
    name, ext = os.path.splitext(basename) if safeext else (basename, '')
    name = re.sub(r'\(.*?\)', '', name)
    return os.path.join(dirpath, name+ext)

By default the function preserves parenthesized chunks in directory and extention parts of the path.

Example:

>>> f = remove_parenthesized_chunks
>>> f("Example_file_(extra_descriptor).ext")
'Example_file_.ext'
>>> path = r"c:\dir_(important)\example(extra).ext(untouchable)"
>>> f(path)
'c:\\dir_(important)\\example.ext(untouchable)'
>>> f(path, safeext=False)
'c:\\dir_(important)\\example.ext'
>>> f(path, safedir=False)
'c:\\dir_\\example.ext(untouchable)'
>>> f(path, False, False)
'c:\\dir_\\example.ext'
>>> f(r"c:\(extra)\example(extra).ext", safedir=False)
'c:\\\\example.ext'

For those who want to use Python, here's a simple routine that removes parenthesized substrings, including those with nested parentheses. Okay, it's not a regex, but it'll do the job!

def remove_nested_parens(input_str):
    """Returns a copy of 'input_str' with any parenthesized text removed. Nested parentheses are handled."""
    result = ''
    paren_level = 0
    for ch in input_str:
        if ch == '(':
            paren_level += 1
        elif (ch == ')') and paren_level:
            paren_level -= 1
        elif not paren_level:
            result += ch
    return result

remove_nested_parens('example_(extra(qualifier)_text)_test(more_parens).ext')

If you don't absolutely need to use a regex, useconsider using Perl's Text::Balanced to remove the parenthesis.

use Text::Balanced qw(extract_bracketed);

my ($extracted, $remainder, $prefix) = extract_bracketed( $filename, '()', '[^(]*' );

{   no warnings 'uninitialized';

    $filename = (defined $prefix or defined $remainder)
                ? $prefix . $remainder
                : $extracted;
}

You may be thinking, "Why do all this when a regex does the trick in one line?"

$filename =~ s/\([^}]*\)//;

Text::Balanced handles nested parenthesis. So $filename = 'foo_(bar(baz)buz)).foo' will be extracted properly. The regex based solutions offered here will fail on this string. The one will stop at the first closing paren, and the other will eat them all.

   $filename =~ s/\([^}]*\)//;
   # returns 'foo_buz)).foo'

   $filename =~ s/\(.*\)//;
   # returns 'foo_.foo'

   # text balanced example returns 'foo_).foo'

If either of the regex behaviors is acceptable, use a regex--but document the limitations and the assumptions being made.


The pattern that matches substrings in parentheses having no other ( and ) characters in between (like (xyz 123) in Text (abc(xyz 123)) is

\([^()]*\)

Details:

  • \( - an opening round bracket (note that in POSIX BRE, ( should be used, see sed example below)
  • [^()]* - zero or more (due to the * Kleene star quantifier) characters other than those defined in the negated character class/POSIX bracket expression, that is, any chars other than ( and )
  • \) - a closing round bracket (no escaping in POSIX BRE allowed)

Removing code snippets:

  • JavaScript: string.replace(/\([^()]*\)/g, '')
  • PHP: preg_replace('~\([^()]*\)~', '', $string)
  • Perl: $s =~ s/\([^()]*\)//g
  • Python: re.sub(r'\([^()]*\)', '', s)
  • C#: Regex.Replace(str, @"\([^()]*\)", string.Empty)
  • VB.NET: Regex.Replace(str, "\([^()]*\)", "")
  • Java: s.replaceAll("\\([^()]*\\)", "")
  • Ruby: s.gsub(/\([^()]*\)/, '')
  • R: gsub("\\([^()]*\\)", "", x)
  • Lua: string.gsub(s, "%([^()]*%)", "")
  • Bash/sed: sed 's/([^()]*)//g'
  • Tcl: regsub -all {\([^()]*\)} $s "" result
  • C++ std::regex: std::regex_replace(s, std::regex(R"(\([^()]*\))"), "")
  • Objective-C:
    NSRegularExpression *regex = [NSRegularExpression regularExpressionWithPattern:@"\\([^()]*\\)" options:NSRegularExpressionCaseInsensitive error:&error]; NSString *modifiedString = [regex stringByReplacingMatchesInString:string options:0 range:NSMakeRange(0, [string length]) withTemplate:@""];
  • Swift: s.replacingOccurrences(of: "\\([^()]*\\)", with: "", options: [.regularExpression])

Examples related to python

programming a servo thru a barometer Is there a way to view two blocks of code from the same file simultaneously in Sublime Text? python variable NameError Why my regexp for hyphenated words doesn't work? Comparing a variable with a string python not working when redirecting from bash script is it possible to add colors to python output? Get Public URL for File - Google Cloud Storage - App Engine (Python) Real time face detection OpenCV, Python xlrd.biffh.XLRDError: Excel xlsx file; not supported Could not load dynamic library 'cudart64_101.dll' on tensorflow CPU-only installation

Examples related to regex

Why my regexp for hyphenated words doesn't work? grep's at sign caught as whitespace Preg_match backtrack error regex match any single character (one character only) re.sub erroring with "Expected string or bytes-like object" Only numbers. Input number in React Visual Studio Code Search and Replace with Regular Expressions Strip / trim all strings of a dataframe return string with first match Regex How to capture multiple repeated groups?

Examples related to perl

The program can't start because api-ms-win-crt-runtime-l1-1-0.dll is missing while starting Apache server on my computer "End of script output before headers" error in Apache Perl - Multiple condition if statement without duplicating code? How to decrypt hash stored by bcrypt Split a string into array in Perl Turning multiple lines into one comma separated line String compare in Perl with "eq" vs "==" how to remove the first two columns in a file using shell (awk, sed, whatever) Find everything between two XML tags with RegEx Difference between \w and \b regular expression meta characters