[regex] Matching an optional substring in a regex

I'm developing an algorithm to parse a number out of a series of short-ish strings. These strings are somewhat regular, but there's a few different general forms and several exceptions. I'm trying to build a set of regexes that will handle the various forms and exceptions; I'll apply them one after another to see if I get a match.

One of these forms goes something like this:

X (Y) Z

Where:

  • X is a number I want to capture.
  • Z is static, pre-defined text. it's basically how I determine whether this particular form is applicable or not.
  • Y is a a string of unknown length and content, surrounded by parenthesis.

Also: Y is optional; it doesn't always appear in a string with Z and X. So, I want to be able to extract the numbers from all of these strings:

  • 10 Z
  • 20 (foo) Z
  • 30 (bar) Z

Right now, I have a regex that will capture the first one:

([0-9]+) +Z

My problem is that I don't know how to construct a regex that will match a series of characters if and only if they're enclosed in parenthesis. Can this be done in a single regex?

This question is related to regex

The answer is


Try this:

X (\(Y\))? Z

This ought to work:

^\d+\s?(\([^\)]+\)\s?)?Z$

Haven't tested it though, but let me give you the breakdown, so if there are any bugs left they should be pretty straightforward to find:

First the beginning:

^ = beginning of string
\d+ = one or more decimal characters
\s? = one optional whitespace

Then this part:

(\([^\)]+\)\s?)?

Is actually:

(.............)?

Which makes the following contents optional, only if it exists fully

\([^\)]+\)\s?

\( = an opening bracket
[^\)]+ = a series of at least one character that is not a closing bracket
\) = followed by a closing bracket
\s? = followed by one optional whitespace

And the end is made up of

Z$

Where

Z = your constant string
$ = the end of the string

You can do this:

([0-9]+) (\([^)]+\))? Z

This will not work with nested parens for Y, however. Nesting requires recursion which isn't strictly regular any more (but context-free). Modern regexp engines can still handle it, albeit with some difficulties (back-references).


Try this:

X (\(Y\))? Z

You can do this:

([0-9]+) (\([^)]+\))? Z

This will not work with nested parens for Y, however. Nesting requires recursion which isn't strictly regular any more (but context-free). Modern regexp engines can still handle it, albeit with some difficulties (back-references).


Try this:

X (\(Y\))? Z

You can do this:

([0-9]+) (\([^)]+\))? Z

This will not work with nested parens for Y, however. Nesting requires recursion which isn't strictly regular any more (but context-free). Modern regexp engines can still handle it, albeit with some difficulties (back-references).


You can do this:

([0-9]+) (\([^)]+\))? Z

This will not work with nested parens for Y, however. Nesting requires recursion which isn't strictly regular any more (but context-free). Modern regexp engines can still handle it, albeit with some difficulties (back-references).


This ought to work:

^\d+\s?(\([^\)]+\)\s?)?Z$

Haven't tested it though, but let me give you the breakdown, so if there are any bugs left they should be pretty straightforward to find:

First the beginning:

^ = beginning of string
\d+ = one or more decimal characters
\s? = one optional whitespace

Then this part:

(\([^\)]+\)\s?)?

Is actually:

(.............)?

Which makes the following contents optional, only if it exists fully

\([^\)]+\)\s?

\( = an opening bracket
[^\)]+ = a series of at least one character that is not a closing bracket
\) = followed by a closing bracket
\s? = followed by one optional whitespace

And the end is made up of

Z$

Where

Z = your constant string
$ = the end of the string