Simple regex question. I have a string in the following format:
this is a [sample] string with [some] special words. [another one]
What is the regular expression to extract the words within the square brackets, i.e.
sample
some
another one
Note: In my use case, brackets cannot be nested.
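If you want to match only lowercase letters (a-z) between square brackets:
(\[[a-z]*\])
If you want lowercase and uppercase letters (a-zA-Z):
(\[[a-zA-Z]*\])
If you want lowercase letters, uppercase letters and digits (a-zA-Z0-9):
(\[[a-zA-Z0-9]*\])
If you want everything between square brackets (text, numbers and symbols):
(\[.*\])
Note that .* is greedy, so with several bracketed items on one line this last pattern matches from the first [ to the last ].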
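As a rough illustration of how the character-class variants above differ, here is a minimal Python sketch. The sample string and the variable names are made up for this example; the patterns are the ones listed above, without the outer capturing group.

```python
import re

# Hypothetical sample input, chosen so each class accepts a different subset.
text = "[sample] [Some] [abc123] [another one]"

# Only lowercase letters between the brackets.
lower = re.findall(r"\[[a-z]*\]", text)
# Lowercase and uppercase letters.
mixed = re.findall(r"\[[a-zA-Z]*\]", text)
# Letters and digits.
alnum = re.findall(r"\[[a-zA-Z0-9]*\]", text)

print(lower)  # ['[sample]']
print(mixed)  # ['[sample]', '[Some]']
print(alnum)  # ['[sample]', '[Some]', '[abc123]']
```

None of the three classes contains a space, so "[another one]" is skipped by all of them.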
To match a substring between the first [ and last ], you may use
\[.*\] # Including open/close brackets
\[(.*)\] # Excluding open/close brackets (using a capturing group)
(?<=\[).*(?=\]) # Excluding open/close brackets (using lookarounds)
See a regex demo and a regex demo #2.
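To see what "between the first [ and last ]" means in practice, here is a small Python sketch that runs the three greedy patterns on the string from the question. The variable names are illustrative only.

```python
import re

text = "this is a [sample] string with [some] special words. [another one]"

# Whole match, brackets included.
with_brackets = re.search(r"\[.*\]", text).group()
# Capturing group: brackets excluded.
via_group = re.search(r"\[(.*)\]", text).group(1)
# Lookarounds: brackets excluded, no group needed.
via_lookaround = re.search(r"(?<=\[).*(?=\])", text).group()

print(with_brackets)   # [sample] string with [some] special words. [another one]
print(via_group)       # sample] string with [some] special words. [another one
print(via_lookaround)  # sample] string with [some] special words. [another one
```

All three span the whole run from the first opening bracket to the last closing one, which is rarely what you want when a line holds several bracketed items.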
Use the following expressions to match strings between the closest square brackets:
Including the brackets:
\[[^][]*] - PCRE, Python re/regex, .NET, Golang, POSIX (grep, sed, bash)
\[[^\][]*] - ECMAScript (JavaScript, C++ std::regex, VBA RegExp)
\[[^\]\[]*] - Java regex
\[[^\]\[]*\] - Onigmo (Ruby, requires escaping of brackets everywhere)
Excluding the brackets:
(?<=\[)[^][]*(?=]) - PCRE, Python re/regex, .NET (C#, etc.), ICU (R stringr), JGSoft Software
\[([^][]*)] - Bash, Golang - capture the contents between the square brackets with a pair of unescaped parentheses, also see below
\[([^\][]*)] - JavaScript, C++ std::regex, VBA RegExp
(?<=\[)[^\]\[]*(?=]) - Java regex
(?<=\[)[^\]\[]*(?=\]) - Onigmo (Ruby, requires escaping of brackets everywhere)
NOTE: * matches 0 or more characters, use + to match 1 or more to avoid empty string matches in the resulting list/array.
Whenever lookaround support is available, the above solutions rely on it to exclude the leading/trailing open/close bracket. Otherwise, rely on capturing groups (links to most common solutions in some languages have been provided).
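A short Python sketch of the closest-bracket approach, comparing * with + and a capturing group with lookarounds. The input string is a made-up example containing an empty pair of brackets, and the fully escaped class [^\]\[] is used here simply because it is valid across flavors.

```python
import re

# Hypothetical input with an empty [] pair in the middle.
text = "a [sample] b [] c [another one]"

# * allows zero characters, so the empty pair yields an empty string.
star = re.findall(r"\[([^\]\[]*)\]", text)
# + requires at least one character, so the empty pair is skipped.
plus = re.findall(r"\[([^\]\[]+)\]", text)
# Same result with lookarounds instead of a capturing group.
look = re.findall(r"(?<=\[)[^\]\[]+(?=\])", text)

print(star)  # ['sample', '', 'another one']
print(plus)  # ['sample', 'another one']
print(look)  # ['sample', 'another one']
```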
If you need to match nested parentheses, you may see the solutions in the Regular expression to match balanced parentheses thread and replace the round brackets with the square ones to get the necessary functionality. You should use capturing groups to access the contents with open/close bracket excluded:
\[((?:[^][]++|(?R))*)] - PHP PCRE
\[((?>[^][]+|(?<o>)\[|(?<-o>]))*)] - .NET demo
\[(?:[^\]\[]++|(\g<0>))*\] - Onigmo (Ruby) demo
In R, try:
library(stringr)  # provides str_replace
x <- 'foo[bar]baz'
str_replace(x, ".*?\\[(.*?)\\].*", "\\1")
[1] "bar"
(?<=\[).*?(?=\]) works well, as per the explanation given above. The same lookaround approach works for other delimiters; here's a Python example that extracts the text between parentheses:
import re
s = "Pagination.go('formPagination_bottom',2,'Page',true,'1',null,'2013')"
re.search(r'(?<=\().*?(?=\))', s).group()
"'formPagination_bottom',2,'Page',true,'1',null,'2013'"
This should work out ok:
\[([^]]+)\]
Can brackets be nested?
If not: \[([^]]+)\] matches one item, including square brackets. Backreference \1 will contain the item to be matched. If your regex flavor supports lookaround, use
(?<=\[)[^]]+(?=\])
This will only match the item inside brackets.
@Tim Pietzcker's answer here
(?<=\[)[^]]+(?=\])
is almost the one I've been looking for. But there is one issue: some legacy browsers fail on positive lookbehind. So I had to work it out myself :). I managed to write this:
/([^[]+(?=]))/g
Maybe it will help someone.
console.log("this is a [sample] string with [some] special words. [another one]".match(/([^[]+(?=]))/g));
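One thing worth checking with a lookahead-only pattern like this is that nothing anchors the match to an opening bracket. The Python sketch below uses the same idea with the brackets escaped; the second input string is a made-up example with a stray closing bracket.

```python
import re

# Same idea as the JavaScript pattern above: no lookbehind, only a lookahead for "]".
pattern = re.compile(r"[^\[]+(?=\])")

sample = "this is a [sample] string with [some] special words. [another one]"
well_formed = pattern.findall(sample)

# Hypothetical input with a "]" that has no matching "[".
stray = "stray] text [ok]"
with_stray = pattern.findall(stray)

print(well_formed)  # ['sample', 'some', 'another one']
print(with_stray)   # ['stray', 'ok']
```

On well-formed input the result is the expected one, but text before an unmatched ] is also returned, so this variant is best kept for input where the brackets are known to be balanced.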
If you do not want to include the brackets in the match, here's the regex: (?<=\[).*?(?=\])
The . matches any character except for line terminators. The ?= is a positive lookahead. A positive lookahead finds a string when a certain string comes after it. The ?<= is a positive lookbehind. A positive lookbehind finds a string when a certain string precedes it. To quote this,
Look ahead positive (?=)
Find expression A where expression B follows:
A(?=B)
Look behind positive (?<=)
Find expression A where expression B precedes:
(?<=B)A
If your regex engine does not support lookaheads and lookbehinds, then you can use the regex \[(.*?)\] to capture the innards of the brackets in a group and then you can manipulate the group as necessary.
The parentheses capture the characters in a group. The .*? gets all of the characters between the brackets (except for line terminators, unless you have the s flag enabled) in a way that is not greedy.
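A minimal Python sketch of the lazy capturing-group version, including the effect of the s flag (re.DOTALL in Python). The input string is a made-up example in which one bracketed item spans two lines.

```python
import re

# Hypothetical input: the second bracketed item contains a newline.
text = "one [first] two [spans\ntwo lines] three"

# Without the s flag, "." stops at the newline, so the second item is not matched.
default = re.findall(r"\[(.*?)\]", text)
# With re.DOTALL, "." also matches newlines.
dotall = re.findall(r"\[(.*?)\]", text, flags=re.DOTALL)

print(default)  # ['first']
print(dotall)   # ['first', 'spans\ntwo lines']
```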
This code will extract the content between square brackets and parentheses
(?:(?<=\().+?(?=\))|(?<=\[).+?(?=\]))
(?: non capturing group
(?<=\().+?(?=\)) positive lookbehind and lookahead to extract the text between parentheses
| or
(?<=\[).+?(?=\]) positive lookbehind and lookahead to extract the text between square brackets
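For illustration, the alternation above can be tried in Python as follows. The input string is invented for this sketch.

```python
import re

# Either branch may match: text after "(" up to ")", or text after "[" up to "]".
pattern = re.compile(r"(?:(?<=\().+?(?=\))|(?<=\[).+?(?=\]))")

# Hypothetical input with one pair of each kind.
text = "call(arg1, arg2) and index[42]"
found = pattern.findall(text)

print(found)  # ['arg1, arg2', '42']
```

Because the group is non-capturing, findall returns the whole match for each hit rather than a tuple of groups.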
In case you might have nested or unbalanced brackets, you can likely design an expression with recursion, similar to
\[(([^\]\[]+)|(?R))*+\]
which, of course, depends on the language or regex engine that you are using.
Other than that,
\[([^\]\[\r\n]*)\]
or,
(?<=\[)[^\]\[\r\n]*(?=\])
are good options to explore.
If you wish to simplify/modify/explore the expression, it's explained in the top right panel of regex101.com. If you'd like, you can also watch in this link how it would match against some sample inputs.
const regex = /\[([^\]\[\r\n]*)\]/gm;
const str = `This is a [sample] string with [some] special words. [another one]
This is a [sample string with [some special words. [another one
This is a [sample[sample]] string with [[some][some]] special words. [[another one]]`;
let m;

while ((m = regex.exec(str)) !== null) {
    // This is necessary to avoid infinite loops with zero-width matches
    if (m.index === regex.lastIndex) {
        regex.lastIndex++;
    }

    // The result can be accessed through the `m`-variable.
    m.forEach((match, groupIndex) => {
        console.log(`Found match, group ${groupIndex}: ${match}`);
    });
}
(?<=\[).+?(?=\])
Will capture content without brackets
(?<=\[) - positive lookbehind for [
.+? - non greedy match for the content
(?=\]) - positive lookahead for ]
EDIT: for nested brackets the below regex should work:
(\[(?:\[??[^\[]*?\]))
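Python's built-in re module has no recursion, so for nested brackets a plain stack-based scan is one possible alternative to a regex. Note that this is a different technique from the pattern above, and the function name bracket_contents is made up for this sketch.

```python
def bracket_contents(text):
    """Return the contents of every balanced [...] pair,
    in the order in which the closing brackets appear."""
    stack = []   # positions of "[" that have not been closed yet
    found = []
    for i, ch in enumerate(text):
        if ch == "[":
            stack.append(i)
        elif ch == "]" and stack:
            # Close the most recent open bracket; a "]" with no open "[" is ignored.
            start = stack.pop()
            found.append(text[start + 1:i])
    return found


print(bracket_contents("a [b [c] d] e [f]"))  # ['c', 'b [c] d', 'f']
```

Inner pairs are reported before the pair that encloses them, and unmatched brackets on either side are silently skipped.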
([[][a-z \s]+[]])
The above should work, given the following explanation:
Characters within square brackets [] define a character class, which means the pattern matches one of the characters listed within the square brackets.
\s matches a whitespace character.
+ means one or more of the item that precedes the +.
I needed to match across newlines and include the brackets:
\[[\s\S]+\]
If someone wants to match and select a string containing one or more dots inside square brackets like "[fu.bar]" use the following:
(?<=\[)(\w+\.\w+.*?)(?=\])
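A quick Python check of the dotted-name pattern, using a made-up input that mixes bracketed items with and without dots.

```python
import re

pattern = re.compile(r"(?<=\[)(\w+\.\w+.*?)(?=\])")

# Hypothetical input: only the first and last items contain a dot.
text = "[fu.bar] [plain] [a.b.c]"
dotted = pattern.findall(text)

print(dotted)  # ['fu.bar', 'a.b.c']
```

The item without a dot is skipped because \w+\.\w+ requires at least one dot between word characters.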
Source: Stackoverflow.com