[regex] Regex matching in a Bash if statement

What did I do wrong here?

Trying to match any string that contains spaces, lowercase, uppercase, or numbers. Special characters would be nice too, but I think that requires escaping certain characters.

TEST="THIS is a TEST title with some numbers 12345 and special char *&^%$#"

if [[ "$TEST" =~ [^a-zA-Z0-9\ ] ]]; then BLAH; fi

This obviously only tests for upper, lower, numbers, and spaces. Doesn't work though.

* UPDATE *

I guess I should have been more specific. Here is the actual real line of code.

if [[ "$TITLE" =~ [^a-zA-Z0-9\ ] ]]; then RETURN="FAIL" && ERROR="ERROR: Title can only contain upper and lowercase letters, numbers, and spaces!"; fi

* UPDATE *

./anm.sh: line 265: syntax error in conditional expression
./anm.sh: line 265: syntax error near `&*#]'
./anm.sh: line 265: `  if [[ ! "$TITLE" =~ [a-zA-Z0-9 $%^\&*#] ]]; then RETURN="FAIL" && ERROR="ERROR: Title can only contain upper and lowercase letters, numbers, and spaces!"; return; fi'

This question is related to regex bash if-statement

The answer is


There are a couple of important things to know about bash's [[ ]] construction. The first:

Word splitting and pathname expansion are not performed on the words between the [[ and ]]; tilde expansion, parameter and variable expansion, arithmetic expansion, command substitution, process substitution, and quote removal are performed.

The second thing:

An additional binary operator, ‘=~’, is available,... the string to the right of the operator is considered an extended regular expression and matched accordingly... Any part of the pattern may be quoted to force it to be matched as a string.

Consequently, $v on either side of the =~ will be expanded to the value of that variable, but the result will not be word-split or pathname-expanded. In other words, it's perfectly safe to leave variable expansions unquoted on the left-hand side, but you need to know that variable expansions will happen on the right-hand side.

So if you write: [[ $x =~ [$0-9a-zA-Z] ]], the $0 inside the regex on the right will be expanded before the regex is interpreted, which will probably cause the regex to fail to compile (unless the expansion of $0 ends with a digit or punctuation symbol whose ascii value is less than a digit). If you quote the right-hand side like-so [[ $x =~ "[$0-9a-zA-Z]" ]], then the right-hand side will be treated as an ordinary string, not a regex (and $0 will still be expanded). What you really want in this case is [[ $x =~ [\$0-9a-zA-Z] ]]

Similarly, the expression between the [[ and ]] is split into words before the regex is interpreted. So spaces in the regex need to be escaped or quoted. If you wanted to match letters, digits or spaces you could use: [[ $x =~ [0-9a-zA-Z\ ] ]]. Other characters similarly need to be escaped, like #, which would start a comment if not quoted. Of course, you can put the pattern into a variable:

pat="[0-9a-zA-Z ]"
if [[ $x =~ $pat ]]; then ...

For regexes which contain lots of characters which would need to be escaped or quoted to pass through bash's lexer, many people prefer this style. But beware: In this case, you cannot quote the variable expansion:

# This doesn't work:
if [[ $x =~ "$pat" ]]; then ...

Finally, I think what you are trying to do is to verify that the variable only contains valid characters. The easiest way to do this check is to make sure that it does not contain an invalid character. In other words, an expression like this:

valid='0-9a-zA-Z $%&#' # add almost whatever else you want to allow to the list
if [[ ! $x =~ [^$valid] ]]; then ...

! negates the test, turning it into a "does not match" operator, and a [^...] regex character class means "any character other than ...".

The combination of parameter expansion and regex operators can make bash regular expression syntax "almost readable", but there are still some gotchas. (Aren't there always?) One is that you could not put ] into $valid, even if $valid were quoted, except at the very beginning. (That's a Posix regex rule: if you want to include ] in a character class, it needs to go at the beginning. - can go at the beginning or the end, so if you need both ] and -, you need to start with ] and end with -, leading to the regex "I know what I'm doing" emoticon: [][-])


In case someone wanted an example using variables...

#!/bin/bash

# Only continue for 'develop' or 'release/*' branches
BRANCH_REGEX="^(develop$|release//*)"

if [[ $BRANCH =~ $BRANCH_REGEX ]];
then
    echo "BRANCH '$BRANCH' matches BRANCH_REGEX '$BRANCH_REGEX'"
else
    echo "BRANCH '$BRANCH' DOES NOT MATCH BRANCH_REGEX '$BRANCH_REGEX'"
fi

I'd prefer to use [:punct:] for that. Also, a-zA-Z09-9 could be just [:alnum:]:

[[ $TEST =~ ^[[:alnum:][:blank:][:punct:]]+$ ]]

Or you might be looking at this question because you happened to make a silly typo like I did and have the =~ reversed to ~=


Examples related to regex

Why my regexp for hyphenated words doesn't work? grep's at sign caught as whitespace Preg_match backtrack error regex match any single character (one character only) re.sub erroring with "Expected string or bytes-like object" Only numbers. Input number in React Visual Studio Code Search and Replace with Regular Expressions Strip / trim all strings of a dataframe return string with first match Regex How to capture multiple repeated groups?

Examples related to bash

Comparing a variable with a string python not working when redirecting from bash script Zipping a file in bash fails How do I prevent Conda from activating the base environment by default? Get first line of a shell command's output Fixing a systemd service 203/EXEC failure (no such file or directory) /bin/sh: apt-get: not found VSCode Change Default Terminal Run bash command on jenkins pipeline How to check if the docker engine and a docker container are running? How to switch Python versions in Terminal?

Examples related to if-statement

How to use *ngIf else? SQL Server IF EXISTS THEN 1 ELSE 2 What is a good practice to check if an environmental variable exists or not? Using OR operator in a jquery if statement R multiple conditions in if statement Syntax for an If statement using a boolean How to have multiple conditions for one if statement in python Ifelse statement in R with multiple conditions If strings starts with in PowerShell Multiple conditions in an IF statement in Excel VBA