Why doesn't [01-12] range work as expected?

Question

I'm trying to use the range pattern [01-12] in regex to match two-digit mm, but this doesn't work as expected.

User · Accepted Answer

You seem to have misunderstood how character class definitions work in regex.

To match any of the strings 01, 02, 03, 04, 05, 06, 07, 08, 09, 10, 11, or 12, something like this works:

0[1-9]|1[0-2]
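A minimal sketch of how this alternation behaves, assuming Python's re module (the question does not name a language, and the helper name is_month is purely illustrative):

```python
import re

# The alternation from the answer: 0 followed by 1-9, or 1 followed by 0-2.
MONTH = re.compile(r"0[1-9]|1[0-2]")

def is_month(text: str) -> bool:
    # Hypothetical helper. fullmatch requires the whole string to match,
    # so inputs such as "123" or "1" are rejected.
    return MONTH.fullmatch(text) is not None

# Try every two-digit string from 00 to 99 and keep the accepted ones.
accepted = [f"{n:02d}" for n in range(100) if is_month(f"{n:02d}")]
print(accepted)
```

Only the twelve strings 01 through 12 should survive the filter.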

References

regular-expressions.info/Character Classes
- Numeric Ranges (have many examples on matching strings interpreted as numeric ranges)

Explanation

A character class, by itself, attempts to match one and exactly one character from the input string. [01-12] actually defines [012], a character class that matches one character from the input against any of the 3 characters 0, 1, or 2.

The - range definition goes from 1 to 1, which includes just 1. On the other hand, something like [1-9] includes 1, 2, 3, 4, 5, 6, 7, 8, 9.
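The equivalence of [01-12] and [012] can be checked directly. This is a small sketch assuming Python's re module; other flavors should behave the same way for this class, but that is not verified here.

```python
import re

odd_class = re.compile(r"[01-12]")    # 0, the range 1-1, and 2
plain_class = re.compile(r"[012]")

sample = "0123456789"

# Both classes pick out the same single characters from the sample.
odd_hits = odd_class.findall(sample)
plain_hits = plain_class.findall(sample)
print(odd_hits, plain_hits)

# A character class consumes exactly one character, so a two-character
# string can never be matched in full by the class alone.
whole_12 = odd_class.fullmatch("12")
print(whole_12)
```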

Beginners often make the mistake of defining things like [this|that]. This doesn't "work". This character class is the same as [this|a], i.e. it matches one character from the input against any of the 6 characters t, h, i, s, | or a. More than likely (this|that) is what is intended.
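To make the difference concrete, here is a short sketch (Python's re module is an assumption; the variable names are illustrative) that lists what the bracketed version really accepts and compares it with the grouped alternation:

```python
import re
import string

char_class = re.compile(r"[this|that]")   # a set of single characters
group = re.compile(r"(this|that)")        # alternation between two words

# Collect every printable ASCII character the class accepts.
class_members = sorted(
    c for c in string.printable if char_class.fullmatch(c)
)
print(class_members)

# The class cannot match a whole word, the group can.
class_matches_word = char_class.fullmatch("that") is not None
group_matches_word = group.fullmatch("that") is not None
print(class_matches_word, group_matches_word)
```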

References

regular-expressions.info/Brackets for Grouping and Alternation with the vertical bar

How ranges are defined

So it's obvious now that a pattern like between [24-48] hours doesn't "work". The character class in this case is equivalent to [248].

That is, - in a character class definition doesn't define a numeric range in the pattern. Regex engines don't really "understand" numbers in the pattern, with the exception of finite repetition syntax (e.g. a{3,5} matches between 3 and 5 a's).
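The repetition syntax is the one place where the numbers are read as numbers. A quick sketch, assuming Python's re module:

```python
import re

pattern = re.compile(r"a{3,5}")

# For each length n from 1 to 7, check whether a string of n a's
# matches in full. Only lengths 3, 4 and 5 should be accepted.
results = {n: pattern.fullmatch("a" * n) is not None for n in range(1, 8)}
print(results)
```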

Range definition instead uses the ASCII/Unicode encoding of the characters to define ranges. The character 0 is encoded in ASCII as decimal 48; 9 is 57. Thus, the character class [0-9] includes all characters whose values are between decimal 48 and 57 in the encoding. Rather sensibly, by design these are the characters 0, 1, ..., 9.
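The code-point view can be verified with a few lines. This sketch assumes Python, where ord and chr convert between characters and their code points:

```python
import re

# Code points of the range endpoints.
low, high = ord("0"), ord("9")
print(low, high)  # 48 57

# Characters whose code points lie between the endpoints, inclusive.
by_code_point = [chr(c) for c in range(low, high + 1)]

# Characters in the ASCII range that the class [0-9] accepts.
by_class = [chr(c) for c in range(128) if re.fullmatch(r"[0-9]", chr(c))]

print(by_code_point == by_class)
```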

Another example: A to Z

Let's take a look at another common character class definition [a-zA-Z]

In ASCII:

A = 65, Z = 90
a = 97, z = 122

This means that:

[a-zA-Z] and [A-Za-z] are equivalent
In most flavors, [a-Z] is likely to be an illegal character range
- because a (97) is "greater than" Z (90)
[A-z] is legal, but also includes these six characters:
- [ (91), \ (92), ] (93), ^ (94), _ (95), ` (96)
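These three points can be checked in one flavor. The sketch below assumes Python's re module, which does reject [a-Z]; other flavors may report the error differently or not at all.

```python
import re

ascii_chars = [chr(c) for c in range(128)]

# 1. [a-zA-Z] and [A-Za-z] accept the same ASCII characters.
lower_first = [c for c in ascii_chars if re.fullmatch(r"[a-zA-Z]", c)]
upper_first = [c for c in ascii_chars if re.fullmatch(r"[A-Za-z]", c)]

# 2. [a-Z] is rejected at compile time because 97 > 90.
try:
    re.compile(r"[a-Z]")
    a_to_Z_compiles = True
except re.error as exc:
    a_to_Z_compiles = False
    print("rejected:", exc)

# 3. [A-z] compiles, but also lets through the characters between Z and a.
extras = [
    c for c in ascii_chars
    if re.fullmatch(r"[A-z]", c) and not c.isalpha()
]
print(extras)
```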

Related questions

Is the regex [a-Z] valid, and if yes, then is it the same as [a-zA-Z]?

User · Answer

A character class in regular expressions, denoted by the [...] syntax, specifies the rules to match a single character in the input. As such, everything you write between the brackets specifies how to match a single character.

Your pattern, [01-12], is thus broken down as follows:

- 0: match the single digit 0
- or 1-1: match a single digit in the range of 1 through 1
- or 2: match a single digit 2

So basically all you're matching is 0, 1 or 2.

In order to do the matching you want, matching two digits ranging from 01-12 as numbers, you need to think about how they will look as text.

You have:

- 01-09 (i.e. first digit is 0, second digit is 1-9)
- 10-12 (i.e. first digit is 1, second digit is 0-2)

You will then have to write a regular expression for that, which can look like this:

0[1-9]|1[0-2]

- 0[1-9]: a 0 followed by 1-9
- 1[0-2]: a 1 followed by 0-2
- |: vertical bar, this roughly means "OR" in this context

Note that trying to combine them in order to get a shorter expression will fail, by giving false positive matches for invalid input.

For instance, the pattern [0-1][0-9] would basically match the numbers 00-19, which is a bit more than what you want.

I tried finding a definite source for more information about character classes, but for now all I can give you is this Google Query for Regex Character Classes. Hopefully you'll be able to find some more information there to help you.
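The false positives of the shortened pattern [0-1][0-9] can be listed by comparing it with 0[1-9]|1[0-2] over every two-digit string. This is a sketch assuming Python's re module; the names loose and strict are illustrative.

```python
import re

loose = re.compile(r"[0-1][0-9]")      # the "combined" shortcut
strict = re.compile(r"0[1-9]|1[0-2]")  # the two-branch alternation

candidates = [f"{n:02d}" for n in range(100)]

# Strings the shortcut accepts although they are not valid months.
false_positives = [
    s for s in candidates
    if loose.fullmatch(s) and not strict.fullmatch(s)
]
print(false_positives)
```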

User · Answer

As polygenelubricants says, yours would look for 0|1-1|2 rather than what you wish for, due to the fact that character classes (things in []) match characters rather than strings.

User · Answer

This also works:

^([1-9]|[0-1][0-2])$

[1-9] matches single digits between 1 and 9.
[0-1][0-2] matches double digits between 10 and 12, but note that it also accepts 00, 01, and 02.

There are some good examples here.
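Assuming the pattern in this answer is ^([1-9]|[0-1][0-2])$ (reconstructed from the description of its two branches), a quick enumeration in Python shows exactly which one- and two-digit strings it accepts:

```python
import re

# Assumed reconstruction of the answer's pattern.
pattern = re.compile(r"^([1-9]|[0-1][0-2])$")

# All single digits, then all two-digit strings from 00 to 99.
candidates = [str(n) for n in range(10)] + [f"{n:02d}" for n in range(100)]

accepted = [s for s in candidates if pattern.match(s)]
print(accepted)
```

The result includes 00 and leaves out the zero-padded values 03 through 09, so this form suits unpadded months better than strictly two-digit ones.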

User · Answer

Use this: ^(0?[1-9]|1[012])$

- 07: valid
- 7: valid
- 0: no match
- 00: no match
- 13: no match
- 21: no match

To test a pattern such as 07/2018, use this:

/^(0?[1-9]|1[012])\/([2-9][0-9]{3})$/

(Date range between 01/2000 and 12/9999.)
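The listed cases can be replayed with a short script. This is a sketch under assumptions: the patterns are written as 0?[1-9]|1[012] for the month and a / separator followed by [2-9][0-9]{3} for the year, both reconstructed from this answer, and Python's fullmatch stands in for the ^ and $ anchors.

```python
import re

# Assumed month pattern: optional leading zero, then 1-9, or 10-12.
month = re.compile(r"0?[1-9]|1[012]")

# Assumed month/year pattern with "/" as the separator.
month_year = re.compile(r"(0?[1-9]|1[012])/([2-9][0-9]{3})")

month_results = {
    s: month.fullmatch(s) is not None
    for s in ["07", "7", "0", "00", "13", "21"]
}
print(month_results)

# A valid month/year pair exposes both parts as groups.
parts = month_year.fullmatch("07/2018").groups()
print(parts)

# Years before 2000 and months above 12 are rejected.
too_early = month_year.fullmatch("12/1999")
bad_month = month_year.fullmatch("13/2018")
```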

User · Answer

The []s in a regex denote a character class. If no ranges are specified, it implicitly ors every character within it together. Thus, [abcde] is the same as (a|b|c|d|e), except that it doesn't capture anything; it will match any one of a, b, c, d, or e. All a range indicates is a set of characters; [ac-eg] says "match any one of: a; any character between c and e; or g". Thus, your match says "match any one of: 0; any character between 1 and 1 (i.e., just 1); or 2".

Your goal is evidently to specify a number range: any number between 01 and 12 written with two digits. In this specific case, you can match it with 0[1-9]|1[0-2]: either a 0 followed by any digit between 1 and 9, or a 1 followed by any digit between 0 and 2. In general, you can transform any number range into a valid regex in a similar manner. There may be a better option than regular expressions, however, or an existing function or module which can construct the regex for you. It depends on your language.
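Two possible alternatives, sketched in Python as an assumption about the language. Both helper names are hypothetical: range_to_regex builds a deliberately naive alternation that spells out every value, and is_two_digit_in_range skips regex for the numeric part and compares integers instead.

```python
import re

def range_to_regex(low: int, high: int, width: int = 2) -> str:
    # Naive construction: list every zero-padded value as its own branch.
    # Fine for small ranges, wasteful for large ones.
    return "|".join(f"{n:0{width}d}" for n in range(low, high + 1))

def is_two_digit_in_range(text: str, low: int, high: int) -> bool:
    # Require exactly two ASCII digits, then compare numerically.
    if len(text) != 2 or not (text.isascii() and text.isdigit()):
        return False
    return low <= int(text) <= high

generated = range_to_regex(1, 12)
print(generated)

candidates = [f"{n:02d}" for n in range(100)]
via_regex = [s for s in candidates if re.fullmatch(generated, s)]
via_int = [s for s in candidates if is_two_digit_in_range(s, 1, 12)]
print(via_regex == via_int)
```

The integer comparison keeps the range bounds readable as numbers, which is usually easier to maintain than a hand-built pattern.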
