[regex] Regular expression - starting and ending with a letter, accepting only letters, numbers and _

I'm trying to write a regular expression which specifies that text should start with a letter, every character should be a letter, number or underscore, there should not be 2 underscores in a row and it should end with a letter or number. At the moment, the only thing I have is ^[a-zA-Z]\w[a-zA-Z1-9_] but this doesn't seem to work properly since it only ever matches 3 characters, and allows repeated underscores. I also don't know how to specify requirements for the last character.


The answer is


I'll take a stab at it:

/^[a-z](?:_?[a-z0-9]+)*$/i

Explained:

/
 ^           # match beginning of string
 [a-z]       # match a letter for the first char
 (?:         # start non-capture group
   _?          # match 0 or 1 '_'
   [a-z0-9]+   # match a letter or number, 1 or more times
 )*          # end non-capture group, match whole group 0 or more times
 $           # match end of string
/i           # case insensitive flag

The non-capture group takes care of a) not allowing two _'s (it forces at least one letter or number per group) and b) only allowing the last char to be a letter or number.

Some test strings:

"a": match
"_": fail
"zz": match
"a0": match
"A_": fail
"a0_b": match
"a__b": fail
"a_1_c": match

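To run these checks rather than read them, here is a minimal Python sketch. `ORIGINAL` is a port of the pattern above, with `re.IGNORECASE | re.ASCII` standing in for the `/i` flag and `fullmatch` standing in for `^` and `$`. `VARIANT` is my own addition, not part of the answer. Because `_?` is optional, the original group can split a run of letters in many different ways, so a long input that ultimately fails (many letters followed by a trailing underscore, say) may backtrack heavily in a backtracking engine. The variant makes the underscore mandatory inside the repeated group to avoid that ambiguity. All names here are illustrative.

```python
import re

# Port of /^[a-z](?:_?[a-z0-9]+)*$/i. re.ASCII keeps the case-insensitive
# matching restricted to plain ASCII letters.
ORIGINAL = re.compile(r'[a-z](?:_?[a-z0-9]+)*', re.IGNORECASE | re.ASCII)

# Hypothetical variant (an assumption, not from the answer): the underscore is
# mandatory inside the repeated group, so a run of letters and digits can only
# be consumed in one way.
VARIANT = re.compile(r'[a-z][a-z0-9]*(?:_[a-z0-9]+)*', re.IGNORECASE | re.ASCII)

# The test strings listed above, with the expected outcome.
CASES = {
    'a': True,
    '_': False,
    'zz': True,
    'a0': True,
    'A_': False,
    'a0_b': True,
    'a__b': False,
    'a_1_c': True,
}

def check(pattern):
    # Return the test strings whose result differs from the expected one.
    # fullmatch requires the whole string to match, so no ^ or $ is needed.
    return [text for text, expected in CASES.items()
            if (pattern.fullmatch(text) is not None) != expected]

print(check(ORIGINAL))
print(check(VARIANT))
```

An empty list from `check` means the pattern gave the expected verdict on every test string.
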
Here's a solution using a negative lookahead (not supported in all regex engines):

^[a-zA-Z](((?!__)[a-zA-Z0-9_])*[a-zA-Z0-9])?$

Test that it works as expected:

import re

tests = [
    ('a', True),
    ('_', False),
    ('zz', True),
    ('a0', True),
    ('A_', False),
    ('a0_b', True),
    ('a__b', False),
    ('a_1_c', True),
]

regex = r'^[a-zA-Z](((?!__)[a-zA-Z0-9_])*[a-zA-Z0-9])?$'
for text, expected in tests:
    is_match = re.match(regex, text) is not None
    # Nothing is printed when every test behaves as expected.
    if is_match != expected:
        print("fail: " + text)
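
As a further sanity check, the sketch below compares this lookahead pattern with the first answer's pattern on every string of up to five characters drawn from a small alphabet. `ALPHABET`, `GROUPED`, `LOOKAHEAD` and `disagreements` are names I made up for illustration, and the flags `re.IGNORECASE | re.ASCII` are my stand-in for the `/i` flag.

```python
import re
from itertools import product

# Pattern from the first answer, ported to Python (flags stand in for /i).
GROUPED = re.compile(r'^[a-z](?:_?[a-z0-9]+)*$', re.IGNORECASE | re.ASCII)
# Lookahead pattern from this answer, unchanged.
LOOKAHEAD = re.compile(r'^[a-zA-Z](((?!__)[a-zA-Z0-9_])*[a-zA-Z0-9])?$')

# Small illustrative alphabet: a letter in each case, a digit, '_' and one
# character ('-') that neither pattern should accept.
ALPHABET = 'aZ0_-'

def disagreements(max_length=5):
    # Collect every string on which the two patterns give different verdicts.
    found = []
    for length in range(max_length + 1):
        for chars in product(ALPHABET, repeat=length):
            text = ''.join(chars)
            if bool(GROUPED.match(text)) != bool(LOOKAHEAD.match(text)):
                found.append(text)
    return found

print(disagreements())
```

An empty list means the two patterns agreed on every generated string. It is only evidence for the strings tried, not a proof of equivalence.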

Seeing how the rules are fairly complicated, I'd suggest the following:

/^[a-z](?:(\w*)[a-z0-9])?$/i

This matches the whole string and captures the intermediate characters. The outer group is optional so that a single letter such as "a" is still accepted. Then, either with string functions or with the following regex:

/__/

check whether the captured part has two underscores in a row. For example, in Python it would look like this:

import re

def valid(s):
    # re.A keeps \w and the case-insensitive classes to ASCII in Python 3.
    match = re.match(r'^[a-z](?:(\w*)[a-z0-9])?$', s, re.I | re.A)
    if match is None:
        return False
    # group(1) is None when s is a single letter.
    return '__' not in (match.group(1) or '')
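
The "string functions" route mentioned above can also replace the regex entirely. Here is a minimal sketch, assuming ASCII-only input is what is wanted; `valid_without_regex` and `ALLOWED` are hypothetical names chosen for illustration.

```python
import string

# Characters permitted anywhere in the string (ASCII only, by assumption).
ALLOWED = set(string.ascii_letters + string.digits + '_')

def valid_without_regex(s):
    # Each condition mirrors one rule from the question.
    return (
        len(s) > 0
        and s[0] in string.ascii_letters                    # starts with a letter
        and s[-1] in string.ascii_letters + string.digits   # ends with a letter or digit
        and set(s) <= ALLOWED                               # only letters, digits, '_'
        and '__' not in s                                   # no two underscores in a row
    )

for text in ['a', '_', 'zz', 'a0', 'A_', 'a0_b', 'a__b', 'a_1_c']:
    print(repr(text), valid_without_regex(text))
```

This trades the compactness of a single pattern for checks that can be read and tested one rule at a time.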