[regex] Regular Expression for alphanumeric and underscores

I would like to have a regular expression that checks if a string contains only upper and lowercase letters, numbers, and underscores.

This question is related to regex

The answer is


use lookaheads to do the "at least one" stuff. Trust me it's much easier.

Here's an example that would require 1-10 characters, containing at least one digit and one letter:

^(?=.*\d)(?=.*[A-Za-z])[A-Za-z0-9]{1,10}$

NOTE: could have used \w but then ECMA/Unicode considerations come into play increasing the character coverage of the \w "word character".


Here is the regex for what you want with a quantifier to specify at least 1 character and no more than 255 characters

[^a-zA-Z0-9 _]{1,255}

I believe you are not taking Latin and Unicode characters in your matches. For example, if you need to take "ã" or "ü" chars, the use of "\w" won't work.

You can, alternatively, use this approach:

^[A-ZÀ-Ýa-zà-ý0-9_]+$

Hope it helps!


There's a lot of verbosity in here, and I'm deeply against it, so, my conclusive answer would be:

/^\w+$/

\w is equivalent to [A-Za-z0-9_], which is pretty much what you want. (unless we introduce unicode to the mix)

Using the + quantifier you'll match one or more characters. If you want to accept an empty string too, use * instead.


You want to check that each character matches your requirements, which is why we use:

[A-Za-z0-9_]

And you can even use the shorthand version:

\w

Which is equivalent (in some regex flavors, so make sure you check before you use it). Then to indicate that the entire string must match, you use:

^

To indicate the string must start with that character, then use

$

To indicate the string must end with that character. Then use

\w+ or \w*

To indicate "1 or more", or "0 or more". Putting it all together, we have:

^\w*$

To match a string that contains only those characters (or an empty string), try

"^[a-zA-Z0-9_]*$"

This works for .NET regular expressions, and probably a lot of other languages as well.

Breaking it down:

^ : start of string
[ : beginning of character group
a-z : any lowercase letter
A-Z : any uppercase letter
0-9 : any digit
_ : underscore
] : end of character group
* : zero or more of the given characters
$ : end of string

If you don't want to allow empty strings, use + instead of *.


As others have pointed out, some regex languages have a shorthand form for [a-zA-Z0-9_]. In the .NET regex language, you can turn on ECMAScript behavior and use \w as a shorthand (yielding ^\w*$ or ^\w+$). Note that in other languages, and by default in .NET, \w is somewhat broader, and will match other sorts of Unicode characters as well (thanks to Jan for pointing this out). So if you're really intending to match only those characters, using the explicit (longer) form is probably best.


Try these multi-lingual extensions I have made for string.

IsAlphaNumeric - String must contain atleast 1 alpha (letter in Unicode range, specified in charSet) and atleast 1 number (specified in numSet). Also, the string should comprise only of alpha and numbers.

IsAlpha - String should contain atleast 1 alpha (in the language charSet specified) and comprise only of alpha.

IsNumeric - String should contain atleast 1 number (in the language numSet specified) and comprise only of numbers.

The charSet/numSet range for the desired language can be specified. The Unicode ranges are available on below link:

http://www.ssec.wisc.edu/~tomw/java/unicode.html

API :

    public static bool IsAlphaNumeric(this string stringToTest)
    {
        //English
        const string charSet = "a-zA-Z";
        const string numSet = @"0-9";

        //Greek
        //const string charSet = @"\u0388-\u03EF";            
        //const string numSet = @"0-9";

        //Bengali
        //const string charSet = @"\u0985-\u09E3";
        //const string numSet = @"\u09E6-\u09EF";

        //Hindi
        //const string charSet = @"\u0905-\u0963";
        //const string numSet = @"\u0966-\u096F";

        return Regex.Match(stringToTest, @"^(?=[" + numSet + @"]*?[" + charSet + @"]+)(?=[" + charSet + @"]*?[" + numSet + @"]+)[" + charSet + numSet +@"]+$").Success;
    }

    public static bool IsNumeric(this string stringToTest)
    {
        //English
        const string numSet = @"0-9";

        //Hindi
        //const string numSet = @"\u0966-\u096F";

        return Regex.Match(stringToTest, @"^[" + numSet + @"]+$").Success;
    }

    public static bool IsAlpha(this string stringToTest)
    {
        //English
        const string charSet = "a-zA-Z";

        return Regex.Match(stringToTest, @"^[" + charSet + @"]+$").Success;
    }

Usage :

        //English
        string test = "AASD121asf";

        //Greek
        //string test = "??ß123";

        //Bengali
        //string test = "????";

        //Hindi
        //string test = @"??????";

        bool isAlphaNum = test.IsAlphaNumeric();

For me there was an issue in that I want to distinguish between alpha, numeric and alpha numeric, so to ensure an alphanumeric string contains at least one alpha and at least one numeric, I used :

^([a-zA-Z_]{1,}\d{1,})+|(\d{1,}[a-zA-Z_]{1,})+$

How about:

^([A-Za-z]|[0-9]|_)+$

...if you want to be explicit, or:

^\w+$

...if you prefer concise (Perl syntax).


use lookaheads to do the "at least one" stuff. Trust me it's much easier.

Here's an example that would require 1-10 characters, containing at least one digit and one letter:

^(?=.*\d)(?=.*[A-Za-z])[A-Za-z0-9]{1,10}$

NOTE: could have used \w but then ECMA/Unicode considerations come into play increasing the character coverage of the \w "word character".


How about:

^([A-Za-z]|[0-9]|_)+$

...if you want to be explicit, or:

^\w+$

...if you prefer concise (Perl syntax).


There's a lot of verbosity in here, and I'm deeply against it, so, my conclusive answer would be:

/^\w+$/

\w is equivalent to [A-Za-z0-9_], which is pretty much what you want. (unless we introduce unicode to the mix)

Using the + quantifier you'll match one or more characters. If you want to accept an empty string too, use * instead.


How about:

^([A-Za-z]|[0-9]|_)+$

...if you want to be explicit, or:

^\w+$

...if you prefer concise (Perl syntax).


Here is the regex for what you want with a quantifier to specify at least 1 character and no more than 255 characters

[^a-zA-Z0-9 _]{1,255}

For those of you looking for unicode alphanumeric matching, you might want to do something like:

^[\p{L} \p{Nd}_]+$

Further reading at http://unicode.org/reports/tr18/ and at http://www.regular-expressions.info/unicode.html


This works for me, found this in the O'Reilly's "Mastering Regular Expressions":

/^\w+$/

Explanation:

  • ^ asserts position at start of the string
    • \w+ matches any word character (equal to [a-zA-Z0-9_])
    • "+" Quantifier — Matches between one and unlimited times, as many times as possible, giving back as needed (greedy)
  • $ asserts position at the end of the string

Verify yourself:

_x000D_
_x000D_
const regex = /^\w+$/;_x000D_
const str = `nut_cracker_12`;_x000D_
let m;_x000D_
_x000D_
if ((m = regex.exec(str)) !== null) {_x000D_
    // The result can be accessed through the `m`-variable._x000D_
    m.forEach((match, groupIndex) => {_x000D_
        console.log(`Found match, group ${groupIndex}: ${match}`);_x000D_
    });_x000D_
}
_x000D_
_x000D_
_x000D_


Although it's more verbose than \w, I personally appreciate the readability of the full POSIX character class names ( http://www.zytrax.com/tech/web/regex.htm#special ), so I'd say:

^[[:alnum:]_]+$

However, while the documentation at the above links states that \w will "Match any character in the range 0 - 9, A - Z and a - z (equivalent of POSIX [:alnum:])", I have not found this to be true. Not with grep -P anyway. You need to explicitly include the underscore if you use [:alnum:] but not if you use \w. You can't beat the following for short and sweet:

^\w+$

Along with readability, using the POSIX character classes (http://www.regular-expressions.info/posixbrackets.html) means that your regex can work on non ASCII strings, which the range based regexes won't do since they rely on the underlying ordering of the ASCII characters which may be different from other character sets and will therefore exclude some non-ASCII characters (letters such as œ) which you might want to capture.


Um...question: Does it need to have at least one character or no? Can it be an empty string?

^[A-Za-z0-9_]+$

Will do at least one upper or lower case alphanumeric or underscore. If it can be zero length, then just substitute the + for *

^[A-Za-z0-9_]*$

Edit:

If diacritics need to be included (such as cedilla - ç) then you would need to use the word character which does the same as the above, but includes the diacritic characters:

^\w+$

Or

^\w*$

To match a string that contains only those characters (or an empty string), try

"^[a-zA-Z0-9_]*$"

This works for .NET regular expressions, and probably a lot of other languages as well.

Breaking it down:

^ : start of string
[ : beginning of character group
a-z : any lowercase letter
A-Z : any uppercase letter
0-9 : any digit
_ : underscore
] : end of character group
* : zero or more of the given characters
$ : end of string

If you don't want to allow empty strings, use + instead of *.


As others have pointed out, some regex languages have a shorthand form for [a-zA-Z0-9_]. In the .NET regex language, you can turn on ECMAScript behavior and use \w as a shorthand (yielding ^\w*$ or ^\w+$). Note that in other languages, and by default in .NET, \w is somewhat broader, and will match other sorts of Unicode characters as well (thanks to Jan for pointing this out). So if you're really intending to match only those characters, using the explicit (longer) form is probably best.


Um...question: Does it need to have at least one character or no? Can it be an empty string?

^[A-Za-z0-9_]+$

Will do at least one upper or lower case alphanumeric or underscore. If it can be zero length, then just substitute the + for *

^[A-Za-z0-9_]*$

Edit:

If diacritics need to be included (such as cedilla - ç) then you would need to use the word character which does the same as the above, but includes the diacritic characters:

^\w+$

Or

^\w*$

This should work in most of the cases.

/^[\d]*[a-z_][a-z\d_]*$/gi

And by most I mean,

abcd       True
abcd12     True
ab12cd     True
12abcd     True

1234       False


Explanation

  1. ^ ... $ - match the pattern starting and ending with
  2. [\d]* - match zero or more digits
  3. [a-z_] - match an alphabet or underscore
  4. [a-z\d_]* - match an alphabet or digit or underscore
  5. /gi - match globally across the string and case-insensitive

How about:

^([A-Za-z]|[0-9]|_)+$

...if you want to be explicit, or:

^\w+$

...if you prefer concise (Perl syntax).


To check the entire string and not allow empty strings, try

^[A-Za-z0-9_]+$

This works for me, found this in the O'Reilly's "Mastering Regular Expressions":

/^\w+$/

Explanation:

  • ^ asserts position at start of the string
    • \w+ matches any word character (equal to [a-zA-Z0-9_])
    • "+" Quantifier — Matches between one and unlimited times, as many times as possible, giving back as needed (greedy)
  • $ asserts position at the end of the string

Verify yourself:

_x000D_
_x000D_
const regex = /^\w+$/;_x000D_
const str = `nut_cracker_12`;_x000D_
let m;_x000D_
_x000D_
if ((m = regex.exec(str)) !== null) {_x000D_
    // The result can be accessed through the `m`-variable._x000D_
    m.forEach((match, groupIndex) => {_x000D_
        console.log(`Found match, group ${groupIndex}: ${match}`);_x000D_
    });_x000D_
}
_x000D_
_x000D_
_x000D_


I believe you are not taking Latin and Unicode characters in your matches. For example, if you need to take "ã" or "ü" chars, the use of "\w" won't work.

You can, alternatively, use this approach:

^[A-ZÀ-Ýa-zà-ý0-9_]+$

Hope it helps!


The following regex matches alphanumeric characters and underscore:

^[a-zA-Z0-9_]+$

For example, in Perl:

#!/usr/bin/perl -w

my $arg1 = $ARGV[0];

# check that the string contains *only* one or more alphanumeric chars or underscores
if ($arg1 !~ /^[a-zA-Z0-9_]+$/) {
  print "Failed.\n";
} else {
    print "Success.\n";
}

Here is the regex for what you want with a quantifier to specify at least 1 character and no more than 255 characters

[^a-zA-Z0-9 _]{1,255}

There's a lot of verbosity in here, and I'm deeply against it, so, my conclusive answer would be:

/^\w+$/

\w is equivalent to [A-Za-z0-9_], which is pretty much what you want. (unless we introduce unicode to the mix)

Using the + quantifier you'll match one or more characters. If you want to accept an empty string too, use * instead.


The following regex matches alphanumeric characters and underscore:

^[a-zA-Z0-9_]+$

For example, in Perl:

#!/usr/bin/perl -w

my $arg1 = $ARGV[0];

# check that the string contains *only* one or more alphanumeric chars or underscores
if ($arg1 !~ /^[a-zA-Z0-9_]+$/) {
  print "Failed.\n";
} else {
    print "Success.\n";
}

To check the entire string and not allow empty strings, try

^[A-Za-z0-9_]+$

Required Format Allow these 3:

  1. 0142171547295
  2. 014-2171547295
  3. 123abc

Don't Allow other formats:

validatePnrAndTicketNumber(){
    let alphaNumericRegex=/^[a-zA-Z0-9]*$/;
    let numericRegex=/^[0-9]*$/;
    let numericdashRegex=/^(([1-9]{3})\-?([0-9]{10}))$/;
   this.currBookingRefValue = this.requestForm.controls["bookingReference"].value;
   if(this.currBookingRefValue.length == 14 && this.currBookingRefValue.match(numericdashRegex)){
     this.requestForm.controls["bookingReference"].setErrors({'pattern': false});
   }else if(this.currBookingRefValue.length ==6 && this.currBookingRefValue.match(alphaNumericRegex)){
    this.requestForm.controls["bookingReference"].setErrors({'pattern': false});
   }else if(this.currBookingRefValue.length ==13 && this.currBookingRefValue.match(numericRegex) ){
    this.requestForm.controls["bookingReference"].setErrors({'pattern': false});
   }else{
    this.requestForm.controls["bookingReference"].setErrors({'pattern': true});
   }
}
<input name="booking_reference" type="text" [class.input-not-empty]="bookingRef.value"
    class="glyph-input form-control floating-label-input" id="bookings_bookingReference"
    value="" maxlength="14" aria-required="true" role="textbox" #bookingRef
    formControlName="bookingReference" (focus)="resetMessageField()" (blur)="validatePnrAndTicketNumber()"/>

Um...question: Does it need to have at least one character or no? Can it be an empty string?

^[A-Za-z0-9_]+$

Will do at least one upper or lower case alphanumeric or underscore. If it can be zero length, then just substitute the + for *

^[A-Za-z0-9_]*$

Edit:

If diacritics need to be included (such as cedilla - ç) then you would need to use the word character which does the same as the above, but includes the diacritic characters:

^\w+$

Or

^\w*$

this works for me you can try

[\\p{Alnum}_]

To check the entire string and not allow empty strings, try

^[A-Za-z0-9_]+$

There's a lot of verbosity in here, and I'm deeply against it, so, my conclusive answer would be:

/^\w+$/

\w is equivalent to [A-Za-z0-9_], which is pretty much what you want. (unless we introduce unicode to the mix)

Using the + quantifier you'll match one or more characters. If you want to accept an empty string too, use * instead.


For me there was an issue in that I want to distinguish between alpha, numeric and alpha numeric, so to ensure an alphanumeric string contains at least one alpha and at least one numeric, I used :

^([a-zA-Z_]{1,}\d{1,})+|(\d{1,}[a-zA-Z_]{1,})+$

For those of you looking for unicode alphanumeric matching, you might want to do something like:

^[\p{L} \p{Nd}_]+$

Further reading at http://unicode.org/reports/tr18/ and at http://www.regular-expressions.info/unicode.html


this works for me you can try

[\\p{Alnum}_]

Although it's more verbose than \w, I personally appreciate the readability of the full POSIX character class names ( http://www.zytrax.com/tech/web/regex.htm#special ), so I'd say:

^[[:alnum:]_]+$

However, while the documentation at the above links states that \w will "Match any character in the range 0 - 9, A - Z and a - z (equivalent of POSIX [:alnum:])", I have not found this to be true. Not with grep -P anyway. You need to explicitly include the underscore if you use [:alnum:] but not if you use \w. You can't beat the following for short and sweet:

^\w+$

Along with readability, using the POSIX character classes (http://www.regular-expressions.info/posixbrackets.html) means that your regex can work on non ASCII strings, which the range based regexes won't do since they rely on the underlying ordering of the ASCII characters which may be different from other character sets and will therefore exclude some non-ASCII characters (letters such as œ) which you might want to capture.


Here is the regex for what you want with a quantifier to specify at least 1 character and no more than 255 characters

[^a-zA-Z0-9 _]{1,255}

This should work in most of the cases.

/^[\d]*[a-z_][a-z\d_]*$/gi

And by most I mean,

abcd       True
abcd12     True
ab12cd     True
12abcd     True

1234       False


Explanation

  1. ^ ... $ - match the pattern starting and ending with
  2. [\d]* - match zero or more digits
  3. [a-z_] - match an alphabet or underscore
  4. [a-z\d_]* - match an alphabet or digit or underscore
  5. /gi - match globally across the string and case-insensitive

You want to check that each character matches your requirements, which is why we use:

[A-Za-z0-9_]

And you can even use the shorthand version:

\w

Which is equivalent (in some regex flavors, so make sure you check before you use it). Then to indicate that the entire string must match, you use:

^

To indicate the string must start with that character, then use

$

To indicate the string must end with that character. Then use

\w+ or \w*

To indicate "1 or more", or "0 or more". Putting it all together, we have:

^\w*$

^\w*$ will work for below combinations

1
123
1av
pRo
av1

Try these multi-lingual extensions I have made for string.

IsAlphaNumeric - String must contain atleast 1 alpha (letter in Unicode range, specified in charSet) and atleast 1 number (specified in numSet). Also, the string should comprise only of alpha and numbers.

IsAlpha - String should contain atleast 1 alpha (in the language charSet specified) and comprise only of alpha.

IsNumeric - String should contain atleast 1 number (in the language numSet specified) and comprise only of numbers.

The charSet/numSet range for the desired language can be specified. The Unicode ranges are available on below link:

http://www.ssec.wisc.edu/~tomw/java/unicode.html

API :

    public static bool IsAlphaNumeric(this string stringToTest)
    {
        //English
        const string charSet = "a-zA-Z";
        const string numSet = @"0-9";

        //Greek
        //const string charSet = @"\u0388-\u03EF";            
        //const string numSet = @"0-9";

        //Bengali
        //const string charSet = @"\u0985-\u09E3";
        //const string numSet = @"\u09E6-\u09EF";

        //Hindi
        //const string charSet = @"\u0905-\u0963";
        //const string numSet = @"\u0966-\u096F";

        return Regex.Match(stringToTest, @"^(?=[" + numSet + @"]*?[" + charSet + @"]+)(?=[" + charSet + @"]*?[" + numSet + @"]+)[" + charSet + numSet +@"]+$").Success;
    }

    public static bool IsNumeric(this string stringToTest)
    {
        //English
        const string numSet = @"0-9";

        //Hindi
        //const string numSet = @"\u0966-\u096F";

        return Regex.Match(stringToTest, @"^[" + numSet + @"]+$").Success;
    }

    public static bool IsAlpha(this string stringToTest)
    {
        //English
        const string charSet = "a-zA-Z";

        return Regex.Match(stringToTest, @"^[" + charSet + @"]+$").Success;
    }

Usage :

        //English
        string test = "AASD121asf";

        //Greek
        //string test = "??ß123";

        //Bengali
        //string test = "????";

        //Hindi
        //string test = @"??????";

        bool isAlphaNum = test.IsAlphaNumeric();

To check the entire string and not allow empty strings, try

^[A-Za-z0-9_]+$

You want to check that each character matches your requirements, which is why we use:

[A-Za-z0-9_]

And you can even use the shorthand version:

\w

Which is equivalent (in some regex flavors, so make sure you check before you use it). Then to indicate that the entire string must match, you use:

^

To indicate the string must start with that character, then use

$

To indicate the string must end with that character. Then use

\w+ or \w*

To indicate "1 or more", or "0 or more". Putting it all together, we have:

^\w*$

Um...question: Does it need to have at least one character or no? Can it be an empty string?

^[A-Za-z0-9_]+$

Will do at least one upper or lower case alphanumeric or underscore. If it can be zero length, then just substitute the + for *

^[A-Za-z0-9_]*$

Edit:

If diacritics need to be included (such as cedilla - ç) then you would need to use the word character which does the same as the above, but includes the diacritic characters:

^\w+$

Or

^\w*$

^\w*$ will work for below combinations

1
123
1av
pRo
av1

To match a string that contains only those characters (or an empty string), try

"^[a-zA-Z0-9_]*$"

This works for .NET regular expressions, and probably a lot of other languages as well.

Breaking it down:

^ : start of string
[ : beginning of character group
a-z : any lowercase letter
A-Z : any uppercase letter
0-9 : any digit
_ : underscore
] : end of character group
* : zero or more of the given characters
$ : end of string

If you don't want to allow empty strings, use + instead of *.


As others have pointed out, some regex languages have a shorthand form for [a-zA-Z0-9_]. In the .NET regex language, you can turn on ECMAScript behavior and use \w as a shorthand (yielding ^\w*$ or ^\w+$). Note that in other languages, and by default in .NET, \w is somewhat broader, and will match other sorts of Unicode characters as well (thanks to Jan for pointing this out). So if you're really intending to match only those characters, using the explicit (longer) form is probably best.


In Computer Science, an Alphanumeric value often means the first character is not a number but is an alphabet or underscore. Thereafter the character can be 0-9, A-Z, a-z, or underscore (_).

Here is how you would do that:

Tested under php:

$regex = '/^[A-Za-z_][A-Za-z\d_]*$/'

or take this

^[A-Za-z_][A-Za-z\d_]*$

and place it in your development language.


Required Format Allow these 3:

  1. 0142171547295
  2. 014-2171547295
  3. 123abc

Don't Allow other formats:

validatePnrAndTicketNumber(){
    let alphaNumericRegex=/^[a-zA-Z0-9]*$/;
    let numericRegex=/^[0-9]*$/;
    let numericdashRegex=/^(([1-9]{3})\-?([0-9]{10}))$/;
   this.currBookingRefValue = this.requestForm.controls["bookingReference"].value;
   if(this.currBookingRefValue.length == 14 && this.currBookingRefValue.match(numericdashRegex)){
     this.requestForm.controls["bookingReference"].setErrors({'pattern': false});
   }else if(this.currBookingRefValue.length ==6 && this.currBookingRefValue.match(alphaNumericRegex)){
    this.requestForm.controls["bookingReference"].setErrors({'pattern': false});
   }else if(this.currBookingRefValue.length ==13 && this.currBookingRefValue.match(numericRegex) ){
    this.requestForm.controls["bookingReference"].setErrors({'pattern': false});
   }else{
    this.requestForm.controls["bookingReference"].setErrors({'pattern': true});
   }
}
<input name="booking_reference" type="text" [class.input-not-empty]="bookingRef.value"
    class="glyph-input form-control floating-label-input" id="bookings_bookingReference"
    value="" maxlength="14" aria-required="true" role="textbox" #bookingRef
    formControlName="bookingReference" (focus)="resetMessageField()" (blur)="validatePnrAndTicketNumber()"/>