[regex] Regex to match only letters

How can I write a regex that matches only letters?



\p{L} matches any Unicode letter, which is useful if you're interested in alphabets beyond the Latin one.


/[a-zA-Z]+/

Super simple example. Regular expressions are extremely easy to find online.

http://www.regular-expressions.info/reference.html


Use a character set: [a-zA-Z] matches one letter from A–Z in lowercase or uppercase. [a-zA-Z]+ matches one or more letters, and ^[a-zA-Z]+$ matches only strings that consist entirely of one or more letters (^ and $ mark the beginning and end of a string respectively).
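
As a quick illustration of those three patterns, here is a minimal Python sketch. The sample string and the variable names are made up for the example.

```python
import re

# One letter, a run of letters, and a whole-string check.
one_letter = re.compile(r"[a-zA-Z]")
letter_run = re.compile(r"[a-zA-Z]+")
letters_only = re.compile(r"^[a-zA-Z]+$")

sample = "abc 123 Def"  # hypothetical sample input

first_letter = one_letter.search(sample).group()  # first single letter found
runs = letter_run.findall(sample)                 # every run of letters
whole_ok = letters_only.search("abcDef") is not None
whole_bad = letters_only.search(sample) is not None

print(first_letter, runs, whole_ok, whole_bad)
# a ['abc', 'Def'] True False
```

In Python specifically, $ also matches just before a trailing newline, so re.fullmatch(r"[a-zA-Z]+", s) is the stricter way to validate a whole string.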

If you want to match letters other than A–Z, you can either add them to the character set, as in [a-zA-ZäöüßÄÖÜ], or use a predefined character class such as the Unicode property class \p{L}, which covers every Unicode character that is a letter.
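
Not every engine supports \p{L}. Python's built-in re module, for example, does not (the third-party regex package does). A rough standard-library approximation, assuming "letter" means a Unicode general category starting with L, might look like this. The helper names are hypothetical.

```python
import unicodedata


def is_unicode_letter(ch: str) -> bool:
    """True if ch is a single character whose Unicode category is L* (Lu, Ll, Lt, Lm, Lo)."""
    return len(ch) == 1 and unicodedata.category(ch).startswith("L")


def only_letters(text: str) -> bool:
    """True if text is non-empty and every character is a Unicode letter."""
    return len(text) > 0 and all(is_unicode_letter(c) for c in text)


print(only_letters("Stra\u00dfe"))  # "Straße" -> True
print(only_letters("caf\u00e9"))    # "café"   -> True
print(only_letters("abc1"))         # False, because of the digit
```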


In Python, I have found the following to work:

[^\W\d_]

This works because we are creating a new character class (the []) which excludes (^) any character from the class \W (everything NOT in [a-zA-Z0-9_]), also excludes any digit (\d) and also excludes the underscore (_).

That is, we have taken the character class [a-zA-Z0-9_] and removed the 0-9 and _ bits. You might ask, wouldn't it just be easier to write [a-zA-Z] then, instead of [^\W\d_]? You would be correct if dealing only with ASCII text, but when dealing with Unicode text:

\W

Matches any character which is not a word character. This is the opposite of \w. If the ASCII flag is used this becomes the equivalent of [^a-zA-Z0-9_].

(from the Python re module documentation)

That is, we are taking everything considered to be a word character in Unicode, removing everything considered to be a digit character in Unicode, and also removing the underscore.

For example, the following code snippet

import re

# Raw string, so the backslashes reach the regex engine unchanged
regex = r"[^\W\d_]"
test_string = "A;,./>>?()*)&^*&^%&^#Bsfa1 203974"
re.findall(regex, test_string)

Returns

['A', 'B', 's', 'f', 'a']
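
To make the Unicode versus ASCII difference concrete, here is a small sketch that runs the same pattern with and without re.ASCII. The sample text is an assumption chosen to contain two non-ASCII letters.

```python
import re

text = "Stra\u00dfe 42 caf\u00e9_x"  # "Straße 42 café_x"

# Default (Unicode) mode: \W and \d follow the Unicode definitions.
unicode_letters = "".join(re.findall(r"[^\W\d_]", text))

# ASCII mode: \W becomes [^a-zA-Z0-9_], so ß and é are no longer letters.
ascii_letters = "".join(re.findall(r"[^\W\d_]", text, flags=re.ASCII))

print(unicode_letters)  # Straßecaféx
print(ascii_letters)    # Straecafx
```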

You would use

/[a-z]/gi

[] - matches any single character from the set inside the brackets

a-z - covers the entire (lowercase) alphabet

g - global: searches the whole string instead of stopping at the first match

i - case-insensitive: matches both upper and lowercase


Depending on your meaning of "character":

[A-Za-z] - all letters (uppercase and lowercase)

[^0-9] - all non-digit characters


You can try one of these regular expressions: [^\W\d_] or [a-zA-Z].


// [A-Za-z] rather than [A-z]: the latter also matches [ \ ] ^ _ and `
/^[A-Za-z]+$/.test('asd')
// true

/^[A-Za-z]+$/.test('asd0')
// false

/^[A-Za-z]+$/.test('0asd')
// false
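
A note on ranges: [A-z] is sometimes written as shorthand for [A-Za-z], but a range follows code point order, and six punctuation characters sit between 'Z' and 'a'. The following Python sketch (variable names are illustrative) lists them and shows the effect.

```python
import re

loose = re.compile(r"[A-z]+")      # every code point from 'A' (65) to 'z' (122)
strict = re.compile(r"[A-Za-z]+")  # letters only

# The characters that fall between 'Z' and 'a'.
extras = "".join(chr(c) for c in range(ord("Z") + 1, ord("a")))
print(extras)  # [\]^_`

loose_accepts = loose.fullmatch("a_b") is not None    # True: '_' is inside A-z
strict_accepts = strict.fullmatch("a_b") is not None  # False
print(loose_accepts, strict_accepts)
```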

If you mean any letters in any character encoding, then a good approach might be to delete non-letters like spaces \s, digits \d, and other special characters like:

[!@#\$%\^&\*\(\)\[\]:;'",\. ...more special chars... ]

Or negate those classes to describe letters directly:

\S \D and [^  ..special chars..]

Pros:

  • Works with all regex flavors.
  • Easy to write, sometimes save lots of time.

Cons:

  • The pattern gets long and is sometimes not perfect, but then the character encoding of the input can be broken as well.

Lately I have used this pattern in my forms to check people's names, which can contain letters, blanks and special characters like accent marks.

pattern="[A-Za-zÀ-ú\s]+"

For PHP, the following will work fine:

'/^[a-zA-Z]+$/'

So, I've been reading a lot of the answers, and most of them don't take exceptions into account, like letters with accents or diaeresis (á, à, ä, etc.).

I made a function in TypeScript that should be easy to port to any language that supports regular expressions. This is my personal implementation for my use case. What I basically did is add a range of letters for each kind of accent that I wanted to accept. I also convert the char to upper case before applying the RegExp, which saves me from listing the lowercase ranges.

function isLetter(char: string): boolean {
  // Regex literal: inside a quoted string, '\s' would collapse to a plain 's'
  return /^[A-ZÀ-ÚÄ-Ü\s]+$/.test(char.toUpperCase());
}

If you want to add another range of letters with another kind of accent, just add it to the regex. Same goes for special symbols.

I implemented this function with TDD and I can confirm this works with, at least, the following cases:

    character | isLetter
    ${'A'}    | ${true}
    ${'e'}    | ${true}
    ${'Á'}    | ${true}
    ${'ü'}    | ${true}
    ${'ù'}    | ${true}
    ${'û'}    | ${true}
    ${'('}    | ${false}
    ${'^'}    | ${false}
    ${"'"}    | ${false}
    ${'`'}    | ${false}
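
For anyone who wants to try the same idea outside TypeScript, here is a rough Python port under the same assumptions: the same three ranges plus whitespace, and upper-casing before matching. The name is_letter is hypothetical. The non-ASCII characters are written as escapes so the ranges are unambiguous.

```python
import re

# Same ranges as the TypeScript pattern:
# A-Z, À-Ú (U+00C0 to U+00DA), Ä-Ü (U+00C4 to U+00DC), plus whitespace.
LETTER_RE = re.compile(r"[A-Z\u00c0-\u00da\u00c4-\u00dc\s]+")


def is_letter(char: str) -> bool:
    """Upper-case the input, then require the whole string to match."""
    return LETTER_RE.fullmatch(char.upper()) is not None


# A e Á ü ù û
accepted = [is_letter(c) for c in ("A", "e", "\u00c1", "\u00fc", "\u00f9", "\u00fb")]
# ( ^ ' `
rejected = [is_letter(c) for c in ("(", "^", "'", "`")]
# The multiplication sign (U+00D7) sits inside À-Ú, so it slips through.
times_sign = is_letter("\u00d7")

print(accepted, rejected, times_sign)
```

The last check is worth keeping in mind: hand-picked code point ranges tend to include a few non-letters, which is one reason some of the other answers prefer Unicode-aware classes.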

Use character groups

\D

Matches any character except the digits 0-9. Note that this includes punctuation and whitespace, not just letters.

^\D+$

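
As a quick check of what ^\D+$ actually accepts, here is a short Python sketch with made-up inputs.

```python
import re

non_digits_only = re.compile(r"^\D+$")

accepts_letters = non_digits_only.search("abc") is not None   # True
accepts_punct = non_digits_only.search("a-b c!") is not None  # True: no digits, but not only letters
accepts_digit = non_digits_only.search("abc1") is not None    # False

print(accepts_letters, accepts_punct, accepts_digit)
```

So \D is a fit only when "no digits" is the real requirement.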


Java:

String s = "abcdef";

if (s.matches("[a-zA-Z]+")) {
    System.out.println("string only contains letters");
}

The closest option available is

[\u\l]+

which matches a sequence of uppercase and lowercase letters. However, it is not supported by all editors/languages, so it is probably safer to use

[a-zA-Z]+

as other users suggest.


JavaScript

If you want to return matched letters:

('Example 123').match(/[A-Z]/gi) // Result: ["E", "x", "a", "m", "p", "l", "e"]

If you want to replace matched letters with stars ('*') for example:

('Example 123').replace(/[A-Z]/gi, '*') // Result: "******* 123"
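
For comparison, a rough Python equivalent of the two calls above might look like this. It assumes re.findall and re.sub as stand-ins for match and replace with the g flag, and re.IGNORECASE for the i flag.

```python
import re

text = "Example 123"

# Collect every letter (like .match(/[A-Z]/gi)).
letters = re.findall(r"[A-Z]", text, flags=re.IGNORECASE)

# Replace every letter with a star (like .replace(/[A-Z]/gi, '*')).
masked = re.sub(r"[A-Z]", "*", text, flags=re.IGNORECASE)

print(letters)  # ['E', 'x', 'a', 'm', 'p', 'l', 'e']
print(masked)   # ******* 123
```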


import java.util.regex.Pattern;

Pattern pattern = Pattern.compile("^[a-zA-Z]+$");

if (pattern.matcher("a").find()) {
   // ...do something ......
}

pattern = /[a-zA-Z]/

puts "[a-zA-Z]: #{pattern.match('mine blossom')}" # OK

puts "[a-zA-Z]: #{pattern.match('456')}"

puts "[a-zA-Z]: #{pattern.match('')}"

puts "[a-zA-Z]: #{pattern.match('#$%^&*')}"

puts "[a-zA-Z]: #{pattern.match('#$%^&*A')}" # OK


The regular expression that a few people have written as "/^[a-zA-Z]$/i" is not ideal. The /i (case-insensitive) flag is redundant because the class already lists both cases, and without a quantifier the pattern matches only a single-letter string. Without /g, matching also stops after the first match. To find every run of letters, use /g (global) instead of /i; in that case you do not need ^ and $ for the start and end (keep them only if the whole string must consist of letters).

/[a-zA-Z]+/g
  1. [a-zA-Z] match a single character present in the list below
  2. Quantifier: + Between one and unlimited times, as many times as possible, giving back as needed
  3. a-z a single character in the range between a and z (case sensitive)
  4. A-Z a single character in the range between A and Z (case sensitive)
  5. g modifier: global. All matches (don't return on first match)
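
The "giving back as needed" part of item 2 is easy to miss, so here is a small Python sketch of it. The second pattern, with a trailing z, is an assumption added only to force the quantifier to give a character back.

```python
import re

# Greedy: + takes as many letters as it can and stops at the first non-letter.
greedy = re.match(r"[a-zA-Z]+", "abc123").group()

# Giving back: + first grabs "abcz", then releases the final "z"
# so that the literal z after the group can still match.
m = re.search(r"([a-zA-Z]+)z", "abcz")
run, whole = m.group(1), m.group(0)

print(greedy, run, whole)  # abc abc abcz
```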

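Just use \w or [:alpha:]. \w is an escape sequence that matches the characters which may appear in words; note that this also includes digits and the underscore. [:alpha:] is a POSIX class that matches letters only, but it has to be written inside a bracket expression ([[:alpha:]]) and is not supported by every regex flavor.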
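
To see how \w differs from a letters-only class, here is a short Python sketch with a made-up input. Python's re module has no [:alpha:], so the [^\W\d_] class from an earlier answer stands in for it.

```python
import re

sample = "a1_-b"  # hypothetical input: letter, digit, underscore, hyphen, letter

word_chars = re.findall(r"\w", sample)          # letters, digits and underscore
letters_only = re.findall(r"[^\W\d_]", sample)  # letters only

print(word_chars)    # ['a', '1', '_', 'b']
print(letters_only)  # ['a', 'b']
```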