[regex] Regular expression for letters, numbers and - _

I'm having trouble checking in PHP if a value is is any of the following combinations

  • letters (upper or lowercase)
  • numbers (0-9)
  • underscore (_)
  • dash (-)
  • point (.)
  • no spaces! or other characters

a few examples:

  • OK: "screen123.css"
  • OK: "screen-new-file.css"
  • OK: "screen_new.js"
  • NOT OK: "screen new file.css"

I guess I need a regex for this, since I need to throw an error when a give string has other characters in it than the ones mentioned above.

This question is related to regex

The answer is


Something like this should work

$code = "screen new file.css";
if (!preg_match("/^[-_a-zA-Z0-9.]+$/", $code))
{
    echo "not valid";
}

This will echo "not valid"


you can use

^[\w.-]+$

the + is to make sure it has at least 1 character. Need the ^ and $ to denote the begin and end, otherwise if the string has a match in the middle, such as @@@@xyz%%%% then it is still a match.

\w already includes alphabets (upper and lower case), numbers, and underscore. So the rest ., -, are just put into the "class" to match. The + means 1 occurrence or more.

P.S. thanks for the note in the comment


/^[\w-_.]*$/

What is means By:

  • ^ Start of string

  • [......] Match characters inside

  • \w Any word character so 0-9 a-z A-Z

  • -_. Matched by charecter - and _ and .

  • Zero or more of pattern or unlimited $ End of string If you want to limit the amount of characters:

    /^[\w-_.]{0,5}$/
    

    {0,5} Means 0-5 Numbers & characters


[A-Za-z0-9_.-]*

This will also match for empty strings, if you do not want that exchange the last * for an +


This is the pattern you are looking for

/^[\w-_.]*$/

What this means:

  • ^ Start of string
  • [...] Match characters inside
  • \w Any word character so 0-9 a-z A-Z
  • -_. Match - and _ and .
  • * Zero or more of pattern or unlimited
  • $ End of string

If you want to limit the amount of characters:

/^[\w-_.]{0,5}$/

{0,5} Means 0-5 characters


To actually cover your pattern, i.e, valid file names according to your rules, I think that you need a little more. Note this doesn't match legal file names from a system perspective. That would be system dependent and more liberal in what it accepts. This is intended to match your acceptable patterns.

^([a-zA-Z0-9]+[_-])*[a-zA-Z0-9]+\.[a-zA-Z0-9]+$

Explanation:

  • ^ Match the start of a string. This (plus the end match) forces the string to conform to the exact expression, not merely contain a substring matching the expression.
  • ([a-zA-Z0-9]+[_-])* Zero or more occurrences of one or more letters or numbers followed by an underscore or dash. This causes all names that contain a dash or underscore to have letters or numbers between them.
  • [a-zA-Z0-9]+ One or more letters or numbers. This covers all names that do not contain an underscore or a dash.
  • \. A literal period (dot). Forces the file name to have an extension and, by exclusion from the rest of the pattern, only allow the period to be used between the name and the extension. If you want more than one extension that could be handled as well using the same technique as for the dash/underscore, just at the end.
  • [a-zA-Z0-9]+ One or more letters or numbers. The extension must be at least one character long and must contain only letters and numbers. This is typical, but if you wanted allow underscores, that could be addressed as well. You could also supply a length range {2,3} instead of the one or more + matcher, if that were more appropriate.
  • $ Match the end of the string. See the starting character.