How do I verify that a string only contains letters, numbers, underscores and dashes?

Question

I know how to do this if I iterate through all of the characters in the string, but I am looking for a more elegant method.

User · Answer

You could always use a list comprehension and check the results with all(); it would be a little less resource intensive than using a regex:

    import string

    all([c in string.ascii_letters + string.digits + "_-" for c in mystring])
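A small sketch of the difference between handing all() a list and handing it a generator expression: the list version tests every character before all() sees anything, while the generator version can stop at the first invalid character. The names ALLOWED, only_allowed_list and only_allowed_gen are illustrative and not from the answer above.

```python
import string

# Illustrative name: the characters the question permits.
ALLOWED = string.ascii_letters + string.digits + "_-"

def only_allowed_list(s):
    # Builds the complete list of booleans first, then checks it.
    return all([c in ALLOWED for c in s])

def only_allowed_gen(s):
    # Generator expression: all() stops at the first False it receives.
    return all(c in ALLOWED for c in s)

print(only_allowed_list("valid_name-1"))  # True
print(only_allowed_gen("not valid!"))     # False
```

Both functions return the same answers; they differ only in how much work is done on strings with an early invalid character.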

User · Answer

There are a variety of ways of achieving this goal; some are clearer than others. For each of my examples, True means that the string passed is valid, and False means it contains invalid characters.

First of all, there's the naive approach:

    import string
    allowed = string.ascii_letters + string.digits + '_' + '-'

    def check_naive(mystring):
        return all(c in allowed for c in mystring)

Then there's use of a regular expression; you can do this with re.match(). Note that '-' has to be at the end of the [], otherwise it will be used as a 'range' delimiter. Also note the $, which means 'end of string'. Other answers noted in this question use a special character class, '\w'. I always prefer using an explicit character class range using [] because it is easier to understand without having to look up a quick reference guide, and easier to special-case.

    import re
    CHECK_RE = re.compile('[a-zA-Z0-9_-]+$')

    def check_re(mystring):
        # match() returns a match object or None, so convert to a bool
        return CHECK_RE.match(mystring) is not None

Another solution noted that you can do an inverse match with regular expressions; I've included that here now. Note that [^...] inverts the character class because the ^ is used:

    CHECK_INV_RE = re.compile('[^a-zA-Z0-9_-]')

    def check_inv_re(mystring):
        return not CHECK_INV_RE.search(mystring)

You can also do something tricky with the 'set' object. Have a look at this example, which removes from the original string all the characters that are allowed, leaving us with a set containing either a) nothing, or b) the offending characters from the string:

    def check_set(mystring):
        return not set(mystring) - set(allowed)
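One detail worth checking with the '[a-zA-Z0-9_-]+$' pattern: in Python's re module, $ also matches just before a trailing newline, so a string ending in "\n" passes. The sketch below contrasts $ with \Z, which matches only at the very end of the string. The names DOLLAR_RE, STRICT_RE, check_dollar and check_strict are made up for this illustration.

```python
import re

# '$' matches at the end of the string OR just before a trailing newline.
DOLLAR_RE = re.compile(r"[a-zA-Z0-9_-]+$")
# '\Z' matches only at the very end of the string.
STRICT_RE = re.compile(r"[a-zA-Z0-9_-]+\Z")

def check_dollar(s):
    return DOLLAR_RE.match(s) is not None

def check_strict(s):
    return STRICT_RE.match(s) is not None

print(check_dollar("abc-123\n"))  # True: the newline slips through
print(check_strict("abc-123\n"))  # False
print(check_strict("abc-123"))    # True
```

If trailing newlines can occur in the input (for example, lines read from a file), \Z or re.fullmatch is probably the safer choice.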

User · Answer

    import re

    pat = re.compile(r'[^\w-]')

    def onlyallowed(s):
        return not pat.search(s)
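A caveat with \w, sketched under the assumption that the code runs on Python 3 with str input: there \w is Unicode-aware by default, so accented letters count as word characters. Passing re.ASCII restricts \w to [a-zA-Z0-9_]. The names UNICODE_PAT, ASCII_PAT and the two helper functions are illustrative only.

```python
import re

# For str patterns in Python 3, \w matches Unicode word characters by default.
UNICODE_PAT = re.compile(r"[^\w-]")
# With re.ASCII, \w is limited to [a-zA-Z0-9_].
ASCII_PAT = re.compile(r"[^\w-]", re.ASCII)

def only_allowed_unicode(s):
    return not UNICODE_PAT.search(s)

def only_allowed_ascii(s):
    return not ASCII_PAT.search(s)

word = "caf\u00e9"  # "cafe" with an accented final letter
print(only_allowed_unicode(word))      # True
print(only_allowed_ascii(word))        # False
print(only_allowed_ascii("cafe-1_x"))  # True
```

Which behavior is wanted depends on whether "letters" in the question means ASCII letters only.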

User · Answer

If it were not for the dashes and underscores, the easiest solution would be

    my_little_string.isalnum()

(Section 3.6.1 of the Python Library Reference)
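One way to still lean on isalnum() is to strip the two extra characters first. This is only a sketch with a hypothetical helper name, only_allowed. It assumes Python 3.7+ for str.isascii(), which is used here because isalnum() on its own also accepts non-ASCII letters and digits. It treats the empty string as valid, which is a design choice rather than something the answer specifies.

```python
def only_allowed(s):
    # Remove the two extra permitted characters, then let isalnum() judge the rest.
    stripped = s.replace("_", "").replace("-", "")
    # isalnum() returns False for "", so handle the "nothing left" case explicitly.
    # isascii() (Python 3.7+) rejects non-ASCII letters that isalnum() would accept.
    return s.isascii() and (stripped == "" or stripped.isalnum())

print(only_allowed("my_little-string1"))  # True
print(only_allowed("no spaces"))          # False
print(only_allowed("caf\u00e9"))          # False
```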

User · Answer

[Edit] There's another solution not mentioned yet, and it seems to outperform the others given so far in most cases.

Use string.translate to remove all valid characters from the string, and see if we have any invalid ones left over. This is pretty fast as it uses the underlying C function to do the work, with very little Python bytecode involved.

Obviously performance isn't everything - going for the most readable solution is probably the best approach when not in a performance-critical codepath, but just to see how the solutions stack up, here's a performance comparison of all the methods proposed so far. check_trans is the one using the string.translate method.

Test code (Python 2):

    import string, re, timeit

    pat = re.compile('[\w-]*$')
    pat_inv = re.compile('[^\w-]')
    allowed_chars = string.ascii_letters + string.digits + '_-'
    allowed_set = set(allowed_chars)
    trans_table = string.maketrans('', '')

    def check_set_diff(s):
        return not set(s) - allowed_set

    def check_set_all(s):
        return all(x in allowed_set for x in s)

    def check_set_subset(s):
        return set(s).issubset(allowed_set)

    def check_re_match(s):
        return pat.match(s)

    def check_re_inverse(s):  # Search for non-matching character.
        return not pat_inv.search(s)

    def check_trans(s):
        return not s.translate(trans_table, allowed_chars)

    test_long_almost_valid = 'a_very_long_string_that_is_mostly_valid_except_for_last_char' * 99 + '!'
    test_long_valid = 'a_very_long_string_that_is_completely_valid_' * 99
    test_short_valid = 'short_valid_string'
    test_short_invalid = '/$%$%&'
    test_long_invalid = '/$%$%&' * 99
    test_empty = ''

    def main():
        funcs = sorted(f for f in globals() if f.startswith('check_'))
        tests = sorted(f for f in globals() if f.startswith('test_'))
        for test in tests:
            print "Test %-15s (length = %d):" % (test, len(globals()[test]))
            for func in funcs:
                print "  %-20s : %.3f" % (func,
                       timeit.Timer('%s(%s)' % (func, test), 'from __main__ import pat,allowed_set,%s' % ','.join(funcs + tests)).timeit(10000))
            print

    if __name__ == '__main__': main()

The results on my system are:

    Test test_empty      (length = 0):
      check_re_inverse     : 0.042
      check_re_match       : 0.030
      check_set_all        : 0.027
      check_set_diff       : 0.029
      check_set_subset     : 0.029
      check_trans          : 0.014

    Test test_long_almost_valid (length = 5941):
      check_re_inverse     : 2.690
      check_re_match       : 3.037
      check_set_all        : 18.860
      check_set_diff       : 2.905
      check_set_subset     : 2.903
      check_trans          : 0.182

    Test test_long_invalid (length = 594):
      check_re_inverse     : 0.017
      check_re_match       : 0.015
      check_set_all        : 0.044
      check_set_diff       : 0.311
      check_set_subset     : 0.308
      check_trans          : 0.034

    Test test_long_valid (length = 4356):
      check_re_inverse     : 1.890
      check_re_match       : 1.010
      check_set_all        : 14.411
      check_set_diff       : 2.101
      check_set_subset     : 2.333
      check_trans          : 0.140

    Test test_short_invalid (length = 6):
      check_re_inverse     : 0.017
      check_re_match       : 0.019
      check_set_all        : 0.044
      check_set_diff       : 0.032
      check_set_subset     : 0.037
      check_trans          : 0.015

    Test test_short_valid (length = 18):
      check_re_inverse     : 0.125
      check_re_match       : 0.066
      check_set_all        : 0.104
      check_set_diff       : 0.051
      check_set_subset     : 0.046
      check_trans          : 0.017

The translate approach seems best in most cases, dramatically so with long valid strings, but is beaten out by regexes in test_long_invalid (presumably because the regex can bail out immediately, but translate always has to scan the whole string). The set approaches are usually worst, beating the regexes mainly on the empty string (and, for set difference and subset, on the short valid string).

Using all(x in allowed_set for x in s) performs well if it bails out early, but can be bad if it has to iterate through every character. issubset and set difference are comparable, and are consistently proportional to the length of the string regardless of the data.

There's a similar difference between the regex methods matching all valid characters and searching for invalid characters. Matching performs a little better when checking for a long but fully valid string, but worse for invalid characters near the end of the string.
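The benchmark above relies on string.maketrans and the two-argument form of str.translate, both of which are Python 2 APIs. A minimal Python 3 sketch of the same idea, assuming str input, uses the three-argument str.maketrans, whose third argument lists characters to delete. ALLOWED_CHARS and DELETE_ALLOWED are illustrative names, and this version was not part of the timings above.

```python
import string

ALLOWED_CHARS = string.ascii_letters + string.digits + "_-"
# Three-argument str.maketrans: every character in the third argument is deleted.
DELETE_ALLOWED = str.maketrans("", "", ALLOWED_CHARS)

def check_trans(s):
    # Whatever survives the deletion is an invalid character.
    return not s.translate(DELETE_ALLOWED)

print(check_trans("short_valid_string"))  # True
print(check_trans("/$%$%&"))              # False
print(check_trans(""))                    # True
```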

User · Answer

A regular expression will do the trick with very little code:

    import re

    ...

    if re.match("^[A-Za-z0-9_-]*$", my_little_string):
        # do something here
        pass

User · Answer

Regular expressions can be very flexible.

    import re
    re.fullmatch(r"^[\w-]+$", target_string)  # fullmatch is available from Python 3.4

\w: matches [a-zA-Z0-9_] (for str patterns in Python 3 this holds only with the re.ASCII flag; otherwise \w also matches other Unicode word characters). So you need to add - to the class to allow the hyphen.

+: matches one or more repetitions of the preceding character class. I guess you don't accept blank input, but if you do, change it to *.

^: matches the start of the string.

$: matches the end of the string.

You need these two special characters when using re.match or re.search (re.fullmatch already requires the whole string to match), since you need to avoid the following case, where unwanted characters like & appear around the matched pattern:

    &&&PATTERN&&PATTERN
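The anchoring point and the + versus * choice can be seen in a short sketch. The constant and function names below are hypothetical, chosen only for this example.

```python
import re

CLASS_PLUS = r"[\w-]+"  # one or more allowed characters
CLASS_STAR = r"[\w-]*"  # zero or more: also accepts the empty string

def found_somewhere(s):
    # Unanchored search: succeeds if any run of allowed characters exists in s.
    return re.search(CLASS_PLUS, s) is not None

def whole_string_ok(s, pattern=CLASS_PLUS):
    # fullmatch requires the entire string to match the pattern.
    return re.fullmatch(pattern, s) is not None

noisy = "&&&PATTERN&&PATTERN"
print(found_somewhere(noisy))           # True: "PATTERN" is found inside
print(whole_string_ok(noisy))           # False
print(whole_string_ok(""))              # False
print(whole_string_ok("", CLASS_STAR))  # True
```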

User · Answer

As an alternative to using regex you could do it with sets (the built-in set type replaces the old sets.Set class, which no longer exists in Python 3):

    allowed_chars = set('0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ_-')

    if set(my_little_string).issubset(allowed_chars):
        # your action
        print(True)
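A set-based check can also report which characters are invalid, which may be handy for error messages. This is a sketch with illustrative names (ALLOWED, invalid_chars), not part of the answer above.

```python
import string

ALLOWED = frozenset(string.ascii_letters + string.digits + "_-")

def invalid_chars(s):
    # Set difference keeps only the characters that are not allowed.
    return sorted(set(s) - ALLOWED)

print(invalid_chars("my_little-string"))  # []
print(invalid_chars("a b!c?"))            # [' ', '!', '?']
```

An empty list means the string is valid, so `not invalid_chars(s)` gives the same answer as the issubset test.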

User · Answer

Use a regex and see if it matches:

    ^[a-zA-Z0-9_-]*$

User · Answer

Well, you can ask for the help of regex, which is great here.

Code:

    import re

    string = 'adsfg34wrtwe4r2_-'  # your string that needs to be matched
    regex = r'^[\w-]*$'  # you can also add a space to the character class if you want to allow it in the string

    if re.match(regex, string):
        print('yes')
    else:
        print('false')

Output:

    yes

Hope this helps.

User · Answer

Here's something based on Jerub's "naive approach" (naive being his words, not mine!):

    import string
    ALLOWED = frozenset(string.ascii_letters + string.digits + '_' + '-')

    def check(mystring):
        return all(c in ALLOWED for c in mystring)

If ALLOWED was a string then I think c in ALLOWED would involve iterating over each character in the string until it found a match or reached the end. Which, to quote Joel Spolsky, is something of a Shlemiel the Painter algorithm.

But testing for existence in a set should be more efficient, or at least less dependent on the number of allowed characters. Certainly this approach is a little bit faster on my machine. It's clear and I think it performs plenty well enough for most cases (on my slow machine I can validate tens of thousands of short-ish strings in a fraction of a second). I like it.

ACTUALLY on my machine a regexp works out several times faster, and is just as simple as this (arguably simpler). So that probably is the best way forward.
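To see the string-versus-set membership difference on your own machine, a rough timeit sketch along these lines may help. The sample string, repeat count and function names are arbitrary assumptions, and the timings will vary by machine and Python version, so no expected numbers are shown.

```python
import string
import timeit

ALLOWED_STR = string.ascii_letters + string.digits + "_-"
ALLOWED_SET = frozenset(ALLOWED_STR)

def check_str(s):
    # Membership in a str scans the allowed characters one by one.
    return all(c in ALLOWED_STR for c in s)

def check_set(s):
    # Membership in a frozenset is a hash lookup.
    return all(c in ALLOWED_SET for c in s)

sample = "a_short-ish_string_42" * 10  # arbitrary valid test input

for fn in (check_str, check_set):
    seconds = timeit.timeit(lambda: fn(sample), number=2000)
    print(f"{fn.__name__}: {seconds:.4f}s")
```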
