How to check if a string in Python is in ASCII?

Question

I want to check whether a string is in ASCII or not. I am aware of ord(); however, when I try ord('é'), I get TypeError: ord() expected a character, but string of length 2 found. I understood that this is caused by the way I built Python (as explained in ord()'s documentation). Is there another way to check?

User · Answer

A string (str type) in Python 2 is a series of bytes. There is no way of telling, just from looking at the string, whether this series of bytes represents an ASCII string, a string in an 8-bit charset like ISO-8859-1, or a string encoded with UTF-8, UTF-16, or whatever. However, if you know the encoding used, you can decode the str into a unicode string and then use a regular expression (or a loop) to check whether it contains characters outside the range you are concerned about.
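In Python 3 terms, where bytes and str are separate types, the decode-then-check idea might look like the sketch below. The function name and its encoding parameter are illustrative assumptions, not something the answer defines.

```python
def decoded_is_ascii(raw: bytes, encoding: str) -> bool:
    """Decode raw bytes with a known encoding, then check every code point."""
    text = raw.decode(encoding)  # raises UnicodeDecodeError if the bytes do not fit
    return all(ord(ch) < 128 for ch in text)


# The bytes alone do not say which encoding produced them; the caller must know.
print(decoded_is_ascii(b"hello", "utf-8"))               # True
print(decoded_is_ascii(b"caf\xc3\xa9", "utf-8"))         # False
print(decoded_is_ascii(b"caf\xe9", "iso-8859-1"))        # False
print(decoded_is_ascii("hi".encode("utf-16"), "utf-16")) # True
```

The last call shows text that is pure ASCII even though its UTF-16 byte representation contains a byte order mark and zero bytes.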

User · Answer

    def is_ascii(s):
        return all(ord(c) < 128 for c in s)

User · Answer

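How about doing this?

    import string

    def isAscii(s):
        for c in s:
            if c not in string.ascii_letters:
                return False
        return True

Note that string.ascii_letters contains only A-Z and a-z, so this rejects digits, punctuation, and whitespace even though they are ASCII.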
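A small comparison, as a sketch, of a letters-only test against a test over the full ASCII range (code points 0 to 127). The function names here are made up for illustration.

```python
import string


def only_ascii_letters(s):
    """True only if every character is one of A-Z or a-z."""
    return all(c in string.ascii_letters for c in s)


def is_ascii(s):
    """True if every character's code point is in the ASCII range 0-127."""
    return all(ord(c) < 128 for c in s)


sample = "Hello, world 42!"
print(only_ascii_letters(sample))  # False: ',', ' ', digits and '!' are not letters
print(is_ascii(sample))            # True: every character is still ASCII
```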

User · Answer

    import re

    def is_ascii(s):
        return bool(re.match(r'[\x00-\x7F]+$', s))

To include an empty string as ASCII, change the + to *.
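On Python 3.4 and later, the same check could also be written with fullmatch, which anchors the pattern at both ends so no trailing $ is needed. This is a variant sketch, not the answer's own code; it uses * so the empty string counts as ASCII.

```python
import re

# Compiled once; matches strings made only of code points U+0000 to U+007F.
_ASCII_ONLY = re.compile(r"[\x00-\x7F]*")


def is_ascii(s):
    """True if the whole string is ASCII (the empty string included)."""
    return _ASCII_ONLY.fullmatch(s) is not None


print(is_ascii("plain text\n"))  # True
print(is_ascii(""))              # True
print(is_ascii("na\u00efve"))    # False
```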

User · Answer

In Python 3, we can encode the string as UTF-8, then check whether the length stays the same. If so, then the original string is ASCII.

    def isascii(s):
        """Check if the characters in string s are in ASCII, U+0-U+7F."""
        return len(s) == len(s.encode())

To check, pass the test string:

    >>> isascii("♥O◘♦♥O◘♦")
    False
    >>> isascii("Python")
    True
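The length comparison works because UTF-8 encodes each ASCII character as exactly one byte and every other character as two to four bytes. A quick way to see this (the helper name utf8_width is only for illustration):

```python
def utf8_width(ch: str) -> int:
    """Number of bytes UTF-8 needs for a single character."""
    return len(ch.encode("utf-8"))


# One sample from each UTF-8 length class.
for ch in ["A", "\u00e9", "\u20ac", "\U0001F600"]:
    print(f"U+{ord(ch):04X} takes {utf8_width(ch)} byte(s) in UTF-8")
```

Any non-ASCII character therefore makes the encoded form longer than the string itself.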

User · Answer

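You could use the regular expression library, which accepts the POSIX standard [[:ASCII:]] definition. Note that the standard library's re module does not support POSIX bracket classes, so this needs a regular expression library that does.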

User · Answer

Like @RogerDahl's answer, but it's more efficient to short-circuit by negating the character class and using search instead of findall or match.

    >>> import re
    >>> re.search('[^\x00-\x7F]', 'Did you catch that \x00?') is not None
    False
    >>> re.search('[^\x00-\x7F]', 'Did you catch that \xFF?') is not None
    True

I imagine a regular expression is well-optimized for this.
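Because search stops at the first hit, the match object can also report where the offending character is. A sketch with a precompiled pattern follows; the helper name first_non_ascii is an assumption for illustration.

```python
import re

# Negated class: matches any single character outside U+0000 to U+007F.
_NON_ASCII = re.compile(r"[^\x00-\x7F]")


def first_non_ascii(s):
    """Return (index, character) of the first non-ASCII character, or None."""
    m = _NON_ASCII.search(s)
    return None if m is None else (m.start(), m.group())


print(first_non_ascii("Did you catch that \x00?"))  # None
print(first_non_ascii("Did you catch that \xff?"))  # (19, 'ÿ')
```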

User · Answer

To improve on Alexander's solution: from Python 2.6 onward (and in Python 3.x) you can use the helper module curses.ascii and its curses.ascii.isascii() function, among various others: https://docs.python.org/2.6/library/curses.ascii.html

    from curses import ascii

    def isascii(s):
        return all(ascii.isascii(c) for c in s)

User · Answer

Vincent Marchetti has the right idea, but str.decode no longer exists in Python 3. In Python 3 you can make the same test with str.encode:

    try:
        mystring.encode('ascii')
    except UnicodeEncodeError:
        pass  # string is not ascii
    else:
        pass  # string is ascii

Note that the exception you want to catch has also changed, from UnicodeDecodeError to UnicodeEncodeError.
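Wrapped into a reusable function, the encode test might look like this minimal sketch for Python 3 (the function name is an assumption):

```python
def is_ascii(text: str) -> bool:
    """True if text can be encoded with the ASCII codec without errors."""
    try:
        text.encode("ascii")
    except UnicodeEncodeError:
        return False
    return True


print(is_ascii("Python"))     # True
print(is_ascii("Pyth\u00f6n")) # False
```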

User · Answer

Your question is incorrect; the error you see is not a result of how you built Python, but of a confusion between byte strings and unicode strings.

Byte strings (e.g. "foo" or 'bar' in Python syntax) are sequences of octets: numbers from 0 to 255. Unicode strings (e.g. u"foo" or u'bar') are sequences of unicode code points: numbers from 0 to 1114111 (0x10FFFF). But you appear to be interested in the character é, which (in your terminal) is a multi-byte sequence that represents a single character.

Instead of ord(u'é'), try this:

    >>> [ord(x) for x in u'é']

That tells you which sequence of code points "é" represents. It may give you [233], or it may give you [101, 769].

Instead of chr() to reverse this, there is unichr():

    >>> unichr(233)
    u'\xe9'

This character may actually be represented as either a single or multiple unicode "code points", which themselves represent either graphemes or characters. It's either "e with an acute accent" (i.e., code point 233), or "e" (code point 101) followed by "an acute accent on the previous character" (code point 769). So this exact same character may be presented as the Python data structure u'e\u0301' or u'\u00e9'.

Most of the time you shouldn't have to care about this, but it can become an issue if you are iterating over a unicode string, as iteration works by code point, not by decomposable character. In other words, len(u'e\u0301') == 2 and len(u'\u00e9') == 1. If this matters to you, you can convert between composed and decomposed forms by using unicodedata.normalize.

The Unicode Glossary can be a helpful guide to understanding some of these issues, by pointing out how each specific term refers to a different part of the representation of text, which is far more complicated than many programmers realize.
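The composed and decomposed forms can be inspected and converted with unicodedata.normalize. The sketch below uses Python 3 syntax, where every str is a unicode string. Note that U+0301, the combining acute accent, is 769 in decimal.

```python
import unicodedata

composed = "\u00e9"     # one code point: e with acute accent
decomposed = "e\u0301"  # two code points: 'e' plus combining acute accent

print([ord(c) for c in composed])    # [233]
print([ord(c) for c in decomposed])  # [101, 769]
print(composed == decomposed)        # False: different code point sequences

# NFC composes where possible; NFD decomposes.
print(unicodedata.normalize("NFC", decomposed) == composed)  # True
print(unicodedata.normalize("NFD", composed) == decomposed)  # True
```

Neither form is ASCII as a whole, since both contain a code point above 127.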

User · Answer

I think you are not asking the right question.

A string in Python has no property corresponding to 'ascii', 'utf-8', or any other encoding. The source of your string (whether you read it from a file, input from a keyboard, etc.) may have encoded a unicode string in ASCII to produce your string, but that's where you need to go for an answer.

Perhaps the question you can ask is: "Is this string the result of encoding a unicode string in ASCII?" This you can answer by trying (in Python 2, where str has a decode method):

    try:
        mystring.decode('ascii')
    except UnicodeDecodeError:
        print("it was not an ascii-encoded unicode string")
    else:
        print("It may have been an ascii-encoded unicode string")
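In Python 3 the decode test applies to bytes objects rather than str. A minimal sketch under that assumption (the function name is illustrative):

```python
def bytes_are_ascii(raw: bytes) -> bool:
    """True if raw could be the result of encoding text as ASCII."""
    try:
        raw.decode("ascii")
    except UnicodeDecodeError:
        return False
    return True


print(bytes_are_ascii(b"plain"))        # True
print(bytes_are_ascii(b"caf\xc3\xa9"))  # False: bytes above 0x7F are present
```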

User · Answer

Ran into something like this recently. For future reference:

    import chardet

    encoding = chardet.detect(string)
    if encoding['encoding'] == 'ascii':
        print('string is in ascii')

which you could use with:

    string_ascii = string.decode(encoding['encoding']).encode('ascii')

User · Answer

I found this question while trying to determine how to use/encode/decode a string whose encoding I wasn't sure of (and how to escape/convert special characters in that string).

My first step should have been to check the type of the string. I didn't realize I could get good data about its formatting from type(s). This answer was very helpful and got to the real root of my issues.

If you're getting a rude and persistent

    UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 263: ordinal not in range(128)

particularly when you're ENCODING, make sure you're not trying to unicode() a string that already IS unicode. For some terrible reason, you get ascii codec errors. (See also the Python Kitchen recipe and the Python docs tutorials for a better understanding of how terrible this can be.)

Eventually I determined that what I wanted to do was this:

    escaped_string = unicode(original_string.encode('ascii', 'xmlcharrefreplace'))

Also helpful in debugging was setting the default coding in my file to utf-8 (put this at the beginning of your Python file):

    # -*- coding: utf-8 -*-

That allows you to test special characters ('àéç') without having to use their unicode escapes (u'\xe0\xe9\xe7').

    >>> specials = 'àéç'
    >>> specials.decode('latin-1').encode('ascii', 'xmlcharrefreplace')
    '&#224;&#233;&#231;'
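For comparison, here is a Python 3 sketch of the same xmlcharrefreplace idea, next to the other standard error handlers accepted by str.encode. The helper name to_ascii is hypothetical.

```python
def to_ascii(text: str, errors: str = "xmlcharrefreplace") -> bytes:
    """Encode text as ASCII, handling non-ASCII characters per `errors`."""
    return text.encode("ascii", errors)


specials = "\xe0\xe9\xe7"  # the three accented characters from the answer

print(to_ascii(specials))                      # b'&#224;&#233;&#231;'
print(to_ascii(specials, "backslashreplace"))  # b'\\xe0\\xe9\\xe7'
print(to_ascii(specials, "replace"))           # b'???'
print(to_ascii(specials, "ignore"))            # b''
```

In Python 3 the result is a bytes object; call .decode('ascii') on it if a str is needed.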

User · Answer

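I use the following to determine if the string is ascii or unicode:

    >>> print 'test string'.__class__.__name__
    str
    >>> print u'test string'.__class__.__name__
    unicode
    >>>

Then just use a conditional block to define the function:

    def is_ascii(input):
        if input.__class__.__name__ == "str":
            return True
        return False

Note that this only tells a Python 2 byte string (str) apart from a unicode object; a str can still contain non-ASCII bytes, so it does not check the contents.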

User · Answer

New in Python 3.7 (bpo-32677).

No more tiresome or inefficient ASCII checks on strings: the new built-in str/bytes/bytearray method .isascii() will check if the string is ASCII.

    print("is this ascii?".isascii())
    # True
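If the code also has to run on interpreters older than 3.7, one option is to use the built-in method when it exists and fall back to a manual check otherwise. This is a sketch; the wrapper name is an assumption.

```python
def is_ascii(value):
    """ASCII check for str, bytes, or bytearray values."""
    if hasattr(value, "isascii"):
        # Python 3.7+: built-in method on str, bytes, and bytearray.
        return value.isascii()
    if isinstance(value, (bytes, bytearray)):
        # Python 3 before 3.7: iterating bytes yields integers.
        return all(b < 128 for b in value)
    return all(ord(c) < 128 for c in value)


print(is_ascii("is this ascii?"))  # True
print(is_ascii("caf\u00e9"))       # False
print(is_ascii(b"\xff"))           # False
```

Like the built-in method, the fallback treats an empty value as ASCII.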

User · Answer

To prevent your code from crashing, you may want to use a try-except to catch TypeErrors:

    >>> ord('é')
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    TypeError: ord() expected a character, but string of length 2 found

For example:

    def is_ascii(s):
        try:
            return all(ord(c) < 128 for c in s)
        except TypeError:
            return False
