Always encode from unicode to bytes.
In this direction, you get to choose the encoding.
>>> u"??".encode("utf8")
'\xe4\xbd\xa0\xe5\xa5\xbd'
>>> print _
你好
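Because you pick the codec, the same two characters can become entirely different bytes. For example, encoding to GBK instead (a sketch; the bytes shown are what the gbk codec should produce for these two characters):

>>> u"你好".encode("gbk")
'\xc4\xe3\xba\xc3'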
The other way is to decode from bytes to unicode.
In this direction, you have to know what the encoding is.
>>> bytes = '\xe4\xbd\xa0\xe5\xa5\xbd'
>>> print bytes
你好
>>> bytes.decode('utf-8')
u'\u4f60\u597d'
>>> print _
你好
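If you guess the wrong encoding, the decode may still "succeed" and quietly hand you garbage. Decoding the same UTF-8 bytes as Latin-1, for instance:

>>> bytes.decode('latin-1')
u'\xe4\xbd\xa0\xe5\xa5\xbd'

Six Latin-1 characters come back instead of the two characters you actually wanted, and nothing warns you.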
This point can't be stressed enough. If you want to avoid playing unicode "whack-a-mole", it's important to understand what's happening at the data level. Here it is explained another way:
To get unicode out of a byte string, you call .decode on it; to get bytes out of a unicode string, you call .encode on it. Now, on seeing .encode on a byte string, Python 2 first tries to implicitly convert it to text (a unicode object). Similarly, on seeing .decode on a unicode string, Python 2 implicitly tries to convert it to bytes (a str object).
These implicit conversions are why you can get a UnicodeDecodeError when you've called encode. It's because encode accepts a parameter of type unicode; when it receives a str parameter instead, there's an implicit decoding into an object of type unicode before re-encoding it with the encoding you asked for. This implicit conversion uses the default 'ascii' decoder†, giving you a decoding error inside an encoder.
In fact, in Python 3 the methods str.decode and bytes.encode don't even exist. Their removal was a [controversial] attempt to avoid this common confusion.
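In a Python 3 shell the same mistakes surface immediately as attribute errors instead (a sketch; the exact messages vary a little between 3.x releases):

>>> b'\xe4\xbd\xa0\xe5\xa5\xbd'.encode('utf-8')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'bytes' object has no attribute 'encode'
>>> "你好".decode('utf-8')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'str' object has no attribute 'decode'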
† ...or whatever encoding sys.getdefaultencoding() reports; usually this is 'ascii'.
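You can check which default your interpreter is using (on a stock Python 2 this prints 'ascii'):

>>> import sys
>>> sys.getdefaultencoding()
'ascii'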