[python] Python - 'ascii' codec can't decode byte

I'm really confused. I tried to encode but the error said can't decode....

>>> "??".encode("utf8")
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe4 in position 0: ordinal not in range(128)

I know how to avoid the error with "u" prefix on the string. I'm just wondering why the error is "can't decode" when encode was called. What is Python doing under the hood?

This question is related to python python-2.7 unicode python-2.x python-unicode

The answer is


You can try this

import sys
reload(sys)
sys.setdefaultencoding("utf-8")

Or

You can also try following

Add following line at top of your .py file.

# -*- coding: utf-8 -*- 

You use u"??".encode('utf8') to encode an unicode string. But if you want to represent "??", you should decode it. Just like:

"??".decode("utf8")

You will get what you want. Maybe you should learn more about encode & decode.


If you're using Python < 3, you'll need to tell the interpreter that your string literal is Unicode by prefixing it with a u:

Python 2.7.2 (default, Jan 14 2012, 23:14:09) 
[GCC 4.2.1 (Based on Apple Inc. build 5658) (LLVM build 2335.15.00)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> "??".encode("utf8")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe4 in position 0: ordinal not in range(128)
>>> u"??".encode("utf8")
'\xe4\xbd\xa0\xe5\xa5\xbd'

Further reading: Unicode HOWTO.


If you are starting the python interpreter from a shell on Linux or similar systems (BSD, not sure about Mac), you should also check the default encoding for the shell.

Call locale charmap from the shell (not the python interpreter) and you should see

[user@host dir] $ locale charmap
UTF-8
[user@host dir] $ 

If this is not the case, and you see something else, e.g.

[user@host dir] $ locale charmap
ANSI_X3.4-1968
[user@host dir] $ 

Python will (at least in some cases such as in mine) inherit the shell's encoding and will not be able to print (some? all?) unicode characters. Python's own default encoding that you see and control via sys.getdefaultencoding() and sys.setdefaultencoding() is in this case ignored.

If you find that you have this problem, you can fix that by

[user@host dir] $ export LC_CTYPE="en_EN.UTF-8"
[user@host dir] $ locale charmap
UTF-8
[user@host dir] $ 

(Or alternatively choose whichever keymap you want instead of en_EN.) You can also edit /etc/locale.conf (or whichever file governs the locale definition in your system) to correct this.


In case you're dealing with Unicode, sometimes instead of encode('utf-8'), you can also try to ignore the special characters, e.g.

"??".encode('ascii','ignore')

or as something.decode('unicode_escape').encode('ascii','ignore') as suggested here.

Not particularly useful in this example, but can work better in other scenarios when it's not possible to convert some special characters.

Alternatively you can consider replacing particular character using replace().


Always encode from unicode to bytes.
In this direction, you get to choose the encoding.

>>> u"??".encode("utf8")
'\xe4\xbd\xa0\xe5\xa5\xbd'
>>> print _
??

The other way is to decode from bytes to unicode.
In this direction, you have to know what the encoding is.

>>> bytes = '\xe4\xbd\xa0\xe5\xa5\xbd'
>>> print bytes
??
>>> bytes.decode('utf-8')
u'\u4f60\u597d'
>>> print _
??

This point can't be stressed enough. If you want to avoid playing unicode "whack-a-mole", it's important to understand what's happening at the data level. Here it is explained another way:

  • A unicode object is decoded already, you never want to call decode on it.
  • A bytestring object is encoded already, you never want to call encode on it.

Now, on seeing .encode on a byte string, Python 2 first tries to implicitly convert it to text (a unicode object). Similarly, on seeing .decode on a unicode string, Python 2 implicitly tries to convert it to bytes (a str object).

These implicit conversions are why you can get UnicodeDecodeError when you've called encode. It's because encoding usually accepts a parameter of type unicode; when receiving a str parameter, there's an implicit decoding into an object of type unicode before re-encoding it with another encoding. This conversion chooses a default 'ascii' decoder, giving you the decoding error inside an encoder.

In fact, in Python 3 the methods str.decode and bytes.encode don't even exist. Their removal was a [controversial] attempt to avoid this common confusion.

...or whatever coding sys.getdefaultencoding() mentions; usually this is 'ascii'


Examples related to python

programming a servo thru a barometer Is there a way to view two blocks of code from the same file simultaneously in Sublime Text? python variable NameError Why my regexp for hyphenated words doesn't work? Comparing a variable with a string python not working when redirecting from bash script is it possible to add colors to python output? Get Public URL for File - Google Cloud Storage - App Engine (Python) Real time face detection OpenCV, Python xlrd.biffh.XLRDError: Excel xlsx file; not supported Could not load dynamic library 'cudart64_101.dll' on tensorflow CPU-only installation

Examples related to python-2.7

Numpy, multiply array with scalar Not able to install Python packages [SSL: TLSV1_ALERT_PROTOCOL_VERSION] How to create a new text file using Python Could not find a version that satisfies the requirement tensorflow Python: Pandas pd.read_excel giving ImportError: Install xlrd >= 0.9.0 for Excel support Display/Print one column from a DataFrame of Series in Pandas How to calculate 1st and 3rd quartiles? How can I read pdf in python? How to completely uninstall python 2.7.13 on Ubuntu 16.04 Check key exist in python dict

Examples related to unicode

How to resolve TypeError: can only concatenate str (not "int") to str (unicode error) 'unicodeescape' codec can't decode bytes in position 2-3: truncated \UXXXXXXXX escape UnicodeEncodeError: 'ascii' codec can't encode character at special name Python NLTK: SyntaxError: Non-ASCII character '\xc3' in file (Sentiment Analysis -NLP) HTML for the Pause symbol in audio and video control Javascript: Unicode string to hex Concrete Javascript Regex for Accented Characters (Diacritics) Replace non-ASCII characters with a single space UTF-8 in Windows 7 CMD NameError: global name 'unicode' is not defined - in Python 3

Examples related to python-2.x

Combine several images horizontally with Python How to print variables without spaces between values How to return dictionary keys as a list in Python? How to add an element to the beginning of an OrderedDict? How to read a CSV file from a URL with Python? Malformed String ValueError ast.literal_eval() with String representation of Tuple Relative imports for the billionth time How do you use subprocess.check_output() in Python? write() versus writelines() and concatenated strings How to select a directory and store the location using tkinter in Python

Examples related to python-unicode

UnicodeEncodeError: 'ascii' codec can't encode character u'\xe9' in position 7: ordinal not in range(128) UnicodeDecodeError: 'utf8' codec can't decode byte 0x9c SyntaxError: Non-ASCII character '\xa3' in file when function returns '£' How to print Unicode character in Python? Python string to unicode UnicodeEncodeError: 'ascii' codec can't encode character u'\xa0' in position 20: ordinal not in range(128) Python - 'ascii' codec can't decode byte