[python] UnicodeEncodeError: 'ascii' codec can't encode character u'\xef' in position 0: ordinal not in range(128)

I want to parse an XML document. I have stored the document using the model below:

class XMLdocs(db.Expando):
    id = db.IntegerProperty()
    name = db.StringProperty()
    content = db.BlobProperty()

Below is my code:

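import StringIO
from xml.sax import make_parser

parser = make_parser()
curHandler = BasketBallHandler()  # my ContentHandler subclass
parser.setContentHandler(curHandler)
for q in XMLdocs.all():
    # q.content holds the raw XML stored in the BlobProperty
    parser.parse(StringIO.StringIO(q.content))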

I am getting the error below:

'ascii' codec can't encode character u'\xef' in position 0: ordinal not in range(128)
Traceback (most recent call last):  
  File "/base/python_runtime/python_lib/versions/1/google/appengine/ext/webapp/__init__.py", line 517, in __call__
    handler.post(*groups)   
  File "/base/data/home/apps/parsepython/1.348669006354245654/mapreduce/base_handler.py", line 59, in post
    self.handle()   
  File "/base/data/home/apps/parsepython/1.348669006354245654/mapreduce/handlers.py", line 168, in handle
    scan_aborted = not self.process_entity(entity, ctx)   
  File "/base/data/home/apps/parsepython/1.348669006354245654/mapreduce/handlers.py", line 233, in process_entity
    handler(entity)   
  File "/base/data/home/apps/parsepython/1.348669006354245654/parseXML.py", line 71, in process
    parser.parse(StringIO.StringIO(q.content))   
  File "/base/python_runtime/python_dist/lib/python2.5/xml/sax/expatreader.py", line 107, in parse
    xmlreader.IncrementalParser.parse(self, source)   
  File "/base/python_runtime/python_dist/lib/python2.5/xml/sax/xmlreader.py", line 123, in parse
    self.feed(buffer)  
  File "/base/python_runtime/python_dist/lib/python2.5/xml/sax/expatreader.py", line 207, in feed
    self._parser.Parse(data, isFinal)   
  File "/base/data/home/apps/parsepython/1.348669006354245654/parseXML.py", line 136, in characters   
    print ch   
UnicodeEncodeError: 'ascii' codec can't encode character u'\xef' in position 0: ordinal not in range(128)   

This question is related to python google-app-engine xml-parsing

The answer is


Calling .encode('utf-8') on the unicode object before printing it will do the job in Python 2.


The problem according to your traceback is the print statement on line 136 of parseXML.py. Unfortunately you didn't see fit to post that part of your code, but I'm going to guess it is just there for debugging. If you change it to:

print repr(ch)

then you should at least see what you are trying to print.
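
To make that concrete, here is a minimal sketch of a handler that records an ASCII-safe form of each text chunk instead of printing it raw. The class name EscapingHandler and the sample XML are made up for illustration, since the real BasketBallHandler was not posted. It also uses the unicode_escape codec rather than repr() so that the result looks the same on Python 2 and Python 3.

```python
import xml.sax


class EscapingHandler(xml.sax.ContentHandler):
    """Hypothetical stand-in for BasketBallHandler: collects text chunks."""

    def __init__(self):
        xml.sax.ContentHandler.__init__(self)
        self.chunks = []

    def characters(self, ch):
        # ch is a unicode string; store it instead of printing it directly
        self.chunks.append(ch)


handler = EscapingHandler()
# Sample document (an assumption): UTF-8 bytes containing the character u'\xef'
xml.sax.parseString(u'<name>na\xefve</name>'.encode('utf-8'), handler)

# unicode_escape turns non-ASCII characters into backslash escapes,
# so the result can be sent to an ASCII-only console or log
escaped = [chunk.encode('unicode_escape') for chunk in handler.chunks]
for line in escaped:
    print(repr(line))
```

The parser may deliver the text in more than one chunk, so join the chunks if you need the full string.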


A quick workaround for this problem is to set your default encoding to UTF-8. Note that this works only in Python 2. The following is an example:

import sys

reload(sys)
sys.setdefaultencoding('utf8')

The actual best answer for this problem depends on your environment, specifically what encoding your terminal expects.

The quickest one-line solution is to encode everything you print to ASCII, which your terminal is almost certain to accept, while discarding characters that you cannot print:

print ch  # fails
print ch.encode('ascii', 'ignore')  # works, but drops non-ASCII characters

The better solution is to change your terminal's encoding to utf-8, and encode everything as utf-8 before printing. You should get in the habit of thinking about your unicode encoding EVERY time you print or read a string.
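
A small sketch of what the options above actually produce, using the character from the traceback. The variable names are only for illustration.

```python
ch = u'\xef'  # the character from the error message

ascii_ignored = ch.encode('ascii', 'ignore')    # character is dropped
ascii_replaced = ch.encode('ascii', 'replace')  # character becomes '?'
utf8_bytes = ch.encode('utf-8')                 # two bytes: 0xC3 0xAF

# repr() of a byte string is plain ASCII, so these prints are safe anywhere
print(repr(ascii_ignored))
print(repr(ascii_replaced))
print(repr(utf8_bytes))
```

Only the UTF-8 version keeps the original character, which is why it is the better choice when the terminal can display it.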


The problem is that you're trying to print a unicode character to a possibly non-unicode terminal. You need to encode it with the 'replace' option before printing it, e.g. print ch.encode(sys.stdout.encoding, 'replace').
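
One caveat: sys.stdout.encoding can be None, for example when output is piped in Python 2, and then encode() raises a TypeError. A rough sketch of a helper that falls back to a default in that case follows. The name encode_for_stdout and the UTF-8 fallback are my own choices, not anything standard.

```python
import sys


def encode_for_stdout(text, fallback='utf-8'):
    """Encode text for stdout, replacing characters it cannot represent."""
    # Use the fallback when stdout reports no encoding (None or missing)
    encoding = getattr(sys.stdout, 'encoding', None) or fallback
    return text.encode(encoding, 'replace')


encoded = encode_for_stdout(u'\xef')
```

In Python 2 you would then write print encode_for_stdout(ch) inside the characters() method.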


This worked for me:

from django.utils.encoding import smart_str
content = smart_str(content)
