UnicodeEncodeError: 'charmap' codec can't encode - character maps to <undefined>, print function

Question

I am writing a Python (Python 3.3) program that sends some data to a web page using the POST method. Mostly for debugging, I fetch the page result and display it on the screen with the print() function.

The code is like this:

    conn.request("POST", resource, params, headers)
    response = conn.getresponse()
    print(response.status, response.reason)
    data = response.read()
    print(data.decode('utf-8'))

The HTTPResponse.read() method returns a bytes object encoding the page (which is a well-formed UTF-8 document). It seemed okay until I stopped using the IDLE GUI for Windows and used the Windows console instead. The returned page has a U+2014 character (em-dash), which the print function handles well in the Windows GUI (I presume code page 1252) but not in the Windows console (code page 850). Given the 'strict' default behavior, I get the following error:

    UnicodeEncodeError: 'charmap' codec can't encode character '\u2014' in position 10248: character maps to <undefined>

I could fix it using this quite ugly code:

    print(data.decode('utf-8').encode('cp850', 'replace').decode('cp850'))

Now it replaces the offending character (U+2014) with a "?". Not the ideal case (a hyphen would be a better replacement), but good enough for my purpose.

There are several things I do not like about my solution:

1. The code is ugly with all that decoding, encoding, and decoding.
2. It solves the problem for just this case. If I port the program to a system using some other encoding (latin-1, cp437, back to cp1252, etc.), it should recognize the target encoding. It does not. For instance, when I go back to the IDLE GUI, the em-dash is also lost, which didn't happen before.
3. It would be nicer if the em-dash were translated to a hyphen instead of a question mark.

The problem is not the em-dash (I can think of several ways to solve that particular problem); I need to write robust code. I am feeding the page with data from a database, and that data can come back. I can anticipate many other conflicting cases: an 'Á' (U+00C1), which is possible in my database, can be encoded in CP-850 (the DOS/Windows console encoding for Western European languages) but not in CP-437 (the encoding for US English, which is the default in many Windows installations).

So, the question:

Is there a nicer solution that makes my code agnostic of the output interface encoding?

User · Answer

Based on Dirk Stöcker's answer, here's a neat wrapper function for Python 3's print function. Use it just like you would use print.

As an added bonus, compared to the other answers, this won't print your text as a bytes literal (b'content'), but as a normal string ('content'), because of the last decode step.

    import sys

    def uprint(*objects, sep=' ', end='\n', file=sys.stdout):
        enc = file.encoding
        if enc == 'UTF-8':
            # The stream can represent everything; print unchanged.
            print(*objects, sep=sep, end=end, file=file)
        else:
            # Escape whatever the target encoding cannot represent.
            f = lambda obj: str(obj).encode(enc, errors='backslashreplace').decode(enc)
            print(*map(f, objects), sep=sep, end=end, file=file)

    uprint('foo')
    uprint(u'Antonín Dvořák')
    uprint('foo', 'bar', u'Antonín Dvořák')
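The question also asked for the em-dash to come out as a hyphen rather than as an escape sequence or a question mark. A minimal sketch of one way to get that, assuming a small hand-picked substitution table: `FALLBACKS` and `safe_text` are hypothetical names that do not come from any answer here, and the stand-ins are assumed to be representable in the target encoding, which holds for ASCII-compatible code pages.

```python
import sys

# Hypothetical, hand-picked ASCII stand-ins; extend to suit your data.
FALLBACKS = {
    "\u2014": "-",   # em dash
    "\u2013": "-",   # en dash
    "\u2018": "'",   # left single quotation mark
    "\u2019": "'",   # right single quotation mark
    "\u201c": '"',   # left double quotation mark
    "\u201d": '"',   # right double quotation mark
}


def safe_text(text, encoding=None):
    """Return text in which every character is encodable in `encoding`.

    Characters the encoding cannot represent are swapped for their entry
    in FALLBACKS, or for "?" when no stand-in is listed.
    """
    encoding = encoding or sys.stdout.encoding or "ascii"
    out = []
    for ch in text:
        try:
            ch.encode(encoding)
        except UnicodeEncodeError:
            ch = FALLBACKS.get(ch, "?")
        out.append(ch)
    return "".join(out)


print(safe_text("before \u2014 after", "cp850"))
```

Characters that the target encoding can represent pass through untouched, so the same call keeps the em-dash on a cp1252 or UTF-8 stream.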

User · Answer

I dug deeper into this and found that the best solutions are here: http://blog.notdot.net/2010/07/Getting-unicode-right-in-Python

In my case, this is how I solved "UnicodeEncodeError: 'charmap' codec can't encode character".

Original code:

    print("Process lines, file_name command_line %s\n" % command_line)

New code:

    print("Process lines, file_name command_line %s\n" % command_line.encode('utf-8'))
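One caveat worth checking on Python 3: `str.encode` returns a `bytes` object, and formatting bytes with `%s` inserts their repr, so the error goes away but the output gains a `b'...'` wrapper with hex escapes. A small sketch of the effect (the sample value is made up for illustration):

```python
command_line = "report \u2014 final"  # made-up sample value

# %s applied to bytes uses the bytes repr, not the decoded text.
formatted = "command_line %s" % command_line.encode("utf-8")
print(formatted)
```

That may be acceptable for debugging output, but it is not the original text.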

User · Answer

If you use Python 3.6 (possibly 3.5 or later), I no longer get that error. I had a similar issue because I was using v3.4, but it went away after I uninstalled it and installed the newer version.

User · Answer

I see three solutions to this:

1. Change the output encoding so it will always output UTF-8. See e.g. "Setting the correct encoding when piping stdout in Python", but I could not get these examples to work.

2. The following example code (Python 2 syntax) makes the output aware of your target charset.

    # -*- coding: utf-8 -*-
    import sys

    print sys.stdout.encoding
    print u"Stöcker".encode(sys.stdout.encoding, errors='replace')
    print u"Стоескер".encode(sys.stdout.encoding, errors='replace')

This example properly replaces any non-printable character in my name with a question mark.

If you create a custom print function, e.g. called myprint, that uses this mechanism to encode output properly, you can simply replace print with myprint wherever necessary without making the whole code look ugly.

3. Reset the output encoding globally at the beginning of the program.

The page http://www.macfreek.nl/memory/Encoding_of_Python_stdout has a good summary of what to do to change the output encoding. The section "StreamWriter Wrapper around Stdout" is especially interesting. Essentially it says to change the I/O encoding function like this:

In Python 2:

    if sys.stdout.encoding != 'cp850':
        sys.stdout = codecs.getwriter('cp850')(sys.stdout, 'strict')
    if sys.stderr.encoding != 'cp850':
        sys.stderr = codecs.getwriter('cp850')(sys.stderr, 'strict')

In Python 3:

    if sys.stdout.encoding != 'cp850':
        sys.stdout = codecs.getwriter('cp850')(sys.stdout.buffer, 'strict')
    if sys.stderr.encoding != 'cp850':
        sys.stderr = codecs.getwriter('cp850')(sys.stderr.buffer, 'strict')

If used in a CGI script outputting HTML, you can replace 'strict' with 'xmlcharrefreplace' to get HTML character references for non-printable characters.

Feel free to modify the approaches, e.g. by setting different encodings. Note that it still won't work for data of unspecified encoding, so any data, input, or text must be correctly convertible to Unicode (Python 2 again):

    # -*- coding: utf-8 -*-
    import sys
    import codecs
    sys.stdout = codecs.getwriter("iso-8859-1")(sys.stdout, 'xmlcharrefreplace')
    print u"Stöcker"                 # works
    print "Stöcker".decode("utf-8")  # works
    print "Stöcker"                  # fails
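On Python 3.7 and later, the third approach can probably be written without `codecs.getwriter`, because text streams of type `io.TextIOWrapper` have a `reconfigure()` method that changes the error handler while keeping the encoding. The following is a sketch under that assumption; `relax_errors` is a hypothetical helper name, and an in-memory stream stands in for a cp850 console so the effect is visible anywhere.

```python
import io
import sys


def relax_errors(stream, errors="backslashreplace"):
    """Keep the stream's encoding but stop it raising on unencodable text."""
    if hasattr(stream, "reconfigure"):  # io.TextIOWrapper, Python 3.7+
        stream.reconfigure(errors=errors)
    return stream


# Demonstration on an in-memory stand-in for a cp850 console.
raw = io.BytesIO()
console = io.TextIOWrapper(raw, encoding="cp850")
relax_errors(console)
console.write("dash \u2014 here")  # would raise UnicodeEncodeError under 'strict'
console.flush()
print(raw.getvalue())

# In a real program, the call would be made once at startup:
# relax_errors(sys.stdout)
# relax_errors(sys.stderr)
```

Passing `errors="replace"` or `errors="xmlcharrefreplace"` instead gives the question-mark and HTML-reference behaviors described above.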

User · Answer

For debugging purposes, you could use print(repr(data)).

To display text, always print Unicode. Don't hardcode the character encoding of your environment, such as cp850, inside your script. To decode the HTTP response, see "A good way to get the charset/encoding of an HTTP response in Python". To print Unicode to the Windows console, you could use the win-unicode-console package.
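A minimal sketch of both points, assuming the response body and its Content-Type header value are already in hand: `charset_of` and `decode_body` are hypothetical helper names, and the UTF-8 default is an assumption rather than something HTTP guarantees. With `http.client`, the response's `headers` object offers the same `get_content_charset()` method directly.

```python
from email.message import Message


def charset_of(content_type, default="utf-8"):
    """Extract the charset parameter from a Content-Type header value."""
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset(failobj=default)


def decode_body(body, content_type):
    """Decode response bytes using the charset the server declared."""
    return body.decode(charset_of(content_type), errors="replace")


text = decode_body(b"caf\xe9", "text/html; charset=ISO-8859-1")

# ascii() escapes every non-ASCII character, so this line is safe to
# print on any console regardless of its code page.
print(ascii(text))
```

For debugging, `ascii()` (or `repr()` on the raw bytes) sidesteps the console encoding entirely, at the cost of showing escapes instead of the characters.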

User · Answer

If you are using the Windows command line to print the data, you should first run:

    chcp 65001

This worked for me.
