Changing default encoding of Python

Question

I have many "can't encode" and "can't decode" problems with Python when I run my applications from the console. But in the Eclipse PyDev IDE, the default character encoding is set to UTF-8, and I'm fine.

I searched around for setting the default encoding, and people say that Python deletes the sys.setdefaultencoding function on startup, and we cannot use it.

So what's the best solution for it?

User · Answer

If you get this error when you try to pipe/redirect the output of your script:

UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-5: ordinal not in range(128)

Just export PYTHONIOENCODING in console and then run your code.

export PYTHONIOENCODING=utf8
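To see where the error above comes from, here is a minimal Python 3 sketch that encodes the same text with two codecs. The helper name try_encode is hypothetical, and the snippet only illustrates codec behaviour; it does not touch the real stdout.

```python
def try_encode(text, encoding):
    """Return (True, encoded_bytes) on success or (False, reason) on failure."""
    try:
        return True, text.encode(encoding)
    except UnicodeEncodeError as exc:
        # exc.reason holds the codec's own explanation of the failure.
        return False, exc.reason


if __name__ == "__main__":
    # The ASCII codec has no byte for U+00E9, the UTF-8 codec does.
    print(try_encode("h\u00e9llo", "ascii"))
    print(try_encode("h\u00e9llo", "utf-8"))
```

When output is piped and Python falls back to ASCII for stdout, print goes through the first case; PYTHONIOENCODING=utf8 moves it to the second.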

User · Answer

This fixed the issue for me:

import os
os.environ["PYTHONIOENCODING"] = "utf-8"
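One caveat, as far as I know: the interpreter reads PYTHONIOENCODING once at startup, so assigning it in os.environ does not re-encode the sys.stdout of the process that is already running. It is inherited by child processes started afterwards. A minimal Python 3 sketch (child_stdout_encoding is a hypothetical helper name) that checks what a child interpreter ends up with:

```python
import codecs
import os
import subprocess
import sys

# Child program: report its own stdout encoding as raw ASCII bytes, so the
# report stays readable whatever encoding the child's stdout is using.
_PROBE = "import sys; sys.stdout.buffer.write(sys.stdout.encoding.encode('ascii'))"


def child_stdout_encoding(io_encoding):
    """Start a child Python with PYTHONIOENCODING set; return its stdout codec name."""
    env = dict(os.environ, PYTHONIOENCODING=io_encoding)
    reported = subprocess.check_output([sys.executable, "-c", _PROBE], env=env)
    # Normalize aliases such as "utf8" to the canonical codec name.
    return codecs.lookup(reported.decode("ascii")).name
```

If the goal is to change the current process, the variable has to be set before Python starts, for example with export in the shell.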

User · Answer

Starting with PyDev 3.4.1, the default encoding is not being changed anymore. See this ticket for details.

For earlier versions, a solution is to make sure PyDev does not run with UTF-8 as the default encoding. Under Eclipse, in the run dialog settings ("run configurations", if I remember correctly), you can choose the default encoding on the common tab. Change it to US-ASCII if you want to have these errors "early" (in other words: in your PyDev environment). Also see an original blog post for this workaround.

User · Answer

A) To control sys.getdefaultencoding() output:

python -c 'import sys; print(sys.getdefaultencoding())'
ascii

Then

echo "import sys; sys.setdefaultencoding('utf-16-be')" > sitecustomize.py

and

PYTHONPATH=".:$PYTHONPATH" python -c 'import sys; print(sys.getdefaultencoding())'
utf-16-be

You could put your sitecustomize.py higher in your PYTHONPATH.

Also, you might like to try reload(sys).setdefaultencoding by @EOL.

B) To control stdin.encoding and stdout.encoding you want to set PYTHONIOENCODING:

python -c 'import sys; print sys.stdin.encoding, sys.stdout.encoding'
ascii ascii

Then

PYTHONIOENCODING="utf-16-be" python -c 'import sys; print sys.stdin.encoding, sys.stdout.encoding'
utf-16-be utf-16-be

Finally: you can use A) or B) or both!
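The sitecustomize.py mechanism from A) can be tried without touching a real installation. The sketch below is Python 3 and uses a harmless marker attribute instead of sys.setdefaultencoding, which Python 3 does not have. The names run_with_sitecustomize and demo_marker are hypothetical, and the sketch assumes the child interpreter runs with the site module enabled.

```python
import os
import subprocess
import sys
import tempfile


def run_with_sitecustomize(body, probe):
    """Run `probe` in a child Python whose PYTHONPATH starts with a temporary
    directory containing a sitecustomize.py made of `body`."""
    with tempfile.TemporaryDirectory() as tmp:
        with open(os.path.join(tmp, "sitecustomize.py"), "w") as handle:
            handle.write(body)
        env = dict(os.environ)
        # Put the temporary directory first so this sitecustomize.py is found first.
        inherited = [p for p in env.get("PYTHONPATH", "").split(os.pathsep) if p]
        env["PYTHONPATH"] = os.pathsep.join([tmp] + inherited)
        output = subprocess.check_output([sys.executable, "-c", probe], env=env)
    return output.decode("ascii").strip()


if __name__ == "__main__":
    body = "import sys\nsys.demo_marker = 'set-at-startup'\n"
    probe = "import sys; print(getattr(sys, 'demo_marker', 'missing'))"
    print(run_with_sitecustomize(body, probe))
```

The child sees the marker because sitecustomize is imported during interpreter startup, which is the same hook the Python 2 setdefaultencoding call relies on.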

User · Answer

Here is the approach I used to produce code that was compatible with both python2 and python3 and always produced utf8 output. I found this answer elsewhere, but I can't remember the source.

This approach works by replacing sys.stdout with something that isn't quite file-like (but still only using things in the standard library). This may well cause problems for your underlying libraries, but in the simple case where you have good control over how sys.stdout is used through your framework this can be a reasonable approach.

import io
import sys

sys.stdout = io.open(sys.stdout.fileno(), 'w', encoding='utf8')
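A minimal sketch of the same idea for Python 3, with one change that is my assumption rather than part of the answer above: closefd=False, so that closing or discarding the wrapper does not close the underlying descriptor. The name utf8_writer is hypothetical.

```python
import io


def utf8_writer(fd):
    """Wrap an already-open file descriptor in a text stream that writes UTF-8."""
    # closefd=False leaves the descriptor open when the wrapper is closed.
    return io.open(fd, "w", encoding="utf8", closefd=False)


# Intended use (not executed here):
#     import sys
#     sys.stdout = utf8_writer(sys.stdout.fileno())
```

On Python 3.7 and later, sys.stdout.reconfigure(encoding="utf-8") is another option I'm aware of that avoids replacing the object at all.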

User · Answer

You could change the encoding of your entire operating system. On Ubuntu you can do this with:

sudo apt install locales
sudo locale-gen en_US en_US.UTF-8
sudo dpkg-reconfigure locales
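Before and after changing the locale, it can help to check what Python actually derives from the environment. A small Python 3 sketch, with report_encodings as a hypothetical helper name:

```python
import codecs
import locale
import sys


def _canonical(name):
    """Map an encoding alias to Python's canonical codec name (None stays None)."""
    return codecs.lookup(name).name if name else None


def report_encodings():
    """Collect the encodings Python derives from the environment."""
    return {
        "preferred": _canonical(locale.getpreferredencoding(False)),
        "filesystem": _canonical(sys.getfilesystemencoding()),
        "default": _canonical(sys.getdefaultencoding()),
        "stdout": _canonical(getattr(sys.stdout, "encoding", None)),
    }


if __name__ == "__main__":
    for key, value in sorted(report_encodings().items()):
        print(key, value)
```

If "preferred" or "stdout" reports ascii in the console, the locale settings are a likely place to look.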

User · Answer

Here is a simpler method (hack) that gives you back the setdefaultencoding() function that was deleted from sys:

import sys
# sys.setdefaultencoding() does not exist, here!
reload(sys)  # Reload does the trick!
sys.setdefaultencoding('UTF8')

(Note for Python 3.4+: reload() is in the importlib library.)

This is not a safe thing to do, though: this is obviously a hack, since sys.setdefaultencoding() is purposely removed from sys when Python starts. Reenabling it and changing the default encoding can break code that relies on ASCII being the default (this code can be third-party, which would generally make fixing it impossible or dangerous).

User · Answer

This is a quick hack for anyone who is (1) on a Windows platform, (2) running Python 2.7 and (3) annoyed because a nice piece of software (i.e., not written by you, so not immediately a candidate for encode/decode printing maneuvers) won't display the "pretty unicode characters" in the IDLE environment (Pythonwin prints unicode fine). For example, the neat First Order Logic symbols that Stephan Boyer uses in the output from his pedagogic prover at First Order Logic Prover.

I didn't like the idea of forcing a sys reload, and I couldn't get the system to cooperate with setting environment variables like PYTHONIOENCODING (tried direct Windows environment variable and also dropping that in a sitecustomize.py in site-packages as a one liner ='utf-8').

So, if you are willing to hack your way to success, go to your IDLE directory, typically "C:\Python27\Lib\idlelib". Locate the file IOBinding.py. Make a copy of that file and store it somewhere else so you can revert to the original behavior when you choose. Open the file in idlelib with an editor (e.g., IDLE). Go to this code area:

# Encoding for file names
filesystemencoding = sys.getfilesystemencoding()

encoding = "ascii"
if sys.platform == 'win32':
    # On Windows, we could use "mbcs". However, to give the user
    # a portable encoding name, we need to find the code page
    try:
        # --> 6/5/17 hack to force IDLE to display utf-8 rather than cp1252
        # --> encoding = locale.getdefaultlocale()[1]
        encoding = 'utf-8'
        codecs.lookup(encoding)
    except LookupError:
        pass

In other words, comment out the original code line following the 'try' that was making the encoding variable equal to locale.getdefaultlocale (because that will give you cp1252, which you don't want) and instead brute force it to 'utf-8' (by adding the line encoding = 'utf-8' as shown).

I believe this only affects IDLE display to stdout and not the encoding used for file names etc. (that is obtained in filesystemencoding prior). If you have a problem with any other code you run in IDLE later, just replace the IOBinding.py file with the original unmodified file.
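The try/except around codecs.lookup in that snippet is a general pattern: accept an encoding name only if Python knows the codec, otherwise keep a safe fallback. A standalone Python 3 sketch of the pattern follows. The name pick_encoding is hypothetical, and it uses locale.getpreferredencoding(False) where the IDLE code uses locale.getdefaultlocale()[1], which is my substitution.

```python
import codecs
import locale


def pick_encoding(forced=None, fallback="ascii"):
    """Return `forced` (or the locale's encoding) if Python knows the codec,
    otherwise `fallback`."""
    candidate = forced or locale.getpreferredencoding(False)
    try:
        codecs.lookup(candidate)
    except LookupError:
        # Unknown codec name: keep the safe default, as the IDLE code does.
        return fallback
    return candidate


if __name__ == "__main__":
    print(pick_encoding("utf-8"))
    print(pick_encoding("no-such-codec"))
```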

User · Answer

There is an insightful blog post about it. See https://anonbadger.wordpress.com/2015/06/16/why-sys-setdefaultencoding-will-break-code/.

I paraphrase its content below.

In Python 2, which was not as strongly typed regarding the encoding of strings, you could perform operations on differently encoded strings and succeed. E.g. the following would return True:

u'Toshio' == 'Toshio'

That would hold for every (normal, unprefixed) string that was encoded in sys.getdefaultencoding(), which defaulted to ascii, but not for others.

The default encoding was meant to be changed system-wide in site.py, but not somewhere else. The hacks (also presented here) to set it in user modules were just that: hacks, not the solution.

Python 3 did change the system encoding to default to utf-8 (when LC_CTYPE is unicode-aware), but the fundamental problem was solved with the requirement to explicitly encode "byte" strings whenever they are used with unicode strings.
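For contrast, a short Python 3 sketch of the explicit style described in the last paragraph. The helper name same_text is hypothetical.

```python
def same_text(raw, text, encoding="ascii"):
    """Compare a byte string with a text string by decoding explicitly first."""
    return raw.decode(encoding) == text


if __name__ == "__main__":
    # Python 3 never decodes implicitly, so bytes and str compare unequal.
    print(b"Toshio" == "Toshio")
    # Decoding with a stated encoding makes the comparison meaningful.
    print(same_text(b"Toshio", "Toshio"))
```

Because the encoding is named at the point of use, no global default is involved and nothing changes if another module expects a different one.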

User · Answer

Regarding python2 (and python2 only), some of the former answers rely on using the following hack:

import sys
reload(sys)  # Reload is a hack
sys.setdefaultencoding('UTF8')

It is discouraged to use it (check this or this).

In my case, it comes with a side effect: I'm using ipython notebooks, and once I run the code the "print" function no longer works. I guess there would be a solution to it, but still I think using the hack should not be the correct option.

After trying many options, the one that worked for me was using the same code in sitecustomize.py, where that piece of code is meant to be. After evaluating that module, the setdefaultencoding function is removed from sys.

So the solution is to append to the file /usr/lib/python2.7/sitecustomize.py the code:

import sys
sys.setdefaultencoding('UTF8')

When I use virtualenvwrapper the file I edit is ~/.virtualenvs/venv-name/lib/python2.7/sitecustomize.py.

And when I use python notebooks and conda, it is ~/anaconda2/lib/python2.7/sitecustomize.py.

User · Answer

First: reload(sys) and setting some random default encoding just regarding the need of an output terminal stream is bad practice. reload often changes things in sys which have been put in place depending on the environment - e.g. sys.stdin/stdout streams, sys.excepthook, etc.

Solving the encode problem on stdout

The best solution I know for solving the encode problem of print'ing unicode strings and beyond-ascii str's (e.g. from literals) on sys.stdout is to take care of a sys.stdout (file-like object) which is capable and optionally tolerant regarding the needs:

When sys.stdout.encoding is None for some reason, or non-existing, or erroneously false or "less" than what the stdout terminal or stream really is capable of, then try to provide a correct .encoding attribute. At last by replacing sys.stdout & sys.stderr by a translating file-like object.

When the terminal / stream still cannot encode all occurring unicode chars, and when you don't want to break print's just because of that, you can introduce an encode-with-replace behavior in the translating file-like object.

Here is an example:

#!/usr/bin/env python
# encoding: utf-8
import sys

class SmartStdout:
    def __init__(self, encoding=None, org_stdout=None):
        if org_stdout is None:
            org_stdout = getattr(sys.stdout, 'org_stdout', sys.stdout)
        self.org_stdout = org_stdout
        self.encoding = encoding or \
                        getattr(org_stdout, 'encoding', None) or 'utf-8'
    def write(self, s):
        self.org_stdout.write(s.encode(self.encoding, 'backslashreplace'))
    def __getattr__(self, name):
        return getattr(self.org_stdout, name)

if __name__ == '__main__':
    if sys.stdout.isatty():
        sys.stdout = sys.stderr = SmartStdout()

    us = u'aouäöüфżß²'
    print us
    sys.stdout.flush()

Using beyond-ascii plain string literals in Python 2 / 2 + 3 code

The only good reason to change the global default encoding (to UTF-8 only) I think is regarding an application source code decision - and not because of I/O stream encoding issues: for writing beyond-ascii string literals into code without being forced to always use u'string' style unicode escaping. This can be done rather consistently (despite what anonbadger's article says) by taking care of a Python 2 or Python 2 + 3 source code basis which uses ascii or UTF-8 plain string literals consistently - as far as those strings potentially undergo silent unicode conversion and move between modules or potentially go to stdout. For that, prefer "# encoding: utf-8" or ascii (no declaration). Change or drop libraries which still rely in a very dumb way fatally on ascii default encoding errors beyond chr #127 (which is rare today).

And do like this at application start (and/or via sitecustomize.py) in addition to the SmartStdout scheme above - without using reload(sys):

def set_defaultencoding_globally(encoding='utf-8'):
    assert sys.getdefaultencoding() in ('ascii', 'mbcs', encoding)
    import imp
    _sys_org = imp.load_dynamic('_sys_org', 'sys')
    _sys_org.setdefaultencoding(encoding)

if __name__ == '__main__':
    sys.stdout = sys.stderr = SmartStdout()
    set_defaultencoding_globally('utf-8')
    s = 'aouäöüфżß²'
    print s

This way string literals and most operations (except character iteration) work comfortably without thinking about unicode conversion, as if there were Python3 only. File I/O of course always needs special care regarding encodings - as it does in Python3.

Note: plain strings are then implicitly converted from utf-8 to unicode in SmartStdout before being converted to the output stream encoding.
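The encode-with-replace behavior of SmartStdout has a rough Python 3 counterpart in the errors argument of io.TextIOWrapper. The sketch below is an illustration of that standard-library feature, not a port of the class above; encode_tolerantly is a hypothetical name, and it writes into an in-memory buffer instead of the real stdout.

```python
import io


def encode_tolerantly(text, encoding):
    """Encode `text` through a text stream that escapes unencodable characters."""
    raw = io.BytesIO()
    stream = io.TextIOWrapper(raw, encoding=encoding, errors="backslashreplace")
    stream.write(text)
    stream.flush()
    data = raw.getvalue()
    stream.detach()  # keep `raw` from being closed along with the wrapper
    return data


if __name__ == "__main__":
    # U+0444 cannot be written as ASCII, so it is emitted as an escape sequence.
    print(encode_tolerantly("a\u0444", "ascii"))
    print(encode_tolerantly("a\u0444", "utf-8"))
```

For the real stream on Python 3.7 and later, sys.stdout.reconfigure(errors="backslashreplace") applies the same error handler without wrapping anything.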
