Setting the correct encoding when piping stdout in Python

Question

When piping the output of a Python program  the Python interpreter gets confused about encoding and sets it to None  This means a program like this     - - coding  utf-8 - - print u           will work fine when run normally  but fail with      UnicodeEncodeError   ascii  codec can t encode character u  xa0  in position 0  ordinal not in range 128    when used in a pipe sequence   What is the best way to make this work when piping  Can I just tell it to use whatever encoding the shell filesystem whatever is using    The suggestions I have seen thus far is to modify your site py directly  or hardcoding the defaultencoding using this hack     - - coding  utf-8 - - import sys reload sys  sys setdefaultencoding  utf-8   print u           Is there a better way to make piping work

User · Answer

You may want to try changing the environment variable "PYTHONIOENCODING" to "utf_8". I have written a page on my ordeal with this problem.

Tl;dr of the blog post:

import sys, locale, os
print(sys.stdout.encoding)
print(sys.stdout.isatty())
print(locale.getpreferredencoding())
print(sys.getfilesystemencoding())
print(os.environ["PYTHONIOENCODING"])
print(chr(246), chr(9786), chr(9787))

gives you

utf_8
False
ANSI_X3.4-1968
ascii
utf_8
ö ? ?

User · Answer

Your code works when run in an script because Python encodes the output to whatever encoding your terminal application is using  If you are piping you must encode it yourself   A rule of thumb is  Always use Unicode internally  Decode what you receive  and encode what you send     - - coding  utf-8 - - print u         encode  utf-8     Another didactic example is a Python program to convert between ISO-8859-1 and UTF-8  making everything uppercase in between   import sys for line in sys stdin        Decode what you receive      line   line decode  iso8859-1          Work with Unicode internally      line   line upper          Encode what you send      line   line encode  utf-8       sys stdout write line    Setting the system default encoding is a bad idea  because some modules and libraries you use can rely on the fact it is ASCII  Don t do it

User · Answer

I could  automate  it with a call to   def   fix io encoding last resort default  UTF-8      import sys   if  x for x in  sys stdin sys stdout sys stderr  if x encoding is None          import os       defEnc   None       if defEnc is None           try            import locale           defEnc   locale getpreferredencoding           except  pass       if defEnc is None           try  defEnc   sys getfilesystemencoding           except  pass       if defEnc is None           try  defEnc   sys stdin encoding         except  pass       if defEnc is None           defEnc   last resort default       os environ  PYTHONIOENCODING     os environ get  PYTHONIOENCODING  defEnc        os execvpe sys argv 0  sys argv os environ    fix io encoding     del   fix io encoding   Yes  it s possible to get an infinite loop here if this  setenv  fails

User · Answer

Your code works when run in an script because Python encodes the output to whatever encoding your terminal application is using  If you are piping you must encode it yourself   A rule of thumb is  Always use Unicode internally  Decode what you receive  and encode what you send     - - coding  utf-8 - - print u         encode  utf-8     Another didactic example is a Python program to convert between ISO-8859-1 and UTF-8  making everything uppercase in between   import sys for line in sys stdin        Decode what you receive      line   line decode  iso8859-1          Work with Unicode internally      line   line upper          Encode what you send      line   line encode  utf-8       sys stdout write line    Setting the system default encoding is a bad idea  because some modules and libraries you use can rely on the fact it is ASCII  Don t do it

User · Answer

An arguable sanitized version of Craig McQueen s answer   import sys  codecs class EncodedOut      def   init   self  enc           self enc   enc         self stdout   sys stdout     def   enter   self           if sys stdout encoding is None              w   codecs getwriter self enc              sys stdout   w sys stdout      def   exit   self  exc ty  exc val  tb           sys stdout   self stdout   Usage   with EncodedOut  utf-8        print u

User · Answer

I ran into this problem in a legacy application  and it was difficult to identify where what was printed  I helped myself with this hack     encoding utf8 py import codecs import builtins   def print utf8 text    kwargs       print str text  encode  utf-8      kwargs    def print utf8 fn       def print fn  args    kwargs           return fn str  args  encode  utf-8      kwargs      return print fn   builtins print   print utf8 print    On top of my script  test py   import encoding utf8 string    Axwell   Ingrosso  print string    Note that this changes ALL calls to print to use an encoding  so your console will print this     python test py b Axwell  xce x9b Ingrosso

User · Answer

I just thought I d mention something here which I had to spent a long time experimenting with before I finally realised what was going on  This may be so obvious to everyone here that they haven t bothered mentioning it  But it would ve helped me if they had  so on that principle      NB  I am using Jython specifically  v 2 7  so just possibly this may not apply to CPython     NB2  the first two lines of my  py file here are     - - coding  utf-8 - - from   future   import print function   The      AKA  interpolation operator   string construction mechanism causes ADDITIONAL problems too    If the default encoding of the  environment  is ASCII and you try to do something like  print   bonjour   s     fr  d       Call this  print A    You will have no difficulty running in Eclipse    In a Windows CLI  DOS window  you will find that the encoding is code page 850  my Windows nbsp 7 OS  or something similar  which can handle European accented characters at least  so it ll work   print  u bonjour   s     fr  d      Call this  print B    will also work   If  OTOH  you direct to a file from the CLI  the stdout encoding will be None  which will default to ASCII  on my OS anyway   which will not be able to handle either of the above prints     dreaded encoding error    So then you might think of redirecting your stdout by using  sys stdout   codecs getwriter  utf8   sys stdout    and try running in the CLI piping to a file    Very oddly  print A above will work    But print B above will throw the encoding error  The following will however work OK   print  u bonjour       fr  d      Call this  print C    The conclusion I have come to  provisionally  is that if a string which is specified to be a Unicode string using the  u  prefix is submitted to the  -handling mechanism it appears to involve the use of the default environment encoding  regardless of whether you have set stdout to redirect   How people deal with this is a matter of choice  I would welcome a Unicode expert to say why this happens  whether I ve got it wrong in some way  what the preferred solution to this  whether it also applies to CPython  whether it happens in Python 3  etc   etc

User · Answer

On Windows  I had this problem very often when running a Python code from an editor  like Sublime Text   but not if running it from command-line   In this case  check your editor s parameters  In the case of SublimeText  this Python sublime-build solved it        cmd     python    -u     file       file regex         File              line   0-9         selector    source python      encoding    utf8      env     PYTHONIOENCODING    utf-8    LANG    en US UTF-8

User · Answer

Your code works when run in an script because Python encodes the output to whatever encoding your terminal application is using  If you are piping you must encode it yourself   A rule of thumb is  Always use Unicode internally  Decode what you receive  and encode what you send     - - coding  utf-8 - - print u         encode  utf-8     Another didactic example is a Python program to convert between ISO-8859-1 and UTF-8  making everything uppercase in between   import sys for line in sys stdin        Decode what you receive      line   line decode  iso8859-1          Work with Unicode internally      line   line upper          Encode what you send      line   line encode  utf-8       sys stdout write line    Setting the system default encoding is a bad idea  because some modules and libraries you use can rely on the fact it is ASCII  Don t do it

User · Answer

First  regarding this solution     - - coding  utf-8 - - print u         encode  utf-8     It s not practical to explicitly print with a given encoding every time  That would be repetitive and error-prone   A better solution is to change sys stdout at the start of your program  to encode with a selected encoding  Here is one solution I found on Python  How is sys stdout encoding chosen   in particular a comment by  toka    import sys import codecs sys stdout   codecs getwriter  utf8   sys stdout

User · Answer

On Windows  I had this problem very often when running a Python code from an editor  like Sublime Text   but not if running it from command-line   In this case  check your editor s parameters  In the case of SublimeText  this Python sublime-build solved it        cmd     python    -u     file       file regex         File              line   0-9         selector    source python      encoding    utf8      env     PYTHONIOENCODING    utf-8    LANG    en US UTF-8

User · Answer

I ran into this problem in a legacy application  and it was difficult to identify where what was printed  I helped myself with this hack     encoding utf8 py import codecs import builtins   def print utf8 text    kwargs       print str text  encode  utf-8      kwargs    def print utf8 fn       def print fn  args    kwargs           return fn str  args  encode  utf-8      kwargs      return print fn   builtins print   print utf8 print    On top of my script  test py   import encoding utf8 string    Axwell   Ingrosso  print string    Note that this changes ALL calls to print to use an encoding  so your console will print this     python test py b Axwell  xce x9b Ingrosso

User · Answer

I had a similar issue last week  It was easy to fix in my IDE  PyCharm    Here was my fix   Starting from PyCharm menu bar  File -  Settings    -  Editor -  File Encodings  then set   IDE Encoding    Project Encoding  and  Default encoding for properties files  ALL to UTF-8 and she now works like a charm   Hope this helps

User · Answer

I just thought I d mention something here which I had to spent a long time experimenting with before I finally realised what was going on  This may be so obvious to everyone here that they haven t bothered mentioning it  But it would ve helped me if they had  so on that principle      NB  I am using Jython specifically  v 2 7  so just possibly this may not apply to CPython     NB2  the first two lines of my  py file here are     - - coding  utf-8 - - from   future   import print function   The      AKA  interpolation operator   string construction mechanism causes ADDITIONAL problems too    If the default encoding of the  environment  is ASCII and you try to do something like  print   bonjour   s     fr  d       Call this  print A    You will have no difficulty running in Eclipse    In a Windows CLI  DOS window  you will find that the encoding is code page 850  my Windows nbsp 7 OS  or something similar  which can handle European accented characters at least  so it ll work   print  u bonjour   s     fr  d      Call this  print B    will also work   If  OTOH  you direct to a file from the CLI  the stdout encoding will be None  which will default to ASCII  on my OS anyway   which will not be able to handle either of the above prints     dreaded encoding error    So then you might think of redirecting your stdout by using  sys stdout   codecs getwriter  utf8   sys stdout    and try running in the CLI piping to a file    Very oddly  print A above will work    But print B above will throw the encoding error  The following will however work OK   print  u bonjour       fr  d      Call this  print C    The conclusion I have come to  provisionally  is that if a string which is specified to be a Unicode string using the  u  prefix is submitted to the  -handling mechanism it appears to involve the use of the default environment encoding  regardless of whether you have set stdout to redirect   How people deal with this is a matter of choice  I would welcome a Unicode expert to say why this happens  whether I ve got it wrong in some way  what the preferred solution to this  whether it also applies to CPython  whether it happens in Python 3  etc   etc

User · Answer

An arguable sanitized version of Craig McQueen s answer   import sys  codecs class EncodedOut      def   init   self  enc           self enc   enc         self stdout   sys stdout     def   enter   self           if sys stdout encoding is None              w   codecs getwriter self enc              sys stdout   w sys stdout      def   exit   self  exc ty  exc val  tb           sys stdout   self stdout   Usage   with EncodedOut  utf-8        print u

User · Answer

I had a similar issue last week  It was easy to fix in my IDE  PyCharm    Here was my fix   Starting from PyCharm menu bar  File -  Settings    -  Editor -  File Encodings  then set   IDE Encoding    Project Encoding  and  Default encoding for properties files  ALL to UTF-8 and she now works like a charm   Hope this helps

User · Answer

You may want to try changing the environment variable "PYTHONIOENCODING" to "utf_8". I have written a page on my ordeal with this problem.

Tl;dr of the blog post:

import sys, locale, os
print(sys.stdout.encoding)
print(sys.stdout.isatty())
print(locale.getpreferredencoding())
print(sys.getfilesystemencoding())
print(os.environ["PYTHONIOENCODING"])
print(chr(246), chr(9786), chr(9787))

gives you

utf_8
False
ANSI_X3.4-1968
ascii
utf_8
ö ? ?

User · Answer

Your code works when run in an script because Python encodes the output to whatever encoding your terminal application is using  If you are piping you must encode it yourself   A rule of thumb is  Always use Unicode internally  Decode what you receive  and encode what you send     - - coding  utf-8 - - print u         encode  utf-8     Another didactic example is a Python program to convert between ISO-8859-1 and UTF-8  making everything uppercase in between   import sys for line in sys stdin        Decode what you receive      line   line decode  iso8859-1          Work with Unicode internally      line   line upper          Encode what you send      line   line encode  utf-8       sys stdout write line    Setting the system default encoding is a bad idea  because some modules and libraries you use can rely on the fact it is ASCII  Don t do it

User · Answer

I could  automate  it with a call to   def   fix io encoding last resort default  UTF-8      import sys   if  x for x in  sys stdin sys stdout sys stderr  if x encoding is None          import os       defEnc   None       if defEnc is None           try            import locale           defEnc   locale getpreferredencoding           except  pass       if defEnc is None           try  defEnc   sys getfilesystemencoding           except  pass       if defEnc is None           try  defEnc   sys stdin encoding         except  pass       if defEnc is None           defEnc   last resort default       os environ  PYTHONIOENCODING     os environ get  PYTHONIOENCODING  defEnc        os execvpe sys argv 0  sys argv os environ    fix io encoding     del   fix io encoding   Yes  it s possible to get an infinite loop here if this  setenv  fails

User · Answer

export PYTHONIOENCODING utf-8   do the job  but can t set it on python itself      what we can do is verify if isn t setting and tell the user to set it before call script with    if   name         main         if  sys stdout encoding is None           print  gt  gt  sys stderr   please set python env PYTHONIOENCODING UTF-8  example  export PYTHONIOENCODING UTF-8  when write to stdout           exit 1    Update to reply to the comment   the problem just exist when piping to stdout   I tested in Fedora 25 Python 2 7 13  python --version Python 2 7 13   cat b py      usr bin env python  - - coding  utf-8 - - import sys  print sys stdout encoding   running   b py   UTF-8   running   b py   less  None

User · Answer

First  regarding this solution     - - coding  utf-8 - - print u         encode  utf-8     It s not practical to explicitly print with a given encoding every time  That would be repetitive and error-prone   A better solution is to change sys stdout at the start of your program  to encode with a selected encoding  Here is one solution I found on Python  How is sys stdout encoding chosen   in particular a comment by  toka    import sys import codecs sys stdout   codecs getwriter  utf8   sys stdout

User · Answer

export PYTHONIOENCODING utf-8   do the job  but can t set it on python itself      what we can do is verify if isn t setting and tell the user to set it before call script with    if   name         main         if  sys stdout encoding is None           print  gt  gt  sys stderr   please set python env PYTHONIOENCODING UTF-8  example  export PYTHONIOENCODING UTF-8  when write to stdout           exit 1    Update to reply to the comment   the problem just exist when piping to stdout   I tested in Fedora 25 Python 2 7 13  python --version Python 2 7 13   cat b py      usr bin env python  - - coding  utf-8 - - import sys  print sys stdout encoding   running   b py   UTF-8   running   b py   less  None

[python] Setting the correct encoding when piping stdout in Python

Examples related to python

Examples related to encoding

Examples related to terminal

Examples related to stdout

Examples related to python-2.x