[python] Why should we NOT use sys.setdefaultencoding("utf-8") in a py script?

  • The first danger lies in reload(sys).

    When you reload a module, you actually get two copies of the module in your runtime. The old module is a Python object like everything else, and stays alive as long as there are references to it. So, half of the objects will be pointing to the old module, and half to the new one. When you make some change, you will never see it coming when some random object doesn't see the change:

    (This is IPython shell)
    
    In [1]: import sys
    
    In [2]: sys.stdout
    Out[2]: <colorama.ansitowin32.StreamWrapper at 0x3a2aac8>
    
    In [3]: reload(sys)
    <module 'sys' (built-in)>
    
    In [4]: sys.stdout
    Out[4]: <open file '<stdout>', mode 'w' at 0x00000000022E20C0>
    
    In [11]: import IPython.terminal
    
    In [14]: IPython.terminal.interactiveshell.sys.stdout
    Out[14]: <colorama.ansitowin32.StreamWrapper at 0x3a9aac8>
    
  • Now, sys.setdefaultencoding() proper

    All that it affects is implicit conversion str<->unicode. Now, utf-8 is the sanest encoding on the planet (backward-compatible with ASCII and all), the conversion now "just works", what could possibly go wrong?

    Well, anything. And that is the danger.

    • There may be some code that relies on the UnicodeError being thrown for non-ASCII input, or does the transcoding with an error handler, which now produces an unexpected result. And since all code is tested with the default setting, you're strictly on "unsupported" territory here, and no-one gives you guarantees about how their code will behave.
    • The transcoding may produce unexpected or unusable results if not everything on the system uses UTF-8 because Python 2 actually has multiple independent "default string encodings". (Remember, a program must work for the customer, on the customer's equipment.)
      • Again, the worst thing is you will never know that because the conversion is implicit -- you don't really know when and where it happens. (Python Zen, koan 2 ahoy!) You will never know why (and if) your code works on one system and breaks on another. (Or better yet, works in IDE and breaks in console.)

Examples related to python

programming a servo thru a barometer Is there a way to view two blocks of code from the same file simultaneously in Sublime Text? python variable NameError Why my regexp for hyphenated words doesn't work? Comparing a variable with a string python not working when redirecting from bash script is it possible to add colors to python output? Get Public URL for File - Google Cloud Storage - App Engine (Python) Real time face detection OpenCV, Python xlrd.biffh.XLRDError: Excel xlsx file; not supported Could not load dynamic library 'cudart64_101.dll' on tensorflow CPU-only installation

Examples related to encoding

How to check encoding of a CSV file UnicodeEncodeError: 'ascii' codec can't encode character at special name Using Javascript's atob to decode base64 doesn't properly decode utf-8 strings What is the difference between utf8mb4 and utf8 charsets in MySQL? The character encoding of the plain text document was not declared - mootool script UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 23: ordinal not in range(128) How to encode text to base64 in python UTF-8 output from PowerShell Set Encoding of File to UTF8 With BOM in Sublime Text 3 Replace non-ASCII characters with a single space

Examples related to utf-8

error UnicodeDecodeError: 'utf-8' codec can't decode byte 0xff in position 0: invalid start byte Changing PowerShell's default output encoding to UTF-8 'Malformed UTF-8 characters, possibly incorrectly encoded' in Laravel Encoding Error in Panda read_csv Using Javascript's atob to decode base64 doesn't properly decode utf-8 strings What is the difference between utf8mb4 and utf8 charsets in MySQL? what is <meta charset="utf-8">? Pandas df.to_csv("file.csv" encode="utf-8") still gives trash characters for minus sign UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 23: ordinal not in range(128) Android Studio : unmappable character for encoding UTF-8

Examples related to python-2.x

Combine several images horizontally with Python How to print variables without spaces between values How to return dictionary keys as a list in Python? How to add an element to the beginning of an OrderedDict? How to read a CSV file from a URL with Python? Malformed String ValueError ast.literal_eval() with String representation of Tuple Relative imports for the billionth time How do you use subprocess.check_output() in Python? write() versus writelines() and concatenated strings How to select a directory and store the location using tkinter in Python

Examples related to sys

Get parent of current directory from Python script python: sys is not defined Python memory usage of numpy arrays Usage of sys.stdout.flush() method Why should we NOT use sys.setdefaultencoding("utf-8") in a py script? Where is Python's sys.path initialized from?