My computer had the wrong locale set.
I first did
>>> import locale
>>> locale.getpreferredencoding(False)
'ANSI_X3.4-1968'
locale.getpreferredencoding(False)
is the function called by open()
when you don't provide an encoding. The output should be 'UTF-8'
, but in this case it's some variant of ASCII.
Then I ran the bash command locale
and got this output
$ locale
LANG=
LANGUAGE=
LC_CTYPE="POSIX"
LC_NUMERIC="POSIX"
LC_TIME="POSIX"
LC_COLLATE="POSIX"
LC_MONETARY="POSIX"
LC_MESSAGES="POSIX"
LC_PAPER="POSIX"
LC_NAME="POSIX"
LC_ADDRESS="POSIX"
LC_TELEPHONE="POSIX"
LC_MEASUREMENT="POSIX"
LC_IDENTIFICATION="POSIX"
LC_ALL=
So, I was using the default Ubuntu locale, which causes Python to open files as ASCII instead of UTF-8. I had to set my locale to en_US.UTF-8
sudo apt install locales
sudo locale-gen en_US en_US.UTF-8
sudo dpkg-reconfigure locales
If you can't change the locale system wide, you can invoke all your Python code like this:
PYTHONIOENCODING="UTF-8" python3 ./path/to/your/script.py
or do
export PYTHONIOENCODING="UTF-8"
to set it in the shell you run that in.