UnicodeEncodeError: 'charmap' codec can't encode characters

Question

I'm trying to scrape a website, but it gives me an error.

I'm using the following code:

    import urllib.request
    from bs4 import BeautifulSoup

    get = urllib.request.urlopen("https://www.website.com/")
    html = get.read()

    soup = BeautifulSoup(html)

    print(soup)

And I'm getting the following error:

    File "C:\Python34\lib\encodings\cp1252.py", line 19, in encode
        return codecs.charmap_encode(input,self.errors,encoding_table)[0]
    UnicodeEncodeError: 'charmap' codec can't encode characters in position 70924-70950: character maps to <undefined>

What can I do to fix this?

User · Answer

On Python 3.7 running on Windows 10 this worked (I am not sure whether it will work on other platforms and/or other versions of Python).

Replacing this line:

    with open('filename', 'w') as f:

With this:

    with open('filename', 'w', encoding='utf-8') as f:

The reason it works is that the file is now written as UTF-8 instead of the platform default encoding (cp1252 in the traceback above). UTF-8 can represent every Unicode character, so the write no longer raises an error when it encounters a character that is not supported by the default encoding.
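The difference can be reproduced without any scraping. Below is a minimal sketch, assuming a temporary file and a sample string containing U+2192, a character cp1252 cannot represent. The helper name `try_write` is made up for illustration, and "cp1252" is passed explicitly only to mimic the Windows default.

```python
import os
import tempfile

# Sample text with a character (U+2192, rightwards arrow) that cp1252 lacks.
TEXT = "price \u2192 100"


def try_write(path, encoding):
    """Write TEXT to path with the given encoding; return True on success."""
    try:
        with open(path, "w", encoding=encoding) as f:
            f.write(TEXT)
        return True
    except UnicodeEncodeError:
        return False


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "out.txt")
    cp1252_ok = try_write(path, "cp1252")  # mimics the Windows default
    utf8_ok = try_write(path, "utf-8")
    # Read the file back to confirm the UTF-8 write kept the text intact.
    with open(path, encoding="utf-8") as f:
        round_trip = f.read()
```

Here `cp1252_ok` ends up `False` and `utf8_ok` ends up `True`, which matches the behaviour described in this answer.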

User · Answer

I fixed it by adding .encode("utf-8") to soup.

That means that print(soup) becomes print(soup.encode("utf-8")).
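A rough sketch of why this sidesteps the error: encoding with UTF-8 returns a bytes object, and printing bytes shows an ASCII-only representation that any console can display. A plain string stands in for the soup object here (an assumption, to keep the example free of bs4).

```python
# A plain str stands in for the parsed page (assumption: no bs4 needed here).
page_text = "caf\u00e9 \u2192 menu"

encoded = page_text.encode("utf-8")  # bytes, not str

# print(encoded) would display this representation; non-ASCII bytes show up
# as \x escapes, so the console encoding no longer matters.
shown = repr(encoded)

# Encoding straight to cp1252 is what the Windows console attempted.
try:
    page_text.encode("cp1252")
    cp1252_failed = False
except UnicodeEncodeError:
    cp1252_failed = True
```

The trade-off is that the output is an escaped byte string rather than readable text, so this is more of a workaround for inspecting the page than a way to display it properly.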

User · Answer

While saving the response of a GET request, the same error was thrown on Python 3.7 on Windows 10. The response received from the URL was encoded as UTF-8, so it is worth checking the response encoding and passing it along explicitly; this kind of trivial issue can waste a lot of time in production.

    import requests
    resp = requests.get('https://en.wikipedia.org/wiki/NIFTY_50')
    print(resp.encoding)
    with open('NiftyList.txt', 'w') as f:
        f.write(resp.text)

When I added encoding='utf-8' to the open call, it saved the file with the correct response.

    with open('NiftyList.txt', 'w', encoding='utf-8') as f:
        f.write(resp.text)

User · Answer

    set PYTHONIOENCODING=utf-8
    set PYTHONLEGACYWINDOWSSTDIO=utf-8

You may or may not need to set that second environment variable, PYTHONLEGACYWINDOWSSTDIO.

Alternatively, this can be done in code (although it seems that doing it through env vars is recommended):

    sys.stdin.reconfigure(encoding='utf-8')
    sys.stdout.reconfigure(encoding='utf-8')

Additionally: reproducing this error was a bit of a pain, so I'm leaving this here too in case you need to reproduce it on your machine:

    set PYTHONIOENCODING=windows-1252
    set PYTHONLEGACYWINDOWSSTDIO=windows-1252
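The in-code variant can be tried without touching the real console. This is only a sketch: an in-memory `io.TextIOWrapper` opened as cp1252 stands in for `sys.stdout` on a Windows console (my assumption for illustration). `reconfigure` is the same method that `sys.stdout` exposes on Python 3.7 and later.

```python
import io

# An in-memory binary buffer wrapped as a cp1252 text stream stands in for
# a Windows console's sys.stdout (assumption for illustration).
raw = io.BytesIO()
stream = io.TextIOWrapper(raw, encoding="cp1252")

try:
    stream.write("\u2192")  # U+2192 has no cp1252 mapping
    failed_before = False
except UnicodeEncodeError:
    failed_before = True

# Same call as sys.stdout.reconfigure(encoding="utf-8"); Python 3.7+.
stream.reconfigure(encoding="utf-8")
stream.write("\u2192")
stream.flush()

written = raw.getvalue()  # the UTF-8 bytes of the arrow
```

The first write fails and the second succeeds, which is the same change you get on the real stdout after reconfiguring it.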

User · Answer

For those still getting this error, adding encode("utf-8") to soup will also fix this.

    soup = BeautifulSoup(html_doc, 'html.parser').encode("utf-8")
    print(soup)

User · Answer

I faced the same encoding issue as well. It can occur when you try to print the data, read or write it, or open it. As others mentioned above, adding .encode("utf-8") will help if you are trying to print it:

    soup.encode("utf-8")

If you are trying to open scraped data and maybe write it into a file, then open the file with encoding="utf-8":

    with open(filename_csv, 'w', newline='', encoding="utf-8") as csv_file:
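For the CSV case, a minimal round-trip sketch is shown below. The rows and the file name are made up, and a temporary directory is used so nothing is left behind.

```python
import csv
import os
import tempfile

# Hypothetical scraped rows; U+2191 (up arrow) is not representable in cp1252.
rows = [["name", "note"], ["Zo\u00eb", "up \u2191 3"]]

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "scraped.csv")

    # newline="" lets the csv module control line endings itself.
    with open(path, "w", newline="", encoding="utf-8") as csv_file:
        csv.writer(csv_file).writerows(rows)

    # Read back with the same encoding to confirm nothing was lost.
    with open(path, newline="", encoding="utf-8") as csv_file:
        read_back = list(csv.reader(csv_file))
```

Reading the file back with the same encoding returns the original rows, so the non-ASCII characters survive the write.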

User · Answer

I was getting the same UnicodeEncodeError when saving scraped web content to a file. To fix it I replaced this code:

    with open(fname, "w") as f:
        f.write(html)

with this:

    import io
    with io.open(fname, "w", encoding="utf-8") as f:
        f.write(html)

Using io gives you backward compatibility with Python 2.

If you only need to support Python 3 you can use the builtin open function instead:

    with open(fname, "w", encoding="utf-8") as f:
        f.write(html)

User · Answer

If you are using Windows, try passing encoding='latin1', encoding='iso-8859-1' or encoding='cp1252'.

Example:

    csv_data = pd.read_csv(csvpath, encoding='iso-8859-1')
    print(soup.encode('iso-8859-1'))
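One caveat worth checking before relying on this: these are single-byte encodings, so they mainly help when reading bytes, and encoding text with characters outside their range still fails. A small sketch with made-up sample data, using only the standard library:

```python
# Every byte value decodes under iso-8859-1, so reading never raises,
# but UTF-8 data read this way comes out garbled.
utf8_bytes = "caf\u00e9".encode("utf-8")
misread = utf8_bytes.decode("iso-8859-1")

# Encoding is the reverse direction and can still fail for characters
# that have no Latin-1 code point, such as U+2192.
try:
    "\u2192".encode("iso-8859-1")
    latin1_failed = False
except UnicodeEncodeError:
    latin1_failed = True
```

So if the source data is really UTF-8, reading it as iso-8859-1 hides the error but can produce mojibake, and writing with it may raise the same UnicodeEncodeError.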
