UnicodeDecodeError: 'utf8' codec can't decode byte 0xa5 in position 0: invalid start byte

Question

I am using Python-2.6 CGI scripts but found this error in the server log while doing json.dumps():

    Traceback (most recent call last):
      File "/etc/mongodb/server/cgi-bin/getstats.py", line 135, in <module>
        print json.dumps(__getdata())
      File "/usr/lib/python2.7/json/__init__.py", line 231, in dumps
        return _default_encoder.encode(obj)
      File "/usr/lib/python2.7/json/encoder.py", line 201, in encode
        chunks = self.iterencode(o, _one_shot=True)
      File "/usr/lib/python2.7/json/encoder.py", line 264, in iterencode
        return _iterencode(o, 0)
    UnicodeDecodeError: 'utf8' codec can't decode byte 0xa5 in position 0: invalid start byte

Here, the __getdata() function returns a dictionary.

Before posting this question I referred to this question on SO.

UPDATES

The following line is hurting the JSON encoder:

    now = datetime.datetime.now()
    now = datetime.datetime.strftime(now, '%Y-%m-%dT%H:%M:%S.%fZ')
    print json.dumps({'current_time': now})  # this is the culprit

I got a temporary fix for it:

    print json.dumps({'old_time': now.encode('ISO-8859-1').strip()})

But I am not sure whether this is the correct way to do it.
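For readers on Python 3, where json.dumps() rejects bytes outright, one way to approach the same problem is to decode every byte string before serializing. This is only a sketch: decode_bytes is a hypothetical helper, the sample value is made up, and ISO-8859-1 as the source encoding is an assumption that the traceback does not confirm.

```python
import json


def decode_bytes(value, encoding="iso8859-1"):
    """Recursively turn bytes into str so json.dumps can serialize the result."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    if isinstance(value, dict):
        return {
            decode_bytes(key, encoding): decode_bytes(item, encoding)
            for key, item in value.items()
        }
    if isinstance(value, (list, tuple)):
        return [decode_bytes(item, encoding) for item in value]
    return value


# Made-up dictionary whose value starts with the offending byte 0xa5.
data = {"current_time": b"\xa52014-03-06"}
encoded = json.dumps(decode_bytes(data))
print(encoded)
```

Decoding at the boundary keeps the rest of the program working with text only, so the encoder never has to guess an encoding.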

User · Answer

You may use any standard encoding that fits your specific usage and input. utf-8 is the default. iso8859-1 is also popular for Western Europe, e.g. bytes_obj.decode('iso8859-1') (see docs).
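A minimal sketch of that idea in Python 3: try UTF-8 first and fall back to iso8859-1. The helper name decode_with_fallback and the candidate list are assumptions for illustration. Note that iso8859-1 maps every byte to some character, so it never fails and should come last.

```python
def decode_with_fallback(raw, encodings=("utf-8", "iso8859-1")):
    """Try each encoding in order and return (text, encoding_used)."""
    for name in encodings:
        try:
            return raw.decode(name), name
        except UnicodeDecodeError:
            continue
    raise ValueError("none of the candidate encodings could decode the input")


# 0xa5 is not valid UTF-8 here, so the second candidate is used.
text, used = decode_with_fallback(b"\xa5100")
print(repr(text), used)
```

A successful decode only means the bytes are valid in that encoding, not that the encoding is the right one, so it is worth checking the result by eye.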

User · Answer

After trying all the aforementioned workarounds, if it still throws the same error, you can try exporting the file as CSV (a second time, if you already have). Especially if you're using scikit-learn, it is best to import the dataset as a CSV file. I spent hours on this, whereas the solution was this simple. Export the file as a CSV to the directory where Anaconda or your classifier tools are installed and try.

User · Answer

The error occurs because there is some non-ASCII character in the dictionary and it can't be encoded/decoded. One simple way to avoid this error is to encode such strings with the encode() function as follows (if a is the string with the non-ASCII character):

    a.encode('utf-8').strip()

User · Answer

Inspired by @aaronpenne and @Soumyaansh:

    f = open("file.txt", "rb")
    text = f.read().decode(errors='replace')
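To make the trade-off visible, the sketch below decodes the same made-up bytes with three of the standard error handlers in Python 3. All of them avoid the exception, but each one changes or drops the undecodable byte.

```python
raw = b"price: \xa5100"  # made-up sample; 0xa5 is not valid UTF-8 here

# "replace" substitutes U+FFFD (the replacement character) for the bad byte.
replaced = raw.decode("utf-8", errors="replace")

# "ignore" silently drops the bad byte.
ignored = raw.decode("utf-8", errors="ignore")

# "backslashreplace" keeps the byte value visible as an escape sequence.
escaped = raw.decode("utf-8", errors="backslashreplace")

print(repr(replaced), repr(ignored), repr(escaped))
```

These handlers hide the symptom rather than recover the original character, so they fit best when losing a few characters is acceptable.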

User · Answer

Instead of looking for ways to decode a5 (Yen, ¥) or 96 (en-dash, –), tell MySQL that your client is encoded "latin1", but you want "utf8" in the database.

See details in "Trouble with UTF-8 characters; what I see is not what I stored".
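The two byte values mentioned above can be checked from Python 3. This is only an illustration of what the bytes mean in single-byte Western encodings and what they become in UTF-8; it says nothing about MySQL configuration itself, and the use of Python's cp1252 codec for 0x96 is a choice made for this sketch.

```python
# 0xa5 is the Yen sign in Latin-1 (and in Windows-1252).
yen = b"\xa5".decode("latin1")

# 0x96 is an en-dash in Windows-1252.
dash = b"\x96".decode("cp1252")

# The same characters need two and three bytes respectively in UTF-8,
# which is why the single bytes above are rejected by a UTF-8 decoder.
yen_utf8 = yen.encode("utf-8")
dash_utf8 = dash.encode("utf-8")

print(repr(yen), yen_utf8, repr(dash), dash_utf8)
```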

User · Answer

On read_csv, I added an encoding argument:

    import pandas as pd
    dataset = pd.read_csv('sample_data.csv', header=0,
                          encoding='unicode_escape')

User · Answer

This solution worked for me:

    import pandas as pd
    data = pd.read_csv('training.csv', encoding='unicode_escape')
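It may help to know what the unicode_escape codec does before relying on it. The sketch below uses only the standard library and made-up sample bytes: non-ASCII bytes are mapped as in Latin-1, but backslash sequences in the data are also interpreted, which can silently change the content.

```python
raw = b"caf\xe9 \xa5100"  # made-up Latin-1 style bytes

# Each non-ASCII byte becomes the code point with the same value,
# so this result matches a plain Latin-1 decode.
via_escape = raw.decode("unicode_escape")
via_latin1 = raw.decode("latin1")

# Data containing literal backslashes is where the two codecs differ:
# unicode_escape turns the two-byte sequences \n and \t into real
# newline and tab characters.
path_bytes = b"C:\\new\\table"
path_escape = path_bytes.decode("unicode_escape")
path_latin1 = path_bytes.decode("latin1")

print(repr(via_escape), repr(path_escape), repr(path_latin1))
```

If the file is really Latin-1 or Windows-1252, naming that encoding directly avoids the backslash side effect.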

User · Answer

In my case, I had to save the file as UTF-8 with BOM, not just as plain UTF-8; then this error was gone.
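In Python 3 the "UTF-8 with BOM" format corresponds to the utf-8-sig codec. A small sketch with made-up CSV text shows how the BOM is written, and how it is stripped or kept depending on the codec used for reading.

```python
import codecs

text = "id,name\n1,caf\u00e9\n"  # made-up CSV content

# Encoding with utf-8-sig prepends the three-byte BOM.
with_bom = text.encode("utf-8-sig")
has_bom = with_bom.startswith(codecs.BOM_UTF8)

# Decoding with utf-8-sig strips the BOM again...
clean = with_bom.decode("utf-8-sig")

# ...while plain utf-8 leaves it in the text as U+FEFF.
with_marker = with_bom.decode("utf-8")

# utf-8-sig also accepts input that has no BOM at all.
no_bom_ok = text.encode("utf-8").decode("utf-8-sig")

print(has_bom, repr(clean[:2]), repr(with_marker[:2]))
```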

User · Answer

As of 2018-05 this is handled directly with decode, at least for Python 3.

I'm using the below snippet for "invalid start byte" and "invalid continuation byte" type errors. Adding errors='ignore' fixed it for me.

    with open(out_file, 'rb') as f:
        for line in f:
            print(line.decode(errors='ignore'))
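A variation on the same approach, sketched under the assumption that the file should be read as text: the built-in open() also accepts an errors argument, so the decoding can happen while reading. The temporary file and its contents are made up for the demonstration.

```python
import os
import tempfile

# Create a throwaway file containing one byte (0xa5) that is invalid UTF-8.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"first line\n\xa5 second line\n")
    path = tmp.name

try:
    # Text mode with errors="ignore" drops undecodable bytes while reading.
    with open(path, encoding="utf-8", errors="ignore") as f:
        lines = [line.rstrip("\n") for line in f]
finally:
    os.remove(path)

print(lines)
```

As with decode(errors='ignore'), the offending byte is discarded, so the output is missing whatever character it was meant to be.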

User · Answer

Set the default encoding at the top of your code:

    import sys
    reload(sys)
    sys.setdefaultencoding('ISO-8859-1')

User · Answer

The following line is hurting the JSON encoder:

    now = datetime.datetime.now()
    now = datetime.datetime.strftime(now, '%Y-%m-%dT%H:%M:%S.%fZ')
    print json.dumps({'current_time': now})  # this is the culprit

I got a temporary fix for it:

    print json.dumps({'old_time': now.encode('ISO-8859-1').strip()})

Marking this as correct as a temporary fix (not sure so).

User · Answer

Your string has a non-ASCII character encoded in it.

Not being able to decode with utf-8 may happen if you've needed to use other encodings in your code. For example:

    >>> 'my weird character \x96'.decode('utf-8')
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "C:\Python27\lib\encodings\utf_8.py", line 16, in decode
        return codecs.utf_8_decode(input, errors, True)
    UnicodeDecodeError: 'utf8' codec can't decode byte 0x96 in position 19: invalid start byte

In this case, the encoding is windows-1252, so you have to do:

    >>> 'my weird character \x96'.decode('windows-1252')
    u'my weird character \u2013'

Now that you have Unicode, you can safely encode into utf-8.
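The session above is Python 2. A rough Python 3 equivalent, where the undecoded data is a bytes object, might look like the sketch below; the sample bytes are the same as in the answer.

```python
raw = b"my weird character \x96"

# Decoding as UTF-8 fails because 0x96 cannot start a UTF-8 sequence.
utf8_failed = False
try:
    raw.decode("utf-8")
except UnicodeDecodeError:
    utf8_failed = True

# Decoding with the encoding the bytes were actually written in works.
text = raw.decode("windows-1252")

# Once the data is a str, it can be encoded to UTF-8 safely.
utf8_bytes = text.encode("utf-8")

print(utf8_failed, repr(text), utf8_bytes)
```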

User · Answer

If the above methods are not working for you, you may want to look into changing the encoding of the csv file itself.

Using Excel:

1. Open the csv file using Excel.
2. Navigate to the File menu option and click Save As.
3. Click Browse to select a location to save the file.
4. Enter the intended filename.
5. Select the CSV (Comma delimited) (*.csv) option.
6. Click the Tools drop-down box and click Web Options.
7. Under the Encoding tab, select the option Unicode (UTF-8) from the "Save this document as" drop-down list.
8. Save the file.

Using Notepad:

1. Open the csv file using Notepad.
2. Navigate to the File > Save As option.
3. Next, select the location for the file.
4. Select the "Save as type" option as All Files (*.*).
5. Specify the file name with the .csv extension.
6. From the Encoding drop-down list, select the UTF-8 option.
7. Click Save to save the file.

By doing this, you should be able to import csv files without encountering the UnicodeDecodeError.
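The same conversion can be scripted instead of done by hand. The sketch below is one possible approach using only the standard library: reencode_file is a hypothetical helper, the file names are made up, and the source encoding (latin1 here) is an assumption you would have to confirm for your own file.

```python
import os
import tempfile


def reencode_file(src, dst, src_encoding, dst_encoding="utf-8"):
    """Copy src to dst, converting the text from src_encoding to dst_encoding."""
    # newline="" keeps the original line endings untouched.
    with open(src, encoding=src_encoding, newline="") as fin, \
            open(dst, "w", encoding=dst_encoding, newline="") as fout:
        for line in fin:
            fout.write(line)


with tempfile.TemporaryDirectory() as tmpdir:
    src = os.path.join(tmpdir, "legacy.csv")
    dst = os.path.join(tmpdir, "utf8.csv")

    # Made-up legacy file in which the Yen sign is the single byte 0xa5.
    with open(src, "wb") as f:
        f.write(b"name,price\ncoffee,\xa5100\n")

    reencode_file(src, dst, src_encoding="latin1")

    with open(dst, "rb") as f:
        converted = f.read()

print(converted)
```

After the conversion the Yen sign is stored as the two-byte UTF-8 sequence, so a UTF-8 reader no longer trips over 0xa5.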

User · Answer

The following snippet worked for me:

    import pandas as pd
    df = pd.read_csv(filename, sep=..., encoding='latin1', error_bad_lines=False)  # error_bad_lines avoids failing on a single bad line

User · Answer

I fixed this simply by specifying a different codec in the read_csv() command: encoding='unicode_escape'.

E.g.:

    import pandas as pd
    data = pd.read_csv(filename, encoding='unicode_escape')

User · Answer

Try the below code snippet:

    with open(path, 'rb') as f:
        text = f.read()
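Reading in binary mode returns bytes, so no decoding happens and no error is raised at that point. One thing that can be done with the raw bytes is to locate the offending byte before choosing an encoding. The sketch below is one way to do that in Python 3; describe_decode_error is a hypothetical helper and the sample bytes are made up.

```python
def describe_decode_error(raw, encoding="utf-8"):
    """Return None if raw decodes cleanly, else details about the first bad byte."""
    try:
        raw.decode(encoding)
    except UnicodeDecodeError as exc:
        return {
            "position": exc.start,         # index of the first undecodable byte
            "byte": hex(raw[exc.start]),   # its value, e.g. '0xa5'
            "reason": exc.reason,          # the codec's explanation
        }
    return None


info = describe_decode_error(b"\xa5 starts the data")
print(info)
```

The reported byte value is a useful hint: values such as 0xa5 or 0x96 often point to a single-byte Western encoding rather than UTF-8.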

User · Answer

Simple solution:

    import pandas as pd
    df = pd.read_csv('file_name.csv', engine='python')

User · Answer

    from io import BytesIO
    df = pd.read_excel(BytesIO(bytes_content), engine='openpyxl')

worked for me.
