UnicodeDecodeError: 'charmap' codec can't decode byte X in position Y: character maps to <undefined>

Question

I'm trying to get a Python 3 program to do some manipulations with a text file filled with information. However, when trying to read the file I get the following error:

    Traceback (most recent call last):
      File "SCRIPT LOCATION", line NUMBER, in <module>
        text = file.read()
      File "C:\Python31\lib\encodings\cp1252.py", line 23, in decode
        return codecs.charmap_decode(input,self.errors,decoding_table)[0]
    UnicodeDecodeError: 'charmap' codec can't decode byte 0x90 in position 2907500: character maps to <undefined>

User · Answer

Before you apply the suggested solution, you can check which Unicode character appeared in your file (and in the error log), in this case 0x90: https://unicodelookup.com/#0x90/1 (or directly at the Unicode Consortium site, http://www.unicode.org/charts/, by searching for 0x0090), and then consider removing it from the file.
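
To see what is actually stored at the reported position, the file can be opened in binary mode so that no codec is involved. This is only a minimal sketch: `bytes_around` and `describe_byte` are hypothetical helper names, and it assumes the position in the error message is a byte offset into the file, which should hold when the whole file is read with a single `read()` call.

```python
def bytes_around(path, position, context=8):
    """Return the raw bytes surrounding `position`, without decoding them."""
    start = max(position - context, 0)
    with open(path, "rb") as fh:  # binary mode: no codec is applied
        fh.seek(start)
        return fh.read(position - start + context + 1)


def describe_byte(path, position):
    """Return the byte at `position` as a hex string, or None past end of file."""
    with open(path, "rb") as fh:
        fh.seek(position)
        byte = fh.read(1)
    return f"0x{byte[0]:02x}" if byte else None
```

For the traceback in the question, one would call `describe_byte(filename, 2907500)` and look at the neighbouring bytes with `bytes_around(filename, 2907500)` to judge whether the byte is stray garbage or part of a multi-byte sequence.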

User · Answer

For those working in Anaconda on Windows: I had the same problem, and Notepad++ helped me solve it. Open the file in Notepad++. In the bottom right it will tell you the current file encoding. In the top menu, next to "View", locate "Encoding". In "Encoding", go to "Character sets" and patiently look for the encoding that you need. In my case the encoding "Windows-1252" was found under "Western European".

User · Answer

For me, encoding with utf16 worked:

    file = open('filename.csv', encoding="utf16")
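
Files saved as UTF-16 (and some saved as UTF-8) often start with a byte order mark, so a quick look at the first few bytes can hint at whether utf16 is worth trying. The following is a rough sketch using the BOM constants from the standard `codecs` module; `sniff_bom` is a hypothetical helper name, and a file without a BOM simply yields `None`, which says nothing about its encoding.

```python
import codecs


def sniff_bom(path):
    """Guess an encoding name from a leading byte order mark, if there is one."""
    with open(path, "rb") as fh:
        head = fh.read(4)
    # UTF-32 must be checked first: its little-endian BOM begins with the
    # same two bytes as the UTF-16 little-endian BOM.
    if head.startswith((codecs.BOM_UTF32_LE, codecs.BOM_UTF32_BE)):
        return "utf-32"
    if head.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"
    if head.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return "utf-16"
    return None
```

If the function returns a name, it can be passed straight to `open(filename, encoding=...)`.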

User · Answer

For me, changing the MySQL character encoding to match the one used in my code helped sort out the problem:

    photo = open('pic3.png', encoding='latin1')

User · Answer

The file in question is not using the CP1252 encoding. It's using another encoding. Which one, you have to figure out yourself. Common ones are Latin-1 and UTF-8. Since 0x90 doesn't actually mean anything in Latin-1, UTF-8 (where 0x90 is a continuation byte) is more likely.

You specify the encoding when you open the file:

    file = open(filename, encoding="utf8")
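
The reasoning about 0x90 can be checked on a small sample. The sketch below uses the Cyrillic letter U+0410 purely as an illustration (it is not taken from the question's file), because its UTF-8 form is the two bytes 0xD0 0x90; `try_decode` is a hypothetical helper name.

```python
def try_decode(data, encoding):
    """Decode `data` with `encoding`, or return a short failure description."""
    try:
        return data.decode(encoding)
    except UnicodeDecodeError as exc:
        return f"failed: {exc.reason}"


# U+0410 encodes to b"\xd0\x90" in UTF-8; 0x90 is the continuation byte.
sample = "\u0410".encode("utf-8")

if __name__ == "__main__":
    for name in ("utf-8", "cp1252"):
        print(name, "->", try_decode(sample, name))
```

Decoding the sample as UTF-8 succeeds, while cp1252 fails on the 0x90 byte in the same way as in the question.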

User · Answer

TLDR: Try

    file = open(filename, encoding='cp437')

Why? When one uses

    file = open(filename)
    text = file.read()

Python assumes the file uses the same codepage as the current environment (cp1252 in the case of the opening post) and tries to decode it into its own internal Unicode strings. If the file contains byte values not defined in this codepage (like 0x90), we get UnicodeDecodeError. Sometimes we don't know the encoding of the file, sometimes the file's encoding may be unhandled by Python (like e.g. cp790), and sometimes the file can contain mixed encodings.

If such characters are unneeded, one may decide to replace them with the replacement character U+FFFD (usually displayed as a question mark symbol):

    file = open(filename, errors='replace')

Another workaround is to use:

    file = open(filename, errors='ignore')

The undecodable bytes are then silently dropped and the rest of the text is left intact, but other errors will be masked too.

A quite good solution is to specify the encoding, yet not just any encoding (like cp1252), but one which has ALL byte values defined (like cp437):

    file = open(filename, encoding='cp437')

Codepage 437 is the original DOS encoding. All codes are defined, so there are no errors while reading the file, no errors are masked out, and the characters are preserved (not quite left intact, but still distinguishable).
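
The three options can be compared side by side on a few bytes. This is only an illustrative sketch: the sample bytes are made up, and `decode_variants` is a hypothetical helper that calls `bytes.decode` with the same `encoding` and `errors` arguments that `open()` accepts.

```python
def decode_variants(data):
    """Decode the same bytes with the three strategies discussed above."""
    return {
        # Undecodable bytes become U+FFFD, the Unicode replacement character.
        "replace": data.decode("cp1252", errors="replace"),
        # Undecodable bytes are dropped from the result.
        "ignore": data.decode("cp1252", errors="ignore"),
        # cp437 maps every byte value to some character, so it cannot fail.
        "cp437": data.decode("cp437"),
    }


raw = b"caf\x90 ok"  # made-up sample containing the problematic byte

if __name__ == "__main__":
    for name, text in decode_variants(raw).items():
        print(f"{name:8} {text!r}")
```

Only the cp437 result can be turned back into the original bytes with `.encode("cp437")`. The characters it shows are only the right ones if the file really was written in cp437, though.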

User · Answer

As an extension to @LennartRegebro's answer:

If you can't tell what encoding your file uses, the solution above does not work (it's not utf8), and you find yourself merely guessing, there are online tools that you could use to identify what encoding that is. They aren't perfect but usually work just fine. After you figure out the encoding you should be able to use the solution above.

EDIT: (Copied from comment)

A quite popular text editor, Sublime Text, has a command to display the encoding if it has been set:

Go to View -> Show Console (or Ctrl+`).

Type view.encoding() into the field at the bottom and hope for the best (I was unable to get anything but Undefined, but maybe you will have better luck).
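
The guessing can also be done in Python itself by trying a short list of candidates in strict mode. This is a sketch under assumptions: `first_working_encoding` is a hypothetical helper, and the candidate list is just an example to adjust. A successful decode does not prove the guess is right, because single-byte encodings such as latin-1 accept any input. That is why the strictest candidate should come first.

```python
def first_working_encoding(path, candidates=("utf-8", "cp1252", "latin-1")):
    """Return the first candidate that decodes the whole file without error."""
    with open(path, "rb") as fh:
        data = fh.read()
    for name in candidates:
        try:
            data.decode(name)  # strict mode: raises on any invalid byte
        except UnicodeError:
            continue
        return name
    return None
```

The returned name can then be passed to `open(filename, encoding=...)`, ideally after checking that the decoded text looks sensible.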

User · Answer

Stop wasting your time, just add the following encoding="cp437" and errors='ignore' to your code in both read and write:

    open('filename.csv', encoding="cp437", errors='ignore')
    open(file_name, 'w', newline='', encoding="cp437", errors='ignore')

Godspeed

User · Answer

Alternatively, if you don't need to decode the file, such as when uploading the file to a website, use:

    open(filename, 'rb')

where r = reading, b = binary
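
A minimal sketch of the binary approach, assuming the bytes are only passed along and never interpreted as text; `read_raw` is a hypothetical helper name.

```python
def read_raw(path):
    """Read a file as bytes; no decoding happens, so no UnicodeDecodeError."""
    with open(path, "rb") as fh:
        return fh.read()
```

The result is a `bytes` object rather than a `str`, so a byte like 0x90 passes through unchanged.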

User · Answer

If file = open(filename, encoding="utf8") doesn't work, try file = open(filename, errors="ignore"), if you want to remove unneeded characters.
