Decoding UTF-8 strings in Python

Question

I'm writing a web crawler in Python, and it involves taking headlines from websites. One of the headlines should've read "And the Hip’s coming, too", but instead it said "And the Hipâ€™s coming, too". What's going wrong here?

User · Accepted Answer

You need to properly decode the source text. Most likely the source text is in UTF-8 format, not ASCII.
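A minimal sketch of what "properly decode" means for a crawler. It is written for Python 3 (the question itself targets Python 2.7). It shows how decoding UTF-8 bytes with the wrong codec produces exactly the garbled headline, and how decoding with the declared charset avoids it. The helper names `charset_from_content_type` and `decode_page` are hypothetical. The sketch assumes the charset is taken from the HTTP `Content-Type` header, with UTF-8 as a fallback; real pages may instead declare it in a `<meta charset>` tag.

```python
from email.message import Message


def charset_from_content_type(content_type, default="utf-8"):
    """Return the (lowercased) charset named in a Content-Type header, or `default`."""
    if not content_type:
        return default
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset(default)


def decode_page(raw, content_type=None):
    """Decode raw response bytes with the declared charset (UTF-8 if none is given)."""
    return raw.decode(charset_from_content_type(content_type))


# The bytes a server would send for the headline, encoded as UTF-8.
raw = "And the Hip\u2019s coming, too".encode("utf-8")

# Decoding with the wrong codec reproduces the garbled headline.
wrong = raw.decode("windows-1252")
print(wrong)  # And the Hipâ€™s coming, too

# Decoding with the charset the server declared gives the intended text.
right = decode_page(raw, "text/html; charset=UTF-8")
print(right)  # And the Hip’s coming, too
```

The right single quotation mark (U+2019) is three bytes in UTF-8 (`E2 80 99`), and Windows-1252 maps each of those bytes to its own character (`â`, `€`, `™`), which is why one apostrophe turns into three symbols.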

Because you have not provided any context or code, it is not possible to give a more direct answer.

I suggest you study how Unicode and character encoding work in Python:

http://docs.python.org/2/howto/unicode.html

User · Answer

It's an encoding error, so if it's a unicode string, this ought to fix it:

    text.encode("windows-1252").decode("utf-8")

If it's a plain (byte) string, you'll need an extra step:

    text.decode("utf-8").encode("windows-1252").decode("utf-8")

Both of these will give you a unicode string.

By the way, to discover how a piece of text like this has been mangled due to encoding issues, you can use chardet:

    >>> import chardet
    >>> chardet.detect(u"And the Hipâ€™s coming, too")
    {'confidence': 0.5, 'encoding': 'windows-1252'}
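A Python 3 sketch of the same repair, wrapped in a hypothetical helper `fix_mojibake` that returns the text unchanged when the round trip fails. This assumes the mangling was specifically "UTF-8 bytes read as Windows-1252". Text that was never mangled this way, or that contains characters Python's `windows-1252` codec cannot encode, is left alone rather than raising. In Python 3 the answer's "plain string" case corresponds to `bytes`. If such a value holds the mangled text as UTF-8, decode it with `.decode("utf-8")` first, matching the extra step above.

```python
def fix_mojibake(text):
    """Undo UTF-8 text that was mistakenly decoded as Windows-1252.

    Hypothetical helper: re-encodes with windows-1252 to recover the
    original bytes, then decodes them as UTF-8. If either step fails,
    the text was not mangled in this way, so it is returned unchanged.
    """
    try:
        return text.encode("windows-1252").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return text


print(fix_mojibake("And the Hipâ€™s coming, too"))  # And the Hip’s coming, too
print(fix_mojibake("Already fine: café"))           # Already fine: café
```

The `try`/`except` matters when crawling many pages. Running the repair blindly on correctly decoded text such as "café" would otherwise raise `UnicodeDecodeError`, because the lone byte `0xE9` is not valid UTF-8.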
