[python] Decoding UTF-8 strings in Python

It's an encoding error - so if it's a unicode string, this ought to fix it:

text.encode("windows-1252").decode("utf-8")

If it's a plain string, you'll need an extra step:

text.decode("utf-8").encode("windows-1252").decode("utf-8")

Both of these will give you a unicode string.

By the way - to discover how a piece of text like this has been mangled due to encoding issues, you can use chardet:

>>> import chardet
>>> chardet.detect(u"And the Hip’s coming, too")
{'confidence': 0.5, 'encoding': 'windows-1252'}