There are a few points to consider.
A \u2018 character may appear only as a fragment of representation of a unicode string in Python, e.g. if you write:
>>> text = u'‘'
>>> print repr(text)
u'\u2018'
Now if you simply want to print the unicode string prettily, just use unicode's encode
method:
>>> text = u'I don\u2018t like this'
>>> print text.encode('utf-8')
I don‘t like this
To make sure that every line from any file would be read as unicode, you'd better use the codecs.open
function instead of just open
, which allows you to specify file's encoding:
>>> import codecs
>>> f1 = codecs.open(file1, "r", "utf-8")
>>> text = f1.read()
>>> print type(text)
<type 'unicode'>
>>> print text.encode('utf-8')
I don‘t like this