[python] Python string to unicode

Possible Duplicate:
How do I treat an ASCII string as unicode and unescape the escaped characters in it in python?
How do convert unicode escape sequences to unicode characters in a python string

I have a string that contains unicode characters e.g. \u2026 etc. Somehow it is not received to me as unicode, but is received as a str. How do I convert it back to unicode?

>>> a="Hello\u2026"
>>> b=u"Hello\u2026"
>>> print a
Hello\u2026
>>> print b
Hello…
>>> print unicode(a)
Hello\u2026
>>> 

So clearly unicode(a) is not the answer. Then what is?

This question is related to python string unicode python-2.x python-unicode

The answer is


Unicode escapes only work in unicode strings, so this

 a="\u2026"

is actually a string of 6 characters: '\', 'u', '2', '0', '2', '6'.

To make unicode out of this, use decode('unicode-escape'):

a="\u2026"
print repr(a)
print repr(a.decode('unicode-escape'))

## '\\u2026'
## u'\u2026'

>>> a="Hello\u2026"
>>> print a.decode('unicode-escape')
Hello…

Decode it with the unicode-escape codec:

>>> a="Hello\u2026"
>>> a.decode('unicode-escape')
u'Hello\u2026'
>>> print _
Hello…

This is because for a non-unicode string the \u2026 is not recognised but is instead treated as a literal series of characters (to put it more clearly, 'Hello\\u2026'). You need to decode the escapes, and the unicode-escape codec can do that for you.

Note that you can get unicode to recognise it in the same way by specifying the codec argument:

>>> unicode(a, 'unicode-escape')
u'Hello\u2026'

But the a.decode() way is nicer.


Examples related to python

programming a servo thru a barometer Is there a way to view two blocks of code from the same file simultaneously in Sublime Text? python variable NameError Why my regexp for hyphenated words doesn't work? Comparing a variable with a string python not working when redirecting from bash script is it possible to add colors to python output? Get Public URL for File - Google Cloud Storage - App Engine (Python) Real time face detection OpenCV, Python xlrd.biffh.XLRDError: Excel xlsx file; not supported Could not load dynamic library 'cudart64_101.dll' on tensorflow CPU-only installation

Examples related to string

How to split a string in two and store it in a field String method cannot be found in a main class method Kotlin - How to correctly concatenate a String Replacing a character from a certain index Remove quotes from String in Python Detect whether a Python string is a number or a letter How does String substring work in Swift How does String.Index work in Swift swift 3.0 Data to String? How to parse JSON string in Typescript

Examples related to unicode

How to resolve TypeError: can only concatenate str (not "int") to str (unicode error) 'unicodeescape' codec can't decode bytes in position 2-3: truncated \UXXXXXXXX escape UnicodeEncodeError: 'ascii' codec can't encode character at special name Python NLTK: SyntaxError: Non-ASCII character '\xc3' in file (Sentiment Analysis -NLP) HTML for the Pause symbol in audio and video control Javascript: Unicode string to hex Concrete Javascript Regex for Accented Characters (Diacritics) Replace non-ASCII characters with a single space UTF-8 in Windows 7 CMD NameError: global name 'unicode' is not defined - in Python 3

Examples related to python-2.x

Combine several images horizontally with Python How to print variables without spaces between values How to return dictionary keys as a list in Python? How to add an element to the beginning of an OrderedDict? How to read a CSV file from a URL with Python? Malformed String ValueError ast.literal_eval() with String representation of Tuple Relative imports for the billionth time How do you use subprocess.check_output() in Python? write() versus writelines() and concatenated strings How to select a directory and store the location using tkinter in Python

Examples related to python-unicode

UnicodeEncodeError: 'ascii' codec can't encode character u'\xe9' in position 7: ordinal not in range(128) UnicodeDecodeError: 'utf8' codec can't decode byte 0x9c SyntaxError: Non-ASCII character '\xa3' in file when function returns '£' How to print Unicode character in Python? Python string to unicode UnicodeEncodeError: 'ascii' codec can't encode character u'\xa0' in position 20: ordinal not in range(128) Python - 'ascii' codec can't decode byte