Your terminal happens to be configured to UTF-8.
The fact that printing a
works is a coincidence; you are writing raw UTF-8 bytes to the terminal. a
is a value of length two, containing two bytes, hex values C3 and A1, while ua
is a unicode value of length one, containing a codepoint U+00E1.
This difference in length is one major reason to use Unicode values; you cannot easily measure the number of text characters in a byte string; the len()
of a byte string tells you how many bytes were used, not how many characters were encoded.
You can see the difference when you encode the unicode value to different output encodings:
>>> a = 'á'
>>> ua = u'á'
>>> ua.encode('utf8')
'\xc3\xa1'
>>> ua.encode('latin1')
'\xe1'
>>> a
'\xc3\xa1'
Note that the first 256 codepoints of the Unicode standard match the Latin 1 standard, so the U+00E1 codepoint is encoded to Latin 1 as a byte with hex value E1.
Furthermore, Python uses escape codes in representations of unicode and byte strings alike, and low code points that are not printable ASCII are represented using \x..
escape values as well. This is why a Unicode string with a code point between 128 and 255 looks just like the Latin 1 encoding. If you have a unicode string with codepoints beyond U+00FF a different escape sequence, \u....
is used instead, with a four-digit hex value.
It looks like you don't yet fully understand what the difference is between Unicode and an encoding. Please do read the following articles before you continue: