Unicode string literals (string literals prefixed by u
) are no longer used in Python 3. They are still valid but just for compatibility purposes with Python 2.
If you want to create a string literal consisting of only easily typable characters like english letters or numbers, you can simply type them: 'hello world'
. But if you want to include also some more exotic characters, you'll have to use some workaround. One of the workarounds are Escape sequences. This way you can for example represent a new line in your string simply by adding two easily typable characters \n
to your string literal. So when you print the 'hello\nworld'
string, the words will be printed on separate lines. That's very handy!
On the other hand, there are some situations when you want to create a string literal that contains escape sequences but you don't want them to be interpreted by Python. You want them to be raw. Look at these examples:
'New updates are ready in c:\windows\updates\new'
'In this lesson we will learn what the \n escape sequence does.'
In such situations you can just prefix the string literal with the r
character like this: r'hello\nworld'
and no escape sequences will be interpreted by Python. The string will be printed exactly as you created it.
Many people expect the raw string literals to be raw in a sense that "anything placed between the quotes is ignored by Python". That is not true. Python still recognizes all the escape sequences, it just does not interpret them - it leaves them unchanged instead. It means that raw string literals still have to be valid string literals.
From the lexical definition of a string literal:
string ::= "'" stringitem* "'"
stringitem ::= stringchar | escapeseq
stringchar ::= <any source character except "\" or newline or the quote>
escapeseq ::= "\" <any source character>
It is clear that string literals (raw or not) containing a bare quote character: 'hello'world'
or ending with a backslash: 'hello world\'
are not valid.