Process escape sequences in a string in Python

Question

Sometimes when I get input from a file or the user, I get a string with escape sequences in it. I would like to process the escape sequences in the same way that Python processes escape sequences in string literals.

For example, let's say myString is defined as:

    >>> myString = "spam\\neggs"
    >>> print(myString)
    spam\neggs

I want a function (I'll call it process) that does this:

    >>> print(process(myString))
    spam
    eggs

It's important that the function can process all of the escape sequences in Python (listed in the escape-sequence table of the Python documentation).

Does Python have a function to do this?

User · Answer

The ast.literal_eval function comes close, but it expects the string to be properly quoted first.

Of course, Python's interpretation of backslash escapes depends on how the string is quoted ("" vs r"" vs u"", triple quotes, etc.), so you may want to wrap the user input in suitable quotes and pass it to literal_eval. Wrapping it in quotes will also prevent literal_eval from returning a number, tuple, dictionary, etc.

Things still might get tricky if the user types unescaped quotes of the type you intend to wrap around the string.
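As a minimal sketch of the wrapping idea described above (the helper name `process_quoted` and the choice of double quotes are illustrative assumptions, not part of the answer), the following shows both the basic case and the quoting problem:

```python
import ast


def process_quoted(text):
    """Interpret backslash escapes by wrapping text in double quotes.

    Hypothetical helper: the name and the quote style are assumptions
    made for illustration only.
    """
    return ast.literal_eval('"' + text + '"')


print(process_quoted("spam\\neggs"))  # the \n escape becomes a real newline
print(repr(process_quoted("42")))     # stays a string because of the quotes

try:
    process_quoted('say "hi"')
except (SyntaxError, ValueError):
    print("unescaped double quotes break the wrapped literal")
```

Input that contains the wrapping quote character, a raw newline, or a trailing backslash would need extra handling before this could be relied on.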

User · Answer

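The code below should work if \n is the only escape sequence that needs to be turned into a real newline:

    our_str = 'The String is \\n, \\n and \\n!'
    # str.replace works on Python 2 and 3; string.replace() was removed in Python 3
    new_str = our_str.replace('\\n', '\n')
    print(new_str)

Note that this handles only \n and none of the other escape sequences.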

User · Answer

This is a bad way of doing it, but it worked for me when trying to interpret escaped octals passed in a string argument:

    import sys
    input_string = eval('b"' + sys.argv[1] + '"')

It's worth mentioning that there is a difference between eval and ast.literal_eval (eval being far more unsafe). See "Using python's eval() vs. ast.literal_eval()".
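The same trick can be sketched with ast.literal_eval instead of eval, which accepts only literals and so refuses arbitrary expressions. The helper name `parse_bytes_argument` is a hypothetical one chosen for this example, and the quoting caveats of any wrap-in-quotes approach still apply:

```python
import ast


def parse_bytes_argument(arg):
    """Interpret escapes in arg as if it were the body of a b"..." literal."""
    return ast.literal_eval('b"' + arg + '"')


# Octal escapes for the ASCII letters A, B and C.
print(parse_bytes_argument("\\101\\102\\103"))

# An argument that tries to smuggle in a function call is rejected.
try:
    parse_bytes_argument('" + __import__("os").getcwd().encode() + b"')
except ValueError:
    print("rejected: not a plain literal")
```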

User · Answer

The (currently) accepted answer by Jerub is correct for python2, but incorrect for python3, where it may produce garbled results (as Apalala points out in a comment to that solution). That's because the unicode_escape codec requires its source to be coded in latin-1, not utf-8, as per the official python docs. Hence, in python3 use:

    >>> myString = "špåm\\nëðþ\\x73"
    >>> print(myString)
    špåm\nëðþ\x73
    >>> decoded_string = myString.encode('latin-1', 'backslashreplace').decode('unicode_escape')
    >>> print(decoded_string)
    špåm
    ëðþs

This method also avoids the unnecessary extra roundtrip between strings and bytes in metatoaster's comments to Jerub's solution (but hats off to metatoaster for recognizing the bug in that solution).
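Wrapped in a function, the approach can be sketched as follows. The name `decode_backslashes` is a hypothetical one; the body is the same encode/decode pair used in the answer:

```python
def decode_backslashes(text):
    """Decode Python-style escapes in text (Python 3).

    Characters outside Latin-1 are first turned into \\uXXXX escapes by the
    'backslashreplace' error handler, so the unicode_escape codec restores
    them instead of the encode step failing.
    """
    return text.encode("latin-1", "backslashreplace").decode("unicode_escape")


print(decode_backslashes("spam\\neggs"))
print(decode_backslashes("Ernő \\t Rubik"))  # ő is outside Latin-1
```

One caveat under this scheme: a literal backslash placed directly before a non-Latin-1 character would merge with the generated \uXXXX escape, so such input needs separate handling.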

User · Answer

unicode_escape doesn't work in general

It turns out that the string_escape or unicode_escape solution does not work in general. In particular, it doesn't work in the presence of actual Unicode.

If you can be sure that every non-ASCII character will be escaped (and remember, anything beyond the first 128 characters is non-ASCII), unicode_escape will do the right thing for you. But if there are any literal non-ASCII characters already in your string, things will go wrong.

unicode_escape is fundamentally designed to convert bytes into Unicode text. But in many places (for example, Python source code) the source data is already Unicode text.

The only way this can work correctly is if you encode the text into bytes first. UTF-8 is the sensible encoding for all text, so that should work, right?

The following examples are in Python 3, so that the string literals are cleaner, but the same problem exists with slightly different manifestations on both Python 2 and 3.

    >>> s = 'naïve \\t test'
    >>> print(s.encode('utf-8').decode('unicode_escape'))
    naÃ¯ve   test

Well, that's wrong.

The new recommended way to use codecs that decode text into text is to call codecs.decode directly. Does that help?

    >>> import codecs
    >>> print(codecs.decode(s, 'unicode_escape'))
    naÃ¯ve   test

Not at all. (Also, the above is a UnicodeError on Python 2.)

The unicode_escape codec, despite its name, turns out to assume that all non-ASCII bytes are in the Latin-1 (ISO-8859-1) encoding. So you would have to do it like this:

    >>> print(s.encode('latin-1').decode('unicode_escape'))
    naïve    test

But that's terrible. This limits you to the 256 Latin-1 characters, as if Unicode had never been invented at all!

    >>> print('Ernő \\t Rubik'.encode('latin-1').decode('unicode_escape'))
    UnicodeEncodeError: 'latin-1' codec can't encode character '\u0151'
    in position 3: ordinal not in range(256)

Adding a regular expression to solve the problem

(Surprisingly, we do not now have two problems.)

What we need to do is apply the unicode_escape decoder only to things that we are certain to be ASCII text. In particular, we can make sure to apply it only to valid Python escape sequences, which are guaranteed to be ASCII text.

The plan is to find escape sequences using a regular expression, and use a function as the argument to re.sub to replace them with their unescaped value.

    import re
    import codecs

    ESCAPE_SEQUENCE_RE = re.compile(r'''
        ( \\U........      # 8-digit hex escapes
        | \\u....          # 4-digit hex escapes
        | \\x..            # 2-digit hex escapes
        | \\[0-7]{1,3}     # Octal escapes
        | \\N\{[^}]+\}     # Unicode characters by name
        | \\[\\'"abfnrtv]  # Single-character escapes
        )''', re.UNICODE | re.VERBOSE)

    def decode_escapes(s):
        def decode_match(match):
            return codecs.decode(match.group(0), 'unicode-escape')

        return ESCAPE_SEQUENCE_RE.sub(decode_match, s)

And with that:

    >>> print(decode_escapes('Ernő \\t Rubik'))
    Ernő     Rubik
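One possible refinement, offered only as a sketch and not as part of the answer: the hex alternatives in the pattern match any characters after \x, \u and \U, so malformed input such as \xZZ reaches the decoder and raises. The variant below (the names `STRICT_ESCAPE_RE` and `decode_escapes_strict` are hypothetical) requires real hex digits and leaves anything else untouched:

```python
import codecs
import re

# Assumption: malformed hex escapes should be passed through unchanged
# rather than raising an error.
STRICT_ESCAPE_RE = re.compile(r'''
    ( \\U[0-9a-fA-F]{8}   # 8-digit hex escapes
    | \\u[0-9a-fA-F]{4}   # 4-digit hex escapes
    | \\x[0-9a-fA-F]{2}   # 2-digit hex escapes
    | \\[0-7]{1,3}        # Octal escapes
    | \\N\{[^}]+\}        # Unicode characters by name
    | \\[\\'"abfnrtv]     # Single-character escapes
    )''', re.VERBOSE)


def decode_escapes_strict(s):
    """Decode only well-formed escape sequences found in s."""
    def decode_match(match):
        return codecs.decode(match.group(0), 'unicode-escape')

    return STRICT_ESCAPE_RE.sub(decode_match, s)


print(decode_escapes_strict('Ernő \\t Rubik'))
print(decode_escapes_strict('bad escape: \\xZZ'))  # left as written
```

Unknown character names in \N{...} and out-of-range \U values can still raise, so callers that handle untrusted input may want to catch UnicodeDecodeError as well.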

User · Answer

The correct thing to do is use the 'string-escape' codec to decode the string.

    >>> myString = "spam\\neggs"
    >>> decoded_string = bytes(myString, "utf-8").decode("unicode_escape")  # python3
    >>> decoded_string = myString.decode('string_escape')  # python2
    >>> print(decoded_string)
    spam
    eggs

Don't use the AST or eval. Using the string codecs is much safer.

User · Answer

The actually correct and convenient answer for python 3:

    >>> import codecs
    >>> myString = "spam\\neggs"
    >>> print(codecs.escape_decode(bytes(myString, "utf-8"))[0].decode("utf-8"))
    spam
    eggs
    >>> myString = "naïve \\t test"
    >>> print(codecs.escape_decode(bytes(myString, "utf-8"))[0].decode("utf-8"))
    naïve    test

Details regarding codecs.escape_decode:

- codecs.escape_decode is a bytes-to-bytes decoder.
- codecs.escape_decode decodes ASCII escape sequences, such as b"\\n" -> b"\n" and b"\\xce" -> b"\xce".
- codecs.escape_decode does not care or need to know about the byte object's encoding, but the encoding of the escaped bytes should match the encoding of the rest of the object.

Background:

- @rspeer is correct: unicode_escape is the incorrect solution for python3. This is because unicode_escape decodes escaped bytes, then decodes bytes to a unicode string, but receives no information regarding which codec to use for the second operation.
- @Jerub is correct: avoid the AST or eval.
- I first discovered codecs.escape_decode from an answer to "how do I .decode('string-escape') in Python3?". As that answer states, that function is currently not documented for python 3.
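For reuse, the calls above can be wrapped in a small function. This is a sketch: the name `unescape` and the `encoding` parameter are assumptions for illustration, and because codecs.escape_decode is undocumented its behaviour could change between Python versions:

```python
import codecs


def unescape(text, encoding="utf-8"):
    """Decode byte-level escape sequences in text via codecs.escape_decode.

    escape_decode returns a (bytes, length_consumed) tuple, hence the [0].
    """
    raw = text.encode(encoding)
    return codecs.escape_decode(raw)[0].decode(encoding)


print(unescape("spam\\neggs"))
print(unescape("Ernő \\t Rubik"))  # literal non-ASCII text survives the roundtrip
```

Since the decoder works on bytes, it should be expected to handle only the escapes that are valid in bytes literals; text-only escapes such as \uXXXX and \N{...} are outside its scope.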
