Given the Django use case, there are two answers to this. Here is its django.utils.html.escape
function, for reference:
def escape(html):
"""Returns the given HTML with ampersands, quotes and carets encoded."""
return mark_safe(force_unicode(html).replace('&', '&').replace('<', '&l
t;').replace('>', '>').replace('"', '"').replace("'", '''))
To reverse this, the Cheetah function described in Jake's answer should work, but is missing the single-quote. This version includes an updated tuple, with the order of replacement reversed to avoid symmetric problems:
def html_decode(s):
"""
Returns the ASCII decoded version of the given HTML string. This does
NOT remove normal HTML tags like <p>.
"""
htmlCodes = (
("'", '''),
('"', '"'),
('>', '>'),
('<', '<'),
('&', '&')
)
for code in htmlCodes:
s = s.replace(code[1], code[0])
return s
unescaped = html_decode(my_string)
This, however, is not a general solution; it is only appropriate for strings encoded with django.utils.html.escape
. More generally, it is a good idea to stick with the standard library:
# Python 2.x:
import HTMLParser
html_parser = HTMLParser.HTMLParser()
unescaped = html_parser.unescape(my_string)
# Python 3.x:
import html.parser
html_parser = html.parser.HTMLParser()
unescaped = html_parser.unescape(my_string)
# >= Python 3.5:
from html import unescape
unescaped = unescape(my_string)
As a suggestion: it may make more sense to store the HTML unescaped in your database. It'd be worth looking into getting unescaped results back from BeautifulSoup if possible, and avoiding this process altogether.
With Django, escaping only occurs during template rendering; so to prevent escaping you just tell the templating engine not to escape your string. To do that, use one of these options in your template:
{{ context_var|safe }}
{% autoescape off %}
{{ context_var }}
{% endautoescape %}