How do I perform HTML decoding/encoding using Python/Django?

Question

I have a string that is HTML encoded:

&lt;img class=&quot;size-medium wp-image-113&quot; style=&quot;margin-left: 15px;&quot; title=&quot;su1&quot; src=&quot;http://blah.org/wp-content/uploads/2008/10/su1-300x194.jpg&quot; alt=&quot;&quot; width=&quot;300&quot; height=&quot;194&quot; /&gt;

I want to change that to:

<img class="size-medium wp-image-113" style="margin-left: 15px;" title="su1" src="http://blah.org/wp-content/uploads/2008/10/su1-300x194.jpg" alt="" width="300" height="194" />

I want this to register as HTML so that it is rendered as an image by the browser instead of being displayed as text.

The string is stored like that because I am using a web-scraping tool called BeautifulSoup; it "scans" a web page and gets certain content from it, then returns the string in that format.

I've found how to do this in C# but not in Python. Can someone help me out?

Related: Convert XML/HTML Entities into Unicode String in Python

User · Answer

I found this in the Cheetah source code (here)

htmlCodes = [
    ['&', '&amp;'],
    ['<', '&lt;'],
    ['>', '&gt;'],
    ['"', '&quot;'],
]
htmlCodesReversed = htmlCodes[:]
htmlCodesReversed.reverse()
def htmlDecode(s, codes=htmlCodesReversed):
    """ Returns the ASCII decoded version of the given HTML string. This does
        NOT remove normal HTML tags like <p>. It is the inverse of htmlEncode()."""
    for code in codes:
        s = s.replace(code[1], code[0])
    return s

Not sure why they reverse the list; I think it has to do with the way they encode, so in your case it may not need to be reversed. Also, if I were you I would change htmlCodes to be a list of tuples rather than a list of lists... this is going in my library though :)

I noticed your title asked for encode too, so here is Cheetah's encode function.

def htmlEncode(s, codes=htmlCodes):
    """ Returns the HTML encoded version of the given string. This is useful to
        display a plain ASCII text string on a web page."""
    for code in codes:
        s = s.replace(code[0], code[1])
    return s
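
On the question of why the list is reversed: one likely reason is that the ampersand entity has to be decoded last. The following is a minimal sketch in Python 3, not Cheetah's code; the helper names encode and decode are made up here for illustration. It shows what happens to text that already looks like an entity when the decode order is not reversed.

```python
# Same pairs and order as the Cheetah table above: "&" is encoded first.
HTML_CODES = [("&", "&amp;"), ("<", "&lt;"), (">", "&gt;"), ('"', "&quot;")]


def encode(s, codes=HTML_CODES):
    """Replace each special character with its entity, in table order."""
    for char, entity in codes:
        s = s.replace(char, entity)
    return s


def decode(s, codes):
    """Replace each entity with its character, in the order given."""
    for char, entity in codes:
        s = s.replace(entity, char)
    return s


original = "&lt;"  # literal text that happens to look like an entity
encoded = encode(original)
same_order = decode(encoded, HTML_CODES)           # "&amp;" decoded first
reverse_order = decode(encoded, HTML_CODES[::-1])  # "&amp;" decoded last

print(repr(encoded))        # '&amp;lt;'
print(repr(same_order))     # '<'  (decoded one level too far)
print(repr(reverse_order))  # '&lt;'  (round trip preserved)
```

Decoding in the same order as encoding turns the ampersand entity back into "&" too early, so the following replacement sees a fresh "&lt;" and decodes it again. Reversing the list avoids that.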

User · Answer

For html encoding, there's cgi.escape from the standard library:

>> help(cgi.escape)
cgi.escape = escape(s, quote=None)
    Replace special characters "&", "<" and ">" to HTML-safe sequences.
    If the optional flag quote is true, the quotation mark character (")
    is also translated.

For html decoding, I use the following:

import re
from htmlentitydefs import name2codepoint
# for some reason, python 2.5.2 doesn't have this one (apostrophe)
name2codepoint['#39'] = 39

def unescape(s):
    "unescape HTML code refs; c.f. http://wiki.python.org/moin/EscapingHtml"
    return re.sub('&(%s);' % '|'.join(name2codepoint),
                  lambda m: unichr(name2codepoint[m.group(1)]), s)

For anything more complicated, I use BeautifulSoup.
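
The snippet above is Python 2 only: htmlentitydefs became html.entities in Python 3, unichr became chr, and cgi.escape was deprecated in 3.2 and removed in 3.8 in favour of html.escape. A possible Python 3 port of the same regex approach is sketched below. The CODES copy and the precompiled pattern are choices made for this sketch, not part of the original answer.

```python
import re
from html.entities import name2codepoint  # Python 3 name for htmlentitydefs

# Copy the table so the stdlib dict is not mutated, then add the numeric
# apostrophe reference the same way the Python 2 snippet does.
CODES = dict(name2codepoint)
CODES["#39"] = 39

# One alternation over every known name, e.g. &(quot|amp|...|\#39);
_ENTITY_RE = re.compile("&(%s);" % "|".join(re.escape(name) for name in CODES))


def unescape(s):
    """Replace known named references (plus &#39;) with their characters."""
    return _ENTITY_RE.sub(lambda m: chr(CODES[m.group(1)]), s)


print(unescape("&lt;b&gt; Sacr&eacute; &amp; &#39;bleu&#39;"))
```

Because re.sub makes a single pass over the input, text such as "&amp;lt;" is decoded one level only, to "&lt;". Other numeric references are left untouched by this sketch.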

User · Answer

In Python 3.4+:

import html

html.unescape(your_string)
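
As a quick illustration, here is html.unescape applied to a shortened version of the string from the question. The shortened string is an assumption made to keep the example readable; html.escape is shown only to demonstrate the reverse direction.

```python
import html

# Shortened form of the encoded <img> tag from the question.
encoded = "&lt;img class=&quot;size-medium wp-image-113&quot; alt=&quot;&quot; /&gt;"

decoded = html.unescape(encoded)
print(decoded)  # <img class="size-medium wp-image-113" alt="" />

# html.escape goes the other way; with quote=True (the default) it also
# escapes double and single quotes.
re_encoded = html.escape(decoded)
print(re_encoded == encoded)  # True
```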

User · Answer

If anyone is looking for a simple way to do this via the Django templates, you can always use filters like this:

<html>
{{ node.description|safe }}
</html>

I had some data coming from a vendor, and everything I posted had HTML tags actually written on the rendered page, as if you were looking at the source.

User · Answer

Daniel's comment as an answer:

"escaping only occurs in Django during template rendering. Therefore, there's no need for an unescape - you just tell the templating engine not to escape. either {{ context_var|safe }} or {% autoescape off %}{{ context_var }}{% endautoescape %}"

User · Answer

Use Daniel's solution if the set of encoded characters is relatively restricted. Otherwise, use one of the numerous HTML-parsing libraries.

I like BeautifulSoup because it can handle malformed XML/HTML:

http://www.crummy.com/software/BeautifulSoup/

For your question, there's an example in their documentation:

from BeautifulSoup import BeautifulStoneSoup
BeautifulStoneSoup("Sacr&eacute; bl&#101;u!",
                   convertEntities=BeautifulStoneSoup.HTML_ENTITIES).contents[0]
# u'Sacr\xe9 bleu!'

User · Answer

Given the Django use case, there are two answers to this. Here is its django.utils.html.escape function, for reference:

def escape(html):
    """Returns the given HTML with ampersands, quotes and carets encoded."""
    return mark_safe(force_unicode(html).replace('&', '&amp;').replace('<', '&lt;').replace('>', '&gt;').replace('"', '&quot;').replace("'", '&#39;'))

To reverse this, the Cheetah function described in Jake's answer should work, but is missing the single-quote. This version includes an updated tuple, with the order of replacement reversed to avoid symmetric problems:

def html_decode(s):
    """
    Returns the ASCII decoded version of the given HTML string. This does
    NOT remove normal HTML tags like <p>.
    """
    htmlCodes = (
        ("'", '&#39;'),
        ('"', '&quot;'),
        ('>', '&gt;'),
        ('<', '&lt;'),
        ('&', '&amp;'),
    )
    for code in htmlCodes:
        s = s.replace(code[1], code[0])
    return s

unescaped = html_decode(my_string)

This, however, is not a general solution; it is only appropriate for strings encoded with django.utils.html.escape. More generally, it is a good idea to stick with the standard library:

# Python 2.x:
import HTMLParser
html_parser = HTMLParser.HTMLParser()
unescaped = html_parser.unescape(my_string)

# Python 3.x before 3.4 (HTMLParser.unescape was deprecated in 3.4 and removed in 3.9):
import html.parser
html_parser = html.parser.HTMLParser()
unescaped = html_parser.unescape(my_string)

# Python 3.4 and later:
from html import unescape
unescaped = unescape(my_string)

As a suggestion: it may make more sense to store the HTML unescaped in your database. It'd be worth looking into getting unescaped results back from BeautifulSoup if possible, and avoiding this process altogether.

With Django, escaping only occurs during template rendering; so to prevent escaping you just tell the templating engine not to escape your string. To do that, use one of these options in your template:

{{ context_var|safe }}

{% autoescape off %}
    {{ context_var }}
{% endautoescape %}
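
To make the "not a general solution" point concrete, here is a small Python 3 sketch comparing a five-entry table decoder with html.unescape. The sample string is invented for illustration. It contains a named entity and a hexadecimal apostrophe reference (the form html.escape emits), neither of which is in the table.

```python
from html import unescape


def html_decode(s):
    """Table-based decoder covering only the five entities listed here."""
    codes = (
        ("'", "&#39;"),
        ('"', "&quot;"),
        (">", "&gt;"),
        ("<", "&lt;"),
        ("&", "&amp;"),  # decoded last so "&amp;lt;" becomes "&lt;", not "<"
    )
    for char, entity in codes:
        s = s.replace(entity, char)
    return s


sample = "caf&eacute; &lt;b&gt; &#x27;hi&#x27;"

table_result = html_decode(sample)
stdlib_result = unescape(sample)

print(table_result)   # caf&eacute; <b> &#x27;hi&#x27;
print(stdlib_result)  # café <b> 'hi'
```

The table version leaves every reference it does not know about in place, which is why the standard library function is the safer default for scraped content.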

User · Answer

With the standard library:

HTML Escape

try:
    from html import escape  # python 3.x
except ImportError:
    from cgi import escape  # python 2.x

print(escape("<"))

HTML Unescape

try:
    from html import unescape  # python 3.4+
except ImportError:
    try:
        from html.parser import HTMLParser  # python 3.x (<3.4)
    except ImportError:
        from HTMLParser import HTMLParser  # python 2.x
    unescape = HTMLParser().unescape

print(unescape("&gt;"))

User · Answer

Searching for the simplest solution to this question in Django and Python, I found you can use their built-in functions to escape/unescape HTML code.

Example

I saved your HTML code in scraped_html and clean_html:

scraped_html = (
    '&lt;img class=&quot;size-medium wp-image-113&quot; '
    'style=&quot;margin-left: 15px;&quot; title=&quot;su1&quot; '
    'src=&quot;http://blah.org/wp-content/uploads/2008/10/su1-300x194.jpg&quot; '
    'alt=&quot;&quot; width=&quot;300&quot; height=&quot;194&quot; /&gt;'
)
clean_html = (
    '<img class="size-medium wp-image-113" style="margin-left: 15px;" '
    'title="su1" src="http://blah.org/wp-content/uploads/2008/10/su1-300x194.jpg" '
    'alt="" width="300" height="194" />'
)

Django

You need Django >= 1.0

unescape

To unescape your scraped HTML code you can use django.utils.text.unescape_entities which:

    Convert all named and numeric character references to the corresponding unicode characters.

>>> from django.utils.text import unescape_entities
>>> clean_html == unescape_entities(scraped_html)
True

escape

To escape your clean HTML code you can use django.utils.html.escape which:

    Returns the given text with ampersands, quotes and angle brackets encoded for use in HTML.

>>> from django.utils.html import escape
>>> scraped_html == escape(clean_html)
True

Python

You need Python >= 3.4

unescape

To unescape your scraped HTML code you can use html.unescape which:

    Convert all named and numeric character references (e.g. &gt;, &#62;, &#x3e;) in the string s to the corresponding unicode characters.

>>> from html import unescape
>>> clean_html == unescape(scraped_html)
True

escape

To escape your clean HTML code you can use html.escape which:

    Convert the characters &, < and > in string s to HTML-safe sequences.

>>> from html import escape
>>> scraped_html == escape(clean_html)
True

User · Answer

You can also use django.utils.html.escape:

from django.utils.html import escape

something_nice = escape(request.POST['something_naughty'])

User · Answer

Below is a Python function that uses the module htmlentitydefs. It is not perfect. The version of htmlentitydefs that I have is incomplete, and it assumes that all entities decode to one codepoint, which is wrong for entities like &NotEqualTilde;:

http://www.w3.org/TR/html5/named-character-references.html

NotEqualTilde;     U+02242 U+00338

With those caveats though, here's the code.

import re
import htmlentitydefs

def decodeHtmlText(html):
    """
    Given a string of HTML that would parse to a single text node,
    return the text value of that node.
    """
    # Fast path for common case.
    if html.find("&") < 0: return html
    return re.sub(
        '&(?:#(?:x([0-9A-Fa-f]+)|([0-9]+))|([a-zA-Z0-9]+));',
        _decode_html_entity,
        html)

def _decode_html_entity(match):
    """
    Regex replacer that expects hex digits in group 1, or
    decimal digits in group 2, or a named entity in group 3.
    """
    hex_digits = match.group(1)  # '&#x10;' -> unichr(0x10)
    if hex_digits: return unichr(int(hex_digits, 16))
    decimal_digits = match.group(2)  # '&#10;' -> unichr(10)
    if decimal_digits: return unichr(int(decimal_digits, 10))
    name = match.group(3)  # name is 'lt' when '&lt;' was matched.
    if name:
        decoding = (htmlentitydefs.name2codepoint.get(name)
            # Treat &GT; like &gt;.
            # This is wrong for &Gt; and &Lt; which HTML5 adopted from MathML.
            # If htmlentitydefs included mappings for those entities,
            # then this code will magically work.
            or htmlentitydefs.name2codepoint.get(name.lower()))
        if decoding is not None: return unichr(decoding)
    return match.group(0)  # Treat "&noSuchEntity;" as "&noSuchEntity;"
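
For readers on Python 3, the multi-codepoint caveat can be checked against the standard library. This is a sketch using html.entities.html5, whose keys include the trailing semicolon, and html.unescape. It is not a port of the function above.

```python
from html import unescape
from html.entities import html5  # maps names such as "NotEqualTilde;" to strings

# This entity decodes to two codepoints, so a name -> single codepoint table
# cannot represent it.
two_codepoints = html5["NotEqualTilde;"]
print([hex(ord(ch)) for ch in two_codepoints])  # ['0x2242', '0x338']

# html.unescape uses the HTML5 table, so it handles this case, and it keeps
# &GT; and &Gt; distinct instead of lower-casing the name.
print(unescape("&NotEqualTilde;") == two_codepoints)  # True
print(unescape("&GT;"))                               # >
print(unescape("&Gt;") == "\u226b")                   # True
```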

User · Answer

I found a fine function at: http://snippets.dzone.com/posts/show/4569

def decodeHtmlentities(string):
    import re
    entity_re = re.compile(r"&(#?)(\d{1,5}|\w{1,8});")

    def substitute_entity(match):
        from htmlentitydefs import name2codepoint as n2cp
        ent = match.group(2)
        if match.group(1) == "#":
            return unichr(int(ent))
        else:
            cp = n2cp.get(ent)

            if cp:
                return unichr(cp)
            else:
                return match.group()

    return entity_re.subn(substitute_entity, string)[0]

User · Answer

See the bottom of this page at the Python wiki; there are at least 2 options to "unescape" HTML.

User · Answer

Even though this is a really old question, this may work.

Django 1.5.5

In [1]: from django.utils.text import unescape_entities
In [2]: unescape_entities('&lt;img class=&quot;size-medium wp-image-113&quot; style=&quot;margin-left: 15px;&quot; title=&quot;su1&quot; src=&quot;http://blah.org/wp-content/uploads/2008/10/su1-300x194.jpg&quot; alt=&quot;&quot; width=&quot;300&quot; height=&quot;194&quot; /&gt;')
Out[2]: u'<img class="size-medium wp-image-113" style="margin-left: 15px;" title="su1" src="http://blah.org/wp-content/uploads/2008/10/su1-300x194.jpg" alt="" width="300" height="194" />'

User · Answer

This is the easiest solution for this problem -

{% autoescape on %}
    {{ body }}
{% endautoescape %}

From this page.
