Pretty printing XML in Python

Question

What is the best way  or are the various ways  to pretty print XML in Python

User · Accepted Answer

import xml dom minidom  dom   xml dom minidom parse xml fname    or xml dom minidom parseString xml string  pretty xml as string   dom toprettyxml

User · Answer

An alternative if you don t want to have to reparse  there is the xmlpp py library with the get pprint   function  It worked nice and smoothly for my use cases  without having to reparse to an lxml ElementTree object

User · Answer

As others pointed out  lxml has a pretty printer built in   Be aware though that by default it changes CDATA sections to normal text  which can have nasty results   Here s a Python function that preserves the input file and only changes the indentation  notice the strip cdata False   Furthermore it makes sure the output uses UTF-8 as encoding instead of the default ASCII  notice the encoding  utf-8     from lxml import etree  def prettyPrintXml xmlFilePathToPrettyPrint       assert xmlFilePathToPrettyPrint is not None     parser   etree XMLParser resolve entities False  strip cdata False      document   etree parse xmlFilePathToPrettyPrint  parser      document write xmlFilePathToPrettyPrint  pretty print True  encoding  utf-8     Example usage   prettyPrintXml  some folder some file xml

User · Answer

Use etree indent and etree tostring import lxml etree as etree  root   etree fromstring   lt html gt  lt head gt  lt  head gt  lt body gt  lt h1 gt Welcome lt  h1 gt  lt  body gt  lt  html gt    etree indent root  space  quot    quot   xml string   etree tostring root  pretty print True  decode   print xml string   output  lt html gt     lt head  gt     lt body gt       lt h1 gt Welcome lt  h1 gt     lt  body gt   lt  html gt    Removing namespaces and prefixes import lxml etree as etree   def dump xml element       for item in element getiterator            item tag   etree QName item  localname      etree cleanup namespaces element      etree indent element  space  quot    quot       result   etree tostring element  pretty print True  decode       return result   root   etree fromstring   lt cs document xmlns cs  quot http   blabla com quot  gt  lt name gt hello world lt  name gt  lt  cs document gt    xml string   dump xml root  print xml string   output  lt document gt     lt name gt hello world lt  name gt   lt  document gt

User · Answer

I found this question while looking for  quot how to pretty print html quot  Using some of the ideas in this thread I adapted the XML solutions to work for XML or HTML  from xml dom minidom import parseString as string to dom  def prettify string  html True       dom   string to dom string      ugly   dom toprettyxml indent  quot    quot       split   list filter lambda x  len x strip     ugly split   n         if html          split   split 1       pretty     n  join split      return pretty  def pretty print html       print prettify html    When used this is what it looks like  html    quot  quot  quot    lt div class  quot foo quot  id  quot bar quot  gt  lt p gt  IDK   lt  p gt  lt br  gt  lt div class  baz  gt  lt div gt   lt span gt Hi lt  span gt  lt  div gt  lt  div gt  lt p id  blarg  gt Try for 2 lt  p gt   lt div class  baz  gt Oh No  lt  div gt  lt  div gt   quot  quot  quot   pretty print html   Which returns   lt div class  quot foo quot  id  quot bar quot  gt     lt p gt  IDK   lt  p gt     lt br  gt     lt div class  quot baz quot  gt       lt div gt         lt span gt Hi lt  span gt       lt  div gt     lt  div gt     lt p id  quot blarg quot  gt Try for 2 lt  p gt     lt div class  quot baz quot  gt Oh No  lt  div gt   lt  div gt

User · Answer

Take a look at the vkbeautify module   It is a python version of my very popular javascript nodejs plugin with the same name  It can pretty-print minify XML  JSON and CSS text  Input and output can be string file in any combinations  It is very compact and doesn t have any dependency   Examples   import vkbeautify as vkb  vkb xml text                         vkb xml text   path to dest file     vkb xml  path to src file           vkb xml  path to src file    path to dest file

User · Answer

from yattag import indent  pretty string   indent ugly string    It won t add spaces or newlines inside text nodes  unless you ask for it with   indent mystring  indent text   True    You can specify what the indentation unit should be and what the newline should look like   pretty xml string   indent      ugly xml string      indentation               newline     r n      The doc is on http   www yattag org homepage

User · Answer

I had some problems with minidom s pretty print   I d get a UnicodeError whenever I tried pretty-printing a document with characters outside the given encoding  eg if I had a    in a document and I tried doc toprettyxml encoding  latin-1     Here s my workaround for it   def toprettyxml doc  encoding          Return a pretty-printed XML document in a given encoding         unistr   doc toprettyxml   replace u  lt  xml version  1 0    gt                              u  lt  xml version  1 0  encoding   s   gt     encoding      return unistr encode encoding   xmlcharrefreplace

User · Answer

lxml is recent  updated  and includes a pretty print function  import lxml etree as etree  x   etree parse  filename   print etree tostring x  pretty print True    Check out the lxml tutorial  http   lxml de tutorial html

User · Answer

As of Python 3 9  still a release candidate as of 12 Aug 2020   there is a new xml etree ElementTree indent   function for pretty-printing XML trees  Sample usage  import xml etree ElementTree as ET  element   ET XML  quot  lt html gt  lt body gt text lt  body gt  lt  html gt  quot   ET indent element  print ET tostring element  encoding  unicode     The upside is that it does not require any additional libraries  For more information check https   bugs python org issue14465 and https   github com python cpython pull 15200

User · Answer

If you re using a DOM implementation  each has their own form of pretty-printing built-in     minidom   document toprettyxml      4DOM   xml dom ext PrettyPrint document  stream     pxdom  or other DOM Level 3 LS-compliant imp    serializer domConfig setParameter  format-pretty-print   True  serializer writeToString document    If you re using something else without its own pretty-printer     or those pretty-printers don t quite do it the way you want      you d probably have to write or subclass your own serialiser

User · Answer

from lxml import etree import xml dom minidom as mmd  xml root   etree parse xml fiel path  etree XMLParser     def print xml xml root       plain xml   etree tostring xml root  decode  utf-8       urgly xml      join plain xml  split        good xml   mmd parseString urgly xml      print good xml toprettyxml indent             It s working well for the xml with Chinese

User · Answer

BeautifulSoup has a easy to use prettify   method    It indents one space per indentation level  It works much better than lxml s pretty print and is short and sweet    from bs4 import BeautifulSoup  bs   BeautifulSoup open xml file    xml   print bs prettify

User · Answer

If you have xmllint you can spawn a subprocess and use it  xmllint --format  lt file gt  pretty-prints its input XML to standard output   Note that this method uses an program external to python  which makes it sort of a hack   def pretty print xml xml       proc   subprocess Popen            xmllint    --format     dev stdin            stdin subprocess PIPE          stdout subprocess PIPE             output  error output    proc communicate xml       return output  print pretty print xml data

User · Answer

I had this problem and solved it like this   def write xml file  self  file  xml root element  xml declaration False  pretty print False  encoding  unicode   indent   t        pretty printed xml   etree tostring xml root element  xml declaration xml declaration  pretty print pretty print  encoding encoding      if pretty print  pretty printed xml   pretty printed xml replace       indent      file write pretty printed xml    In my code this method is called like this   try      with open file path   w   as file          file write   lt  xml version  1 0  encoding  utf-8    gt               create some xml content using etree              xml parser   XMLParser           xml parser write xml file file  xml root  xml declaration False  pretty print True  encoding  unicode   indent   t    except IOError      print  Error while writing in log file      This works only because etree by default uses two spaces to indent  which I don t find very much emphasizing the indentation and therefore not pretty  I couldn t ind any setting for etree or parameter for any function to change the standard etree indent  I like how easy it is to use etree  but this was really annoying me

User · Answer

XML pretty print for python looks pretty good for this task    Appropriately named  too    An alternative is to use pyXML  which has a PrettyPrint function

User · Answer

You can use popular external library xmltodict  with unparse and pretty True you will get best result   xmltodict unparse      xmltodict parse my xml   full document False  pretty True    full document False against  lt  xml version  1 0  encoding  UTF-8   gt  at the top

User · Answer

You can try this variation     Install BeautifulSoup and the backend lxml  parser  libraries   user  pip3 install lxml bs4   Process your XML document   from bs4 import BeautifulSoup  with open   path to file xml    r   as doc       for line in doc           print BeautifulSoup line   lxml-xml   prettify

User · Answer

If for some reason you can t get your hands on any of the Python modules that other users mentioned  I suggest the following solution for Python 2 7   import subprocess  def makePretty filepath     cmd    xmllint --format     filepath   prettyXML   subprocess check output cmd  shell   True    with open filepath   w   as outfile      outfile write prettyXML    As far as I know  this solution will work on Unix-based systems that have the xmllint package installed

User · Answer

For converting an entire xml document to a pretty xml document  ex  assuming you ve extracted  unzipped  a LibreOffice Writer  odt or  ods file  and you want to convert the ugly  content xml  file to a pretty one for automated git version control and git difftooling of  odt  ods files  such as I m implementing here   import xml dom minidom  file   open    content xml    r   xml string   file read   file close    parsed xml   xml dom minidom parseString xml string  pretty xml as string   parsed xml toprettyxml    file   open    content new xml    w   file write pretty xml as string  file close     References  - Thanks to Ben Noland s answer on this page which got me most of the way there

User · Answer

Another solution is to borrow this indent function  for use with the ElementTree library that s built in to Python since 2 5  Here s what that would look like   from xml etree import ElementTree  def indent elem  level 0       i     n    level          j     n     level-1           if len elem           if not elem text or not elem text strip                elem text   i                if not elem tail or not elem tail strip                elem tail   i         for subelem in elem              indent subelem  level 1          if not elem tail or not elem tail strip                elem tail   j     else          if level and  not elem tail or not elem tail strip                 elem tail   j     return elem          root   ElementTree parse   tmp xmlfile   getroot   indent root  ElementTree dump root

User · Answer

I solved this with some lines of code  opening the file  going trough it and adding indentation  then saving it again  I was working with small xml files  and did not want to add dependencies  or more libraries to install for the user  Anyway  here is what I ended up with       f   open file name  r       xml   f read       f close         Removing old indendations     raw xml                  for line in xml          raw xml    line      xml   raw xml      new xml          indent              deepness   0      for i in range  len xml              new xml    xml i             if i lt len xml -3                simpleSplit   xml i  i 2        gt  lt               advancSplit   xml i  i 3        gt  lt                        end   xml i  i 2         gt                   start   xml i       lt                if advancSplit                   deepness    -1                 new xml      n    indent deepness                 simpleSplit   False                 deepness    -1             if simpleSplit                   new xml      n    indent deepness             if start                   deepness    1             if end                   deepness    -1      f   open file name  w       f write new xml      f close     It works for me  perhaps someone will have some use of it

User · Answer

Here s my  hacky   solution to get around the ugly text node problem   uglyXml   doc toprettyxml indent        text re   re compile   gt  n s     lt  gt  s      n s  lt     re DOTALL      prettyXml   text re sub   gt  g lt 1 gt  lt     uglyXml   print prettyXml   The above code will produce    lt  xml version  1 0    gt   lt issues gt     lt issue gt       lt id gt 1 lt  id gt       lt title gt Add Visual Studio 2005 and 2008 solution files lt  title gt       lt details gt We need Visual Studio 2005 2008 project files for Windows  lt  details gt     lt  issue gt   lt  issues gt    Instead of this    lt  xml version  1 0    gt   lt issues gt     lt issue gt       lt id gt        1      lt  id gt       lt title gt        Add Visual Studio 2005 and 2008 solution files      lt  title gt       lt details gt        We need Visual Studio 2005 2008 project files for Windows       lt  details gt     lt  issue gt   lt  issues gt    Disclaimer  There are probably some limitations

User · Answer

I wrote a solution to walk through an existing ElementTree and use text tail to indent it as one typically expects   def prettify element  indent            queue     0  element       level  element      while queue          level  element   queue pop 0          children     level   1  child  for child in list element           if children              element text     n    indent    level 1     for child open         if queue              element tail     n    indent   queue 0  0     for sibling open         else              element tail     n    indent    level-1     for parent close         queue 0 0    children    prepend so children come before siblings

User · Answer

I tried to edit  ade s answer above  but Stack Overflow wouldn t let me edit after I had initially provided feedback anonymously   This is a less buggy version of the function to pretty-print an ElementTree   def indent elem  level 0  more sibs False       i     n      if level          i     level-1             num kids   len elem      if num kids          if not elem text or not elem text strip                elem text   i                    if level                  elem text                 count   0         for kid in elem              indent kid  level 1  count  lt  num kids - 1              count    1         if not elem tail or not elem tail strip                elem tail   i             if more sibs                  elem tail             else          if level and  not elem tail or not elem tail strip                 elem tail   i             if more sibs                  elem tail

User · Answer

Here s a Python3 solution that gets rid of the ugly newline issue  tons of whitespace   and it only uses standard libraries unlike most other implementations   import xml etree ElementTree as ET import xml dom minidom import os  def pretty print xml given root root  output xml               Useful for when you are editing xml data on the fly             xml string   xml dom minidom parseString ET tostring root   toprettyxml       xml string   os linesep join  s for s in xml string splitlines   if s strip       remove the weird newline issue     with open output xml   w   as file out          file out write xml string   def pretty print xml given file input xml  output xml               Useful for when you want to reformat an already existing xml file             tree   ET parse input xml      root   tree getroot       pretty print xml given root root  output xml    I found how to fix the common newline issue here

[python] Pretty printing XML in Python

Examples related to python

Examples related to xml

Examples related to pretty-print