Converting XML to JSON using Python

Question

I've seen a fair share of ungainly XML->JSON code on the web, and having interacted with Stack's users for a bit, I'm convinced that this crowd can help more than the first few pages of Google results can. So, we're parsing a weather feed, and we need to populate weather widgets on a multitude of web sites. We're looking now into Python-based solutions. This public weather.com RSS feed is a good example of what we'd be parsing (our actual weather.com feed contains additional information because of a partnership with them). In a nutshell, how should we convert XML to JSON using Python?

User · Accepted Answer

There is no "one-to-one" mapping between XML and JSON, so converting one to the other necessarily requires some understanding of what you want to do with the results. That being said, Python's standard library has several modules for parsing XML (including DOM, SAX, and ElementTree). As of Python 2.6, support for converting Python data structures to and from JSON is included in the json module. So the infrastructure is there.
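As a minimal sketch of that infrastructure, the snippet below uses only `xml.etree.ElementTree` and `json` to pull a few fields out of an RSS-like document and emit JSON for a widget. The sample feed, the field names, and the `feed_to_json` helper are assumptions made for illustration; the real weather.com feed will look different.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical RSS-like sample; the real feed's structure will differ.
SAMPLE_FEED = """\
<rss version="2.0">
  <channel>
    <title>Local Weather</title>
    <item>
      <title>Current conditions</title>
      <description>Sunny, 72 F</description>
    </item>
    <item>
      <title>Tomorrow</title>
      <description>Cloudy, 65 F</description>
    </item>
  </channel>
</rss>
"""


def feed_to_json(xml_text):
    """Pick out only the fields the widget needs and serialize them."""
    root = ET.fromstring(xml_text)
    channel = root.find("channel")
    payload = {
        "title": channel.findtext("title", default=""),
        "items": [
            {
                "title": item.findtext("title", default=""),
                "description": item.findtext("description", default=""),
            }
            for item in channel.findall("item")
        ],
    }
    return json.dumps(payload, indent=2)


if __name__ == "__main__":
    print(feed_to_json(SAMPLE_FEED))
```

Deciding up front which fields the consumer needs sidesteps most of the mapping questions (attributes, repeated tags, mixed content) that a generic converter has to answer.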

User · Answer

jsonpickle, or if you're using feedparser, you can try feed_parser_to_json.py.

User · Answer

xmltodict (full disclosure: I wrote it) can help you convert your XML to a dict+list+string structure, following this "standard". It is Expat-based, so it's very fast and doesn't need to load the whole XML tree in memory.

Once you have that data structure, you can serialize it to JSON:

    >>> import xmltodict, json
    >>> o = xmltodict.parse('<e> <a>text</a> <a>text</a> </e>')
    >>> json.dumps(o)
    '{"e": {"a": ["text", "text"]}}'
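For readers who want to see what processing XML without holding the whole tree in memory can look like, here is a rough standard-library sketch using `ElementTree.iterparse`. It is not how xmltodict is implemented; the `stream_items` helper and the sample document are assumptions for illustration only.

```python
import io
import json
import xml.etree.ElementTree as ET


def stream_items(source, tag):
    """Yield one dict per <tag> element, emptying each element after use."""
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == tag:
            # Flat mapping of direct children; nested structure is ignored here.
            yield {child.tag: child.text for child in elem}
            elem.clear()  # drop the element's children and text once consumed


if __name__ == "__main__":
    xml_bytes = b"<e><a><x>1</x></a><a><x>2</x></a></e>"
    for record in stream_items(io.BytesIO(xml_bytes), "a"):
        print(json.dumps(record))
```

Each record can be serialized and written out as soon as it is produced, which keeps memory use roughly proportional to one record rather than to the whole document.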

User · Answer

You can use the xmljson library to convert using different XML JSON conventions.

For example, this XML:

    <p id="1">text</p>

translates via the BadgerFish convention into this:

    {
      'p': {
        '@id': 1,
        '$': 'text'
      }
    }

and via the GData convention into this (attributes are not supported):

    {
      'p': {
        '$t': 'text'
      }
    }

and via the Parker convention into this (attributes are not supported):

    {
      'p': 'text'
    }

It's possible to convert from XML to JSON and from JSON to XML using the same conventions:

    >>> import json, xmljson
    >>> from lxml.etree import fromstring, tostring
    >>> xml = fromstring('<p id="1">text</p>')
    >>> json.dumps(xmljson.badgerfish.data(xml))
    '{"p": {"@id": 1, "$": "text"}}'
    >>> xmljson.parker.etree({'ul': {'li': [1, 2]}})
    # Creates [<ul><li>1</li><li>2</li></ul>]

Disclosure: I wrote this library. Hope it helps future searchers.
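To make the difference between the conventions concrete, here is a rough standard-library sketch of BadgerFish-style and Parker-style conversion. It is not xmljson's implementation: the helper names `badgerfish` and `parker` are made up for this example, values are left as strings rather than converted to numbers, the root element is kept in both cases to match the examples above, and namespaces and mixed content are ignored.

```python
import json
import xml.etree.ElementTree as ET


def _add(mapping, key, value):
    """Store value under key, turning repeated keys into a list."""
    if key in mapping:
        if not isinstance(mapping[key], list):
            mapping[key] = [mapping[key]]
        mapping[key].append(value)
    else:
        mapping[key] = value


def _badgerfish_body(elem):
    body = {"@" + name: value for name, value in elem.attrib.items()}
    text = (elem.text or "").strip()
    if text:
        body["$"] = text
    for child in elem:
        _add(body, child.tag, _badgerfish_body(child))
    return body


def badgerfish(elem):
    """BadgerFish-style: attributes become '@name', text becomes '$'."""
    return {elem.tag: _badgerfish_body(elem)}


def _parker_body(elem):
    if len(elem) == 0:  # leaf element: keep only its text
        return elem.text
    body = {}
    for child in elem:
        _add(body, child.tag, _parker_body(child))
    return body


def parker(elem):
    """Parker-style: attributes are dropped, leaf elements become their text."""
    return {elem.tag: _parker_body(elem)}


if __name__ == "__main__":
    root = ET.fromstring('<p id="1">text</p>')
    print(json.dumps(badgerfish(root)))  # {"p": {"@id": "1", "$": "text"}}
    print(json.dumps(parker(root)))      # {"p": "text"}
```

The same input yields two different JSON shapes, which is why the choice of convention has to follow from what the consumer of the JSON expects.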

User · Answer

I found that for simple XML snippets, using a regular expression saves trouble. For example:

    # <user><name>Happy Man</name>...</user>
    import re
    # [^<]+ also matches names that contain spaces, such as "Happy Man"
    names = re.findall(r'<name>([^<]+)</name>', xml_string)
    # do something with names

To do it by XML parsing, as @Dan said, there is no one-for-all solution because the data is different. My suggestion is to use lxml. Although it does not go all the way to JSON, lxml.objectify gives quite good results:

    >>> from lxml import objectify
    >>> root = objectify.fromstring("""
    ... <root xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
    ...   <a attr1="foo" attr2="bar">1</a>
    ...   <a>1.2</a>
    ...   <b>1</b>
    ...   <b>true</b>
    ...   <c>what?</c>
    ...   <d xsi:nil="true"/>
    ... </root>
    ... """)
    >>> print(str(root))
    root = None [ObjectifiedElement]
        a = 1 [IntElement]
          * attr1 = 'foo'
          * attr2 = 'bar'
        a = 1.2 [FloatElement]
        b = 1 [IntElement]
        b = True [BoolElement]
        c = 'what?' [StringElement]
        d = None [NoneElement]
          * xsi:nil = 'true'

User · Answer

While the built-in libs for XML parsing are quite good, I am partial to lxml.

But for parsing RSS feeds, I'd recommend Universal Feed Parser, which can also parse Atom. Its main advantage is that it can digest even most malformed feeds.

Python 2.6 already includes a JSON parser, but a newer version with improved speed is available as simplejson.

With these tools, building your app shouldn't be that difficult.

User · Answer

Well, probably the simplest way is to just parse the XML into dictionaries and then serialize that with simplejson.

User · Answer

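If you pass the response object itself (which prints as just the response code) instead of its data, you will get a parse error, so you need to pass the body as text:

    import json
    import requests
    import xmltodict

    # url is the address of the XML resource, defined elsewhere
    data = requests.get(url)
    xpars = xmltodict.parse(data.text)
    # use a separate name so the json module is not shadowed
    json_text = json.dumps(xpars)
    print(json_text)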

User · Answer

Here's the code I built for that. There's no parsing of the contents, just plain conversion.

    from xml.dom import minidom
    import simplejson as json  # the standard-library json module works too


    def parse_element(element):
        dict_data = dict()
        # Text nodes carry their content under the 'data' key
        if element.nodeType == element.TEXT_NODE:
            dict_data['data'] = element.data
        # Copy attributes for nodes that can have them
        if element.nodeType not in [element.TEXT_NODE, element.DOCUMENT_NODE,
                                    element.DOCUMENT_TYPE_NODE]:
            for item in element.attributes.items():
                dict_data[item[0]] = item[1]
        # Recurse into children; repeated names are collected into a list
        if element.nodeType not in [element.TEXT_NODE, element.DOCUMENT_TYPE_NODE]:
            for child in element.childNodes:
                child_name, child_dict = parse_element(child)
                if child_name in dict_data:
                    try:
                        dict_data[child_name].append(child_dict)
                    except AttributeError:
                        dict_data[child_name] = [dict_data[child_name], child_dict]
                else:
                    dict_data[child_name] = child_dict
        return element.nodeName, dict_data


    if __name__ == '__main__':
        dom = minidom.parse('data.xml')
        with open('data.json', 'w') as f:
            f.write(json.dumps(parse_element(dom), sort_keys=True, indent=4))

User · Answer

You can use declxml. It has advanced features like multi attributes and complex nested support. You just need to write a simple processor for it. Also, with the same code, you can convert back to JSON as well. It is fairly straightforward and the documentation is awesome.

Link: https://declxml.readthedocs.io/en/latest/index.html

User · Answer

This stuff here is actively maintained and so far is my favorite: xml2json in Python.

User · Answer

If you don't want to use any external libraries or 3rd party tools, try the code below.

Code

    import re
    import json


    def getdict(content):
        # Match <tag attrs>value</tag> or a self-closing <tag attrs/>
        res = re.findall(r"<(?P<var>\S*)(?P<attr>[^/>]*)(?:(?:>(?P<val>.*?)</(?P=var)>)|(?:/>))", content)
        if len(res) >= 1:
            # Match name="value", name=value, or a bare attribute name
            attreg = r"(?P<avr>\S+?)(?:(?:=(?P<quote>['\"])(?P<avl>.*?)(?P=quote))|(?:=(?P<avl1>.*?)(?:\s|$))|(?P<avl2>[\s]+)|$)"
            if len(res) > 1:
                return [{i[0]: [{"@attributes": [{j[0]: (j[2] or j[3] or j[4])} for j in re.findall(attreg, i[1].strip())]}, {"$values": getdict(i[2])}]} for i in res]
            else:
                # A single match is one (tag, attrs, value) tuple
                tag, attrs, val = res[0]
                return {tag: [{"@attributes": [{j[0]: (j[2] or j[3] or j[4])} for j in re.findall(attreg, attrs.strip())]}, {"$values": getdict(val)}]}
        else:
            return content


    with open("test.xml", "r") as f:
        print(json.dumps(getdict(f.read().replace('\n', ''))))

Sample Input

    <details class="4b" count=1 boy>
        <name type="firstname">John</name>
        <age>13</age>
        <hobby>Coin collection</hobby>
        <hobby>Stamp collection</hobby>
        <address>
            <country>USA</country>
            <state>CA</state>
        </address>
    </details>
    <details empty="True"/>
    <details/>
    <details class="4a" count=2 girl>
        <name type="firstname">Samantha</name>
        <age>13</age>
        <hobby>Fishing</hobby>
        <hobby>Chess</hobby>
        <address current="no">
            <country>Australia</country>
            <state>NSW</state>
        </address>
    </details>

Output

    [
      {"details": [
        {"@attributes": [{"class": "4b"}, {"count": "1"}, {"boy": ""}]},
        {"$values": [
          {"name": [{"@attributes": [{"type": "firstname"}]}, {"$values": "John"}]},
          {"age": [{"@attributes": []}, {"$values": "13"}]},
          {"hobby": [{"@attributes": []}, {"$values": "Coin collection"}]},
          {"hobby": [{"@attributes": []}, {"$values": "Stamp collection"}]},
          {"address": [
            {"@attributes": []},
            {"$values": [
              {"country": [{"@attributes": []}, {"$values": "USA"}]},
              {"state": [{"@attributes": []}, {"$values": "CA"}]}
            ]}
          ]}
        ]}
      ]},
      {"details": [{"@attributes": [{"empty": "True"}]}, {"$values": ""}]},
      {"details": [{"@attributes": []}, {"$values": ""}]},
      {"details": [
        {"@attributes": [{"class": "4a"}, {"count": "2"}, {"girl": ""}]},
        {"$values": [
          {"name": [{"@attributes": [{"type": "firstname"}]}, {"$values": "Samantha"}]},
          {"age": [{"@attributes": []}, {"$values": "13"}]},
          {"hobby": [{"@attributes": []}, {"$values": "Fishing"}]},
          {"hobby": [{"@attributes": []}, {"$values": "Chess"}]},
          {"address": [
            {"@attributes": [{"current": "no"}]},
            {"$values": [
              {"country": [{"@attributes": []}, {"$values": "Australia"}]},
              {"state": [{"@attributes": []}, {"$values": "NSW"}]}
            ]}
          ]}
        ]}
      ]}
    ]

User · Answer

I'd suggest not going for a direct conversion. Convert XML to an object, then from the object to JSON.

In my opinion, this gives a cleaner definition of how the XML and JSON correspond.

It takes time to get right and you may even write tools to help you with generating some of it, but it would look roughly like this:

    class Channel:
        def __init__(self):
            self.items = []
            self.title = ""

        def from_xml(self, xml_node):
            # xml_node needs xpath support, e.g. an lxml element
            self.title = xml_node.xpath("title/text()")[0]
            for x in xml_node.xpath("item"):
                item = Item()
                item.from_xml(x)
                self.items.append(item)

        def to_json(self):
            # Returns a plain dict that json.dumps can serialize
            retval = {}
            retval['title'] = self.title
            retval['items'] = []
            for x in self.items:
                retval['items'].append(x.to_json())
            return retval


    class Item:
        def __init__(self):
            ...

        def from_xml(self, xml_node):
            ...

        def to_json(self):
            ...
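One way to flesh out this object-first idea with only the standard library is sketched below, using dataclasses and `xml.etree.ElementTree`. The fields on `Item` (`title`, `description`) and the sample XML are assumptions, since the outline above leaves `Item` unspecified.

```python
import json
import xml.etree.ElementTree as ET
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class Item:
    # Hypothetical fields; the outline above does not say what an Item holds.
    title: str = ""
    description: str = ""

    @classmethod
    def from_xml(cls, node):
        return cls(
            title=node.findtext("title", default=""),
            description=node.findtext("description", default=""),
        )


@dataclass
class Channel:
    title: str = ""
    items: List[Item] = field(default_factory=list)

    @classmethod
    def from_xml(cls, node):
        return cls(
            title=node.findtext("title", default=""),
            items=[Item.from_xml(n) for n in node.findall("item")],
        )

    def to_json(self):
        # asdict() recurses into nested dataclasses and lists.
        return json.dumps(asdict(self))


if __name__ == "__main__":
    xml_text = (
        "<channel><title>Weather</title>"
        "<item><title>Today</title><description>Sunny</description></item>"
        "</channel>"
    )
    channel = Channel.from_xml(ET.fromstring(xml_text))
    print(channel.to_json())
```

The dataclass definitions double as documentation of the JSON shape, which is the main benefit of going through an intermediate object.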

User · Answer

Check out lxml2json (disclosure: I wrote it):

https://github.com/rparelius/lxml2json

It's very fast, lightweight (only requires lxml), and one advantage is that you have control over whether certain elements are converted to lists or dicts.

User · Answer

To anyone that may still need this, here's newer, simple code to do this conversion.

    from xml.etree import ElementTree as ET


    def parseXmlToJson(xml):
        response = {}

        for child in list(xml):
            if len(list(child)) > 0:
                response[child.tag] = parseXmlToJson(child)
            else:
                response[child.tag] = child.text or ''

            # one-liner equivalent
            # response[child.tag] = parseXmlToJson(child) if len(list(child)) > 0 else child.text or ''

        return response


    # The function must be defined before it is called,
    # and it expects the root element rather than the ElementTree object.
    xml = ET.parse('FILE_NAME.xml').getroot()
    parsed = parseXmlToJson(xml)

User · Answer

When I do anything with XML in Python I almost always use the lxml package. I suspect that most people use lxml. You could use xmltodict, but you will have to pay the penalty of parsing the XML again.

To convert XML to JSON with lxml you:

1. Parse the XML document with lxml
2. Convert lxml to a dict
3. Convert the dict to JSON

I use the following class in my projects. Use the toJson method.

    from lxml import etree
    import json


    class Element:
        '''
        Wrapper on the etree.Element class.  Extends functionality to output element
        as a dictionary.
        '''

        def __init__(self, element):
            '''
            :param: element a normal etree.Element instance
            '''
            self.element = element

        def toDict(self):
            '''
            Returns the element as a dictionary.  This includes all child elements.
            '''
            rval = {
                self.element.tag: {
                    'attributes': dict(self.element.items()),
                },
            }
            for child in self.element:
                rval[self.element.tag].update(Element(child).toDict())
            return rval


    class XmlDocument:
        '''
        Wraps lxml to provide:
            - cleaner access to some common lxml.etree functions
            - converter from XML to dict
            - converter from XML to json
        '''
        def __init__(self, xml='<empty/>', filename=None):
            '''
            There are two ways to initialize the XmlDocument contents:
                - String
                - File

            You don't have to initialize the XmlDocument during instantiation
            though.  You can do it later with the 'set' method.  If you choose to
            initialize later XmlDocument will be initialized with "<empty/>".

            :param: xml Set this argument if you want to parse from a string.
            :param: filename Set this argument if you want to parse from a file.
            '''
            self.set(xml, filename)

        def set(self, xml=None, filename=None):
            '''
            Use this to set or reset the contents of the XmlDocument.

            :param: xml Set this argument if you want to parse from a string.
            :param: filename Set this argument if you want to parse from a file.
            '''
            if filename is not None:
                self.tree = etree.parse(filename)
                self.root = self.tree.getroot()
            else:
                self.root = etree.fromstring(xml)
                self.tree = etree.ElementTree(self.root)

        def dump(self):
            etree.dump(self.root)

        def getXml(self):
            '''
            return document as a string
            '''
            return etree.tostring(self.root)

        def xpath(self, xpath):
            '''
            Return elements that match the given xpath.

            :param: xpath
            '''
            return self.tree.xpath(xpath)

        def nodes(self):
            '''
            Return all elements
            '''
            return self.root.iter('*')

        def toDict(self):
            '''
            Convert to a python dictionary
            '''
            return Element(self.root).toDict()

        def toJson(self, indent=None):
            '''
            Convert to JSON
            '''
            return json.dumps(self.toDict(), indent=indent)


    if __name__ == "__main__":
        xml = '''<system>
        <product>
            <demod>
                <frequency value='2.215' units='MHz'>
                    <blah value='1'/>
                </frequency>
            </demod>
        </product>
    </system>
    '''
        doc = XmlDocument(xml)
        print(doc.toJson(indent=4))

The output from the built-in main is:

    {
        "system": {
            "attributes": {},
            "product": {
                "attributes": {},
                "demod": {
                    "attributes": {},
                    "frequency": {
                        "attributes": {
                            "units": "MHz",
                            "value": "2.215"
                        },
                        "blah": {
                            "attributes": {
                                "value": "1"
                            }
                        }
                    }
                }
            }
        }
    }

Which is a transformation of this XML:

    <system>
        <product>
            <demod>
                <frequency value='2.215' units='MHz'>
                    <blah value='1'/>
                </frequency>
            </demod>
        </product>
    </system>

User · Answer

My answer addresses the specific (and somewhat common) case where you don't really need to convert the entire XML to JSON, but what you need is to traverse or access specific parts of the XML, and you need it to be fast and simple (using JSON/dict-like operations).

Approach

For this, it is important to note that parsing XML to an etree using lxml is super fast. The slow part in most of the other answers is the second pass: traversing the etree structure (usually in python-land) and converting it to JSON.

Which leads me to the approach I found best for this case: parsing the XML using lxml, and then wrapping the etree nodes (lazily), providing them with a dict-like interface.

Code

Here's the code:

    import json
    from collections.abc import Mapping  # collections.Mapping on Python 2
    import lxml.etree


    class ETreeDictWrapper(Mapping):

        def __init__(self, elem, attr_prefix='@', list_tags=()):
            self.elem = elem
            self.attr_prefix = attr_prefix
            self.list_tags = list_tags

        def _wrap(self, e):
            if isinstance(e, str):  # basestring on Python 2
                return e
            # Leaf elements without attributes collapse to their text
            if len(e) == 0 and len(e.attrib) == 0:
                return e.text
            return type(self)(
                e,
                attr_prefix=self.attr_prefix,
                list_tags=self.list_tags,
            )

        def __getitem__(self, key):
            if key.startswith(self.attr_prefix):
                return self.elem.attrib[key[len(self.attr_prefix):]]
            else:
                subelems = [e for e in self.elem.iterchildren() if e.tag == key]
                if len(subelems) > 1 or key in self.list_tags:
                    return [self._wrap(x) for x in subelems]
                elif len(subelems) == 1:
                    return self._wrap(subelems[0])
                else:
                    raise KeyError(key)

        def __iter__(self):
            return iter(set(k.tag for k in self.elem) |
                        set(self.attr_prefix + k for k in self.elem.attrib))

        def __len__(self):
            return len(self.elem) + len(self.elem.attrib)

        # defining __contains__ is not necessary, but improves speed
        def __contains__(self, key):
            if key.startswith(self.attr_prefix):
                return key[len(self.attr_prefix):] in self.elem.attrib
            else:
                return any(e.tag == key for e in self.elem.iterchildren())


    def xml_to_dictlike(xmlstr, attr_prefix='@', list_tags=()):
        t = lxml.etree.fromstring(xmlstr)
        return ETreeDictWrapper(
            t,
            attr_prefix=attr_prefix,
            list_tags=set(list_tags),
        )

This implementation is not complete, e.g., it doesn't cleanly support cases where an element has both text and attributes, or both text and children (only because I didn't need it when I wrote it). It should be easy to improve it, though.

Speed

In my specific use case, where I needed to only process specific elements of the XML, this approach gave a surprising and striking speedup by a factor of 70 (!) compared to using @Martin Blech's xmltodict and then traversing the dict directly.

Bonus

As a bonus, since our structure is already dict-like, we get another alternative implementation of xml2json almost for free. We just need to pass our dict-like structure to json.dumps. Something like:

    def xml_to_json(xmlstr, **kwargs):
        x = xml_to_dictlike(xmlstr, **kwargs)
        # json.dumps only serializes real dicts, so convert each wrapper on demand
        return json.dumps(x, default=dict)

If your XML includes attributes, you may want to use an alphanumeric attr_prefix (e.g. "ATTR_") so the keys are easier to work with downstream; JSON itself accepts any string as a key.

I haven't benchmarked this part.

User · Answer

There is a method to transport XML-based markup as JSON which allows it to be losslessly converted back to its original form. See http://jsonml.org/. It's a kind of XSLT of JSON. I hope you find it helpful.
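To make the JsonML idea concrete, here is a minimal sketch using only the standard library. It assumes the usual JsonML array shape of [tag, {attributes}, children...] with text nodes as plain strings; the function name to_jsonml is made up for this example, and whitespace-only text is dropped, so this sketch is not fully lossless.

```python
import json
import xml.etree.ElementTree as ET


def to_jsonml(elem):
    """Return elem as a JsonML-style list: [tag, {attributes}, children...]."""
    node = [elem.tag]
    if elem.attrib:
        node.append(dict(elem.attrib))
    # Text before the first child (whitespace-only text is skipped in this sketch).
    if elem.text and elem.text.strip():
        node.append(elem.text)
    for child in elem:
        node.append(to_jsonml(child))
        # Text that follows a child element belongs to the parent's content.
        if child.tail and child.tail.strip():
            node.append(child.tail)
    return node


doc = ET.fromstring('<p id="1">Hello <b>world</b>!</p>')
print(json.dumps(to_jsonml(doc)))
# prints: ["p", {"id": "1"}, "Hello ", ["b", "world"], "!"]
```

Because element order and mixed text are kept in one array, the structure can be walked back into markup, which is the point of the format.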

User · Answer

Well, probably the simplest way is to just parse the XML into dictionaries and then serialize that with simplejson.

User · Answer

I published one on GitHub a while back: https://github.com/davlee1972/xml_to_json

This converter is written in Python and will convert one or more XML files into JSON / JSONL files. It requires an XSD schema file to figure out nested JSON structures (dictionaries vs lists) and JSON equivalent data types.

    python xml_to_json.py -x PurchaseOrder.xsd PurchaseOrder.xml

    INFO - 2018-03-20 11:10:24 - Parsing XML Files
    INFO - 2018-03-20 11:10:24 - Processing 1 files
    INFO - 2018-03-20 11:10:24 - Parsing files in the following order:
    INFO - 2018-03-20 11:10:24 - ['PurchaseOrder.xml']
    DEBUG - 2018-03-20 11:10:24 - Generating schema from PurchaseOrder.xsd
    DEBUG - 2018-03-20 11:10:24 - Parsing PurchaseOrder.xml
    DEBUG - 2018-03-20 11:10:24 - Writing to file PurchaseOrder.json
    DEBUG - 2018-03-20 11:10:24 - Completed PurchaseOrder.xml

I also have a follow-up XML to Parquet converter that works in a similar fashion: https://github.com/blackrock/xml_to_parquet

User · Answer

I found that for simple XML snippets, using a regular expression saves trouble. For example:

    # <user><name>Happy Man</name>...</user>
    import re
    # [^<]+ is used instead of \w+ so that names containing spaces are matched
    names = re.findall(r'<name>([^<]+)</name>', xml_string)
    # do something with names

To do it by XML parsing, as @Dan said, there is no one-for-all solution because the data is different. My suggestion is to use lxml. Although it does not go all the way to JSON, lxml.objectify gives quite good results:

    >>> from lxml import objectify
    >>> root = objectify.fromstring("""
    ... <root xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
    ...   <a attr1="foo" attr2="bar">1</a>
    ...   <a>1.2</a>
    ...   <b>1</b>
    ...   <b>true</b>
    ...   <c>what?</c>
    ...   <d xsi:nil="true"/>
    ... </root>
    ... """)

    >>> print(objectify.dump(root))
    root = None [ObjectifiedElement]
        a = 1 [IntElement]
          * attr1 = 'foo'
          * attr2 = 'bar'
        a = 1.2 [FloatElement]
        b = 1 [IntElement]
        b = True [BoolElement]
        c = 'what?' [StringElement]
        d = None [NoneElement]
          * xsi:nil = 'true'
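As a quick check of the regular-expression approach, the sketch below compares two patterns on a made-up snippet (the sample string and variable names are assumptions for illustration). It shows why a `\w+` capture misses values containing spaces, while `[^<]+` takes everything up to the next tag.

```python
import re

# Hypothetical sample input, not taken from the real feed.
xml_string = "<user><name>Happy Man</name><name>Jane</name></user>"

# \w+ only matches word characters, so "Happy Man" (with a space) is skipped.
word_only = re.findall(r"<name>(\w+)</name>", xml_string)

# [^<]+ matches any run of characters up to the next tag, spaces included.
names = re.findall(r"<name>([^<]+)</name>", xml_string)

print(word_only)  # ['Jane']
print(names)      # ['Happy Man', 'Jane']
```

This only holds up for flat, predictable snippets; nested elements, attributes on the tag, CDATA or entities are better left to a real parser.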

User · Answer

I'd suggest not going for a direct conversion. Convert XML to an object, then from the object to JSON. In my opinion, this gives a cleaner definition of how the XML and JSON correspond. It takes time to get right and you may even write tools to help you with generating some of it, but it would look roughly like this:

    class Channel:
      def __init__(self):
        self.items = []
        self.title = ""

      def from_xml(self, xml_node):
        self.title = xml_node.xpath("title/text()")[0]
        for x in xml_node.xpath("item"):
          item = Item()
          item.from_xml(x)
          self.items.append(item)

      def to_json(self):
        retval = {}
        retval['title'] = self.title
        retval['items'] = []
        for x in self.items:
          retval['items'].append(x.to_json())
        return retval

    class Item:
      def __init__(self):
        ...

      def from_xml(self, xml_node):
        ...

      def to_json(self):
        ...
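A runnable sketch of the same object-first idea, using the standard library's ElementTree instead of xpath calls. The Item fields (title and link) and the sample feed are assumptions made up for illustration; a real feed would carry whatever fields your widgets need.

```python
import json
import xml.etree.ElementTree as ET


class Item:
    def __init__(self):
        self.title = ""
        self.link = ""

    def from_xml(self, node):
        # findtext returns the default when the child element is missing.
        self.title = node.findtext("title", default="")
        self.link = node.findtext("link", default="")
        return self

    def to_json(self):
        # Returns a JSON-ready dict; json.dumps does the actual serialization.
        return {"title": self.title, "link": self.link}


class Channel:
    def __init__(self):
        self.title = ""
        self.items = []

    def from_xml(self, node):
        self.title = node.findtext("title", default="")
        self.items = [Item().from_xml(x) for x in node.findall("item")]
        return self

    def to_json(self):
        return {"title": self.title,
                "items": [item.to_json() for item in self.items]}


# Hypothetical sample feed for the sketch.
rss = """<rss><channel><title>Weather</title>
<item><title>Sunny</title><link>http://example.com/1</link></item>
<item><title>Rain</title><link>http://example.com/2</link></item>
</channel></rss>"""

channel = Channel().from_xml(ET.fromstring(rss).find("channel"))
print(json.dumps(channel.to_json(), indent=2))
```

The payoff of the extra classes is that the JSON shape is decided in one explicit place (each to_json method) rather than falling out of whatever the XML happens to look like.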

User · Answer

You may want to have a look at http://designtheory.org/library/extrep/designdb-1.0.pdf. This project starts off with an XML to JSON conversion of a large library of XML files. There was much research done in the conversion, and the most simple, intuitive XML -> JSON mapping was produced (it is described early in the document). In summary, convert everything to a JSON object, and put repeating blocks as a list of objects (objects meaning key/value pairs: dictionary in Python, hashmap in Java, object in JavaScript). There is no mapping back to XML to get an identical document; the reason is that it is unknown whether a key/value pair was an attribute or a <key>value</key>, therefore that information is lost. If you ask me, attributes are a hack to start with; then again, they worked well for HTML.
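The mapping summarized above can be sketched roughly as follows. This is an illustration under stated assumptions, not the algorithm from the linked document: the function name element_to_obj and the "text" key for mixed content are choices made for this example.

```python
import json
import xml.etree.ElementTree as ET


def element_to_obj(elem):
    """Map an element to a plain value; attributes and children share one dict."""
    obj = dict(elem.attrib)
    for child in elem:
        value = element_to_obj(child)
        if child.tag in obj:
            # Repeated key: promote the existing value to a list, then append.
            if not isinstance(obj[child.tag], list):
                obj[child.tag] = [obj[child.tag]]
            obj[child.tag].append(value)
        else:
            obj[child.tag] = value
    text = (elem.text or "").strip()
    if not obj:
        return text          # leaf element: just its text
    if text:
        obj["text"] = text   # "text" is an arbitrary key chosen for this sketch
    return obj


xml = '<order id="7"><item>pen</item><item>ink</item><note>gift</note></order>'
print(json.dumps(element_to_obj(ET.fromstring(xml))))
# prints: {"id": "7", "item": ["pen", "ink"], "note": "gift"}
```

As the answer notes, the result is lossy: "id" (an attribute) and "note" (a child element) look identical in the output. A tag that appears only once also comes out as a scalar rather than a one-element list, so consumers need to handle both shapes.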

User · Answer

While the built-in libs for XML parsing are quite good, I am partial to lxml. But for parsing RSS feeds, I'd recommend Universal Feed Parser, which can also parse Atom. Its main advantage is that it can digest even most malformed feeds. Python 2.6 already includes a JSON parser, but a newer version with improved speed is available as simplejson. With these tools, building your app shouldn't be that difficult.

User · Answer

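To anyone that may still need this, here's a newer, simple piece of code to do this conversion:

    from xml.etree import ElementTree as ET

    def parseXmlToJson(xml):
        response = {}

        for child in list(xml):
            if len(list(child)) > 0:
                response[child.tag] = parseXmlToJson(child)
            else:
                response[child.tag] = child.text or ''

            # one-liner equivalent
            # response[child.tag] = parseXmlToJson(child) if len(list(child)) > 0 else child.text or ''

        return response

    # The function has to be defined before it is called, and it expects
    # the root element rather than the ElementTree returned by ET.parse.
    xml = ET.parse('FILE_NAME.xml').getroot()
    parsed = parseXmlToJson(xml)

Note that this ignores attributes, and repeated sibling tags overwrite each other, since each tag name is used as a dictionary key.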

User · Answer

When I do anything with XML in Python I almost always use the lxml package. I suspect that most people use lxml. You could use xmltodict, but you will have to pay the penalty of parsing the XML again.

To convert XML to JSON with lxml you:

1. Parse the XML document with lxml
2. Convert lxml to a dict
3. Convert the dict to JSON

I use the following class in my projects. Use the toJson method.

    from lxml import etree
    import json


    class Element:
        '''
        Wrapper on the etree.Element class.  Extends functionality to output element
        as a dictionary.
        '''

        def __init__(self, element):
            '''
            :param: element a normal etree.Element instance
            '''
            self.element = element

        def toDict(self):
            '''
            Returns the element as a dictionary.  This includes all child elements.
            '''
            rval = {
                self.element.tag: {
                    'attributes': dict(self.element.items()),
                },
            }
            for child in self.element:
                rval[self.element.tag].update(Element(child).toDict())
            return rval


    class XmlDocument:
        '''
        Wraps lxml to provide:
            - cleaner access to some common lxml.etree functions
            - converter from XML to dict
            - converter from XML to json
        '''
        def __init__(self, xml='<empty/>', filename=None):
            '''
            There are two ways to initialize the XmlDocument contents:
                - String
                - File

            You don't have to initialize the XmlDocument during instantiation
            though.  You can do it later with the 'set' method.  If you choose to
            initialize later XmlDocument will be initialized with "<empty/>".

            :param: xml Set this argument if you want to parse from a string.
            :param: filename Set this argument if you want to parse from a file.
            '''
            self.set(xml, filename)

        def set(self, xml=None, filename=None):
            '''
            Use this to set or reset the contents of the XmlDocument.

            :param: xml Set this argument if you want to parse from a string.
            :param: filename Set this argument if you want to parse from a file.
            '''
            if filename is not None:
                self.tree = etree.parse(filename)
                self.root = self.tree.getroot()
            else:
                self.root = etree.fromstring(xml)
                self.tree = etree.ElementTree(self.root)

        def dump(self):
            etree.dump(self.root)

        def getXml(self):
            '''
            return document as a string
            '''
            return etree.tostring(self.root)

        def xpath(self, xpath):
            '''
            Return elements that match the given xpath.

            :param: xpath
            '''
            return self.tree.xpath(xpath)

        def nodes(self):
            '''
            Return all elements
            '''
            return self.root.iter('*')

        def toDict(self):
            '''
            Convert to a python dictionary
            '''
            return Element(self.root).toDict()

        def toJson(self, indent=None):
            '''
            Convert to JSON
            '''
            return json.dumps(self.toDict(), indent=indent)


    if __name__ == "__main__":
        xml = """<system>
        <product>
            <demod>
                <frequency value='2.215' units='MHz'>
                    <blah value='1'/>
                </frequency>
            </demod>
        </product>
    </system>
    """
        doc = XmlDocument(xml)
        print(doc.toJson(indent=4))

The output from the built-in main is:

    {
        "system": {
            "attributes": {},
            "product": {
                "attributes": {},
                "demod": {
                    "attributes": {},
                    "frequency": {
                        "attributes": {
                            "value": "2.215",
                            "units": "MHz"
                        },
                        "blah": {
                            "attributes": {
                                "value": "1"
                            }
                        }
                    }
                }
            }
        }
    }

Which is a transformation of this XML:

    <system>
        <product>
            <demod>
                <frequency value='2.215' units='MHz'>
                    <blah value='1'/>
                </frequency>
            </demod>
        </product>
    </system>

Note that element text is not included in the output, and repeated child tags overwrite each other because each tag name is used as a dictionary key.

User · Answer

This stuff here is actively maintained and so far is my favorite: xml2json in Python.

User · Answer

Check out lxml2json (disclosure: I wrote it): https://github.com/rparelius/lxml2json

It's very fast and lightweight (it only requires lxml), and one advantage is that you have control over whether certain elements are converted to lists or dicts.

User · Answer

Here's the code I built for that. There's no parsing of the contents, just plain conversion.

    from xml.dom import minidom
    import simplejson as json

    def parse_element(element):
        dict_data = dict()
        if element.nodeType == element.TEXT_NODE:
            dict_data['data'] = element.data
        # Only element nodes carry attributes; for other node types
        # (comments, for instance) element.attributes is None.
        if element.nodeType == element.ELEMENT_NODE:
            for item in element.attributes.items():
                dict_data[item[0]] = item[1]
        if element.nodeType not in [element.TEXT_NODE, element.DOCUMENT_TYPE_NODE]:
            for child in element.childNodes:
                child_name, child_dict = parse_element(child)
                if child_name in dict_data:
                    # A repeated child name is collected into a list
                    try:
                        dict_data[child_name].append(child_dict)
                    except AttributeError:
                        dict_data[child_name] = [dict_data[child_name], child_dict]
                else:
                    dict_data[child_name] = child_dict
        return element.nodeName, dict_data

    if __name__ == '__main__':
        dom = minidom.parse('data.xml')
        with open('data.json', 'w') as f:
            f.write(json.dumps(parse_element(dom), sort_keys=True, indent=4))
