[python] How to convert XML to JSON in Python?

Possible Duplicate:
Converting XML to JSON using Python?

I'm doing some work on App Engine and I need to convert an XML document being retrieved from a remote server into an equivalent JSON object.

I'm using xml.dom.minidom to parse the XML data being returned by urlfetch. I'm also trying to use django.utils.simplejson to convert the parsed XML document into JSON. I'm completely at a loss as to how to hook the two together. Below is the code I'm tinkering with:

from xml.dom import minidom
from django.utils import simplejson as json

#pseudo code that returns actual xml data as a string from remote server. 
result = urlfetch.fetch(url,'','get');

dom = minidom.parseString(result.content)
json = simplejson.load(dom)

self.response.out.write(json)

This question is related to python xml json

The answer is


Soviut's advice for lxml objectify is good. With a specially subclassed simplejson, you can turn an lxml objectify result into json.

import simplejson as json
import lxml

class objectJSONEncoder(json.JSONEncoder):
  """A specialized JSON encoder that can handle simple lxml objectify types
      >>> from lxml import objectify
      >>> obj = objectify.fromstring("<Book><price>1.50</price><author>W. Shakespeare</author></Book>")       
      >>> objectJSONEncoder().encode(obj)
      '{"price": 1.5, "author": "W. Shakespeare"}'       
 """


    def default(self,o):
        if isinstance(o, lxml.objectify.IntElement):
            return int(o)
        if isinstance(o, lxml.objectify.NumberElement) or isinstance(o, lxml.objectify.FloatElement):
            return float(o)
        if isinstance(o, lxml.objectify.ObjectifiedDataElement):
            return str(o)
        if hasattr(o, '__dict__'):
            #For objects with a __dict__, return the encoding of the __dict__
            return o.__dict__
        return json.JSONEncoder.default(self, o)

See the docstring for example of usage, essentially you pass the result of lxml objectify to the encode method of an instance of objectJSONEncoder

Note that Koen's point is very valid here, the solution above only works for simply nested xml and doesn't include the name of root elements. This could be fixed.

I've included this class in a gist here: http://gist.github.com/345559


In general, you want to go from XML to regular objects of your language (since there are usually reasonable tools to do this, and it's the harder conversion). And then from Plain Old Object produce JSON -- there are tools for this, too, and it's a quite simple serialization (since JSON is "Object Notation", natural fit for serializing objects). I assume Python has its set of tools.


I think the XML format can be so diverse that it's impossible to write a code that could do this without a very strict defined XML format. Here is what I mean:

<persons>
    <person>
        <name>Koen Bok</name>
        <age>26</age>
    </person>
    <person>
        <name>Plutor Heidepeen</name>
        <age>33</age>
    </person>
</persons>

Would become

{'persons': [
    {'name': 'Koen Bok', 'age': 26},
    {'name': 'Plutor Heidepeen', 'age': 33}]
}

But what would this be:

<persons>
    <person name="Koen Bok">
        <locations name="defaults">
            <location long=123 lat=384 />
        </locations>
    </person>
</persons>

See what I mean?

Edit: just found this article: http://www.xml.com/pub/a/2006/05/31/converting-between-xml-and-json.html


Jacob Smullyan wrote a utility called pesterfish which uses effbot's ElementTree to convert XML to JSON.


One possibility would be to use Objectify or ElementTree from the lxml module. An older version ElementTree is also available in the python xml.etree module as well. Either of these will get your xml converted to Python objects which you can then use simplejson to serialize the object to JSON.

While this may seem like a painful intermediate step, it starts making more sense when you're dealing with both XML and normal Python objects.


Jacob Smullyan wrote a utility called pesterfish which uses effbot's ElementTree to convert XML to JSON.


One possibility would be to use Objectify or ElementTree from the lxml module. An older version ElementTree is also available in the python xml.etree module as well. Either of these will get your xml converted to Python objects which you can then use simplejson to serialize the object to JSON.

While this may seem like a painful intermediate step, it starts making more sense when you're dealing with both XML and normal Python objects.


I wrote a small command-line based Python script based on pesterfesh that does exactly this:

https://github.com/hay/xml2json


xmltodict (full disclosure: I wrote it) can help you convert your XML to a dict+list+string structure, following this "standard". It is Expat-based, so it's very fast and doesn't need to load the whole XML tree in memory.

Once you have that data structure, you can serialize it to JSON:

import xmltodict, json

o = xmltodict.parse('<e> <a>text</a> <a>text</a> </e>')
json.dumps(o) # '{"e": {"a": ["text", "text"]}}'

One possibility would be to use Objectify or ElementTree from the lxml module. An older version ElementTree is also available in the python xml.etree module as well. Either of these will get your xml converted to Python objects which you can then use simplejson to serialize the object to JSON.

While this may seem like a painful intermediate step, it starts making more sense when you're dealing with both XML and normal Python objects.


Jacob Smullyan wrote a utility called pesterfish which uses effbot's ElementTree to convert XML to JSON.


In general, you want to go from XML to regular objects of your language (since there are usually reasonable tools to do this, and it's the harder conversion). And then from Plain Old Object produce JSON -- there are tools for this, too, and it's a quite simple serialization (since JSON is "Object Notation", natural fit for serializing objects). I assume Python has its set of tools.


One possibility would be to use Objectify or ElementTree from the lxml module. An older version ElementTree is also available in the python xml.etree module as well. Either of these will get your xml converted to Python objects which you can then use simplejson to serialize the object to JSON.

While this may seem like a painful intermediate step, it starts making more sense when you're dealing with both XML and normal Python objects.


Jacob Smullyan wrote a utility called pesterfish which uses effbot's ElementTree to convert XML to JSON.


I think the XML format can be so diverse that it's impossible to write a code that could do this without a very strict defined XML format. Here is what I mean:

<persons>
    <person>
        <name>Koen Bok</name>
        <age>26</age>
    </person>
    <person>
        <name>Plutor Heidepeen</name>
        <age>33</age>
    </person>
</persons>

Would become

{'persons': [
    {'name': 'Koen Bok', 'age': 26},
    {'name': 'Plutor Heidepeen', 'age': 33}]
}

But what would this be:

<persons>
    <person name="Koen Bok">
        <locations name="defaults">
            <location long=123 lat=384 />
        </locations>
    </person>
</persons>

See what I mean?

Edit: just found this article: http://www.xml.com/pub/a/2006/05/31/converting-between-xml-and-json.html


xmltodict (full disclosure: I wrote it) can help you convert your XML to a dict+list+string structure, following this "standard". It is Expat-based, so it's very fast and doesn't need to load the whole XML tree in memory.

Once you have that data structure, you can serialize it to JSON:

import xmltodict, json

o = xmltodict.parse('<e> <a>text</a> <a>text</a> </e>')
json.dumps(o) # '{"e": {"a": ["text", "text"]}}'

I wrote a small command-line based Python script based on pesterfesh that does exactly this:

https://github.com/hay/xml2json


Examples related to python

programming a servo thru a barometer Is there a way to view two blocks of code from the same file simultaneously in Sublime Text? python variable NameError Why my regexp for hyphenated words doesn't work? Comparing a variable with a string python not working when redirecting from bash script is it possible to add colors to python output? Get Public URL for File - Google Cloud Storage - App Engine (Python) Real time face detection OpenCV, Python xlrd.biffh.XLRDError: Excel xlsx file; not supported Could not load dynamic library 'cudart64_101.dll' on tensorflow CPU-only installation

Examples related to xml

strange error in my Animation Drawable How do I POST XML data to a webservice with Postman? PHP XML Extension: Not installed How to add a Hint in spinner in XML Generating Request/Response XML from a WSDL Manifest Merger failed with multiple errors in Android Studio How to set menu to Toolbar in Android How to add colored border on cardview? Android: ScrollView vs NestedScrollView WARNING: Exception encountered during context initialization - cancelling refresh attempt

Examples related to json

Use NSInteger as array index Uncaught SyntaxError: Unexpected end of JSON input at JSON.parse (<anonymous>) HTTP POST with Json on Body - Flutter/Dart Importing json file in TypeScript json.decoder.JSONDecodeError: Extra data: line 2 column 1 (char 190) Angular 5 Service to read local .json file How to import JSON File into a TypeScript file? Use Async/Await with Axios in React.js Uncaught SyntaxError: Unexpected token u in JSON at position 0 how to remove json object key and value.?