[python] Parsing XML in Python using ElementTree example

I'm having a hard time finding a good, basic example of how to parse XML in Python using ElementTree. From what I can find, this appears to be the easiest library to use for parsing XML. Here is a sample of the XML I'm working with:

<timeSeriesResponse>
    <queryInfo>
        <locationParam>01474500</locationParam>
        <variableParam>99988</variableParam>
        <timeParam>
            <beginDateTime>2009-09-24T15:15:55.271</beginDateTime>
            <endDateTime>2009-11-23T15:15:55.271</endDateTime>
        </timeParam>
    </queryInfo>
    <timeSeries name="NWIS Time Series Instantaneous Values">
        <values count="2876">
            <value dateTime="2009-09-24T15:30:00.000-04:00" qualifiers="P">550</value>
            <value dateTime="2009-09-24T16:00:00.000-04:00" qualifiers="P">419</value>
            <value dateTime="2009-09-24T16:30:00.000-04:00" qualifiers="P">370</value>
            .....
        </values>
    </timeSeries>
</timeSeriesResponse>

I am able to do what I need, using a hard-coded method. But I need my code to be a bit more dynamic. Here is what worked:

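import xml.etree.ElementTree as ET

tree = ET.parse('sample.xml')
doc = tree.getroot()

# Hard-coded positions: timeSeries is the second child of the root.
timeseries = doc[1]
# Index 2 matches the full file; the trimmed sample above would need timeseries[0].
values = timeseries[2]

for child in values:
    print(child.attrib['dateTime'], child.text)
# first line printed: 2009-09-24T15:30:00.000-04:00 550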

Here are a couple of things I've tried. Neither worked; both reported that they couldn't find timeSeries (or anything else I tried):

tree = ET.parse('sample.xml')
tree.find('timeSeries')

tree = ET.parse('sample.xml')
doc = tree.getroot()
doc.find('timeSeries')

Basically, I want to load the XML file, search for the timeSeries tag, and iterate through the value tags, returning the dateTime and the value of the tag itself: everything I'm doing in the above example, but without hard-coding the sections of XML I'm interested in. Can anyone point me to some examples, or give me some suggestions on how to work through this?


Thanks for all the help. Both of the suggestions below worked on the sample file I provided, but they didn't work on the full file. Here is the error I get from the real file when I use Ed Carrel's method:

 (<type 'exceptions.AttributeError'>, AttributeError("'NoneType' object has no attribute 'attrib'",), <traceback object at 0x011EFB70>)

I figured there was something in the real file it didn't like, so I incrementally removed things until it worked. Here are the lines that I changed:

originally: <timeSeriesResponse xsi:schemaLocation="a URL I removed" xmlns="a URL I removed" xmlns:xsi="a URL I removed">
changed to: <timeSeriesResponse>

originally: <sourceInfo xsi:type="SiteInfoType">
changed to: <sourceInfo>

originally: <geogLocation xsi:type="LatLonPointType" srs="EPSG:4326">
changed to: <geogLocation>

Removing the attributes that have 'xsi:...' fixed the problem. Is the 'xsi:...' not valid XML? It will be hard for me to remove these programmatically. Any suggested workarounds?

Here is the full XML file: http://www.sendspace.com/file/lofcpt


When I originally asked this question, I was unaware of namespaces in XML. Now that I know what's going on, I don't have to remove the namespace declarations (the xmlns attributes) or the "xsi"-prefixed attributes. I just include the namespace in my XPath searches. See this page for more info on namespaces in lxml.
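
As a rough sketch of what a namespace-aware search can look like, here is a version using the standard-library ElementTree rather than lxml. The real namespace URLs were removed from the question, so http://example.com/waterml is a made-up placeholder, the xsi URI is assumed to be the conventional XMLSchema-instance one, the placement of sourceInfo is illustrative, and the prefix name ns is arbitrary:

```python
import xml.etree.ElementTree as ET

# Placeholder URI: the real default namespace was removed from the question.
NS_URI = 'http://example.com/waterml'
# Assumed: the conventional XMLSchema-instance namespace for the xsi prefix.
XSI_URI = 'http://www.w3.org/2001/XMLSchema-instance'

XML = f"""
<timeSeriesResponse xmlns="{NS_URI}" xmlns:xsi="{XSI_URI}">
    <timeSeries name="NWIS Time Series Instantaneous Values">
        <sourceInfo xsi:type="SiteInfoType"/>
        <values count="2876">
            <value dateTime="2009-09-24T15:30:00.000-04:00" qualifiers="P">550</value>
            <value dateTime="2009-09-24T16:00:00.000-04:00" qualifiers="P">419</value>
        </values>
    </timeSeries>
</timeSeriesResponse>
"""

doc = ET.fromstring(XML)

# Without the namespace the path matches nothing, so find() returns None.
# Calling .attrib on that None is what raises the AttributeError.
unqualified = doc.find('timeSeries')

# Map a prefix of our choosing to the default namespace URI.
namespaces = {'ns': NS_URI}
readings = [
    (value.get('dateTime'), value.text)
    for value in doc.findall('ns:timeSeries/ns:values/ns:value', namespaces)
]

# Equivalent spelling with the URI written inline as {uri}tag.
same_values = doc.findall(
    f'{{{NS_URI}}}timeSeries/{{{NS_URI}}}values/{{{NS_URI}}}value'
)

print(unqualified)
print(readings)
```

The default xmlns declaration puts every unprefixed element into that namespace, which is why the plain 'timeSeries' path stops matching. The dateTime attribute has no prefix, so it is still read with its plain name.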

This question is related to python xml elementtree

The answer is


If I understand your question correctly:

for elem in doc.findall('timeSeries/values/value'):
    print(elem.get('dateTime'), elem.text)

or, if you prefer (and if there is only one occurrence of timeSeries/values):

values = doc.find('timeSeries/values')
for value in values:
    print(value.get('dateTime'), value.text)

The findall() method returns a list of all matching elements, whereas find() returns only the first matching element (or None if nothing matches). The first example loops over all the found elements; the second loops over the child elements of the values element, which in this case leads to the same result.
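
A minimal, self-contained sketch of that difference, assuming a trimmed copy of the sample XML from the question embedded as a string (the name SAMPLE and the variable names are only for illustration):

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the sample from the question (illustrative data only).
SAMPLE = """
<timeSeriesResponse>
    <queryInfo>
        <locationParam>01474500</locationParam>
    </queryInfo>
    <timeSeries name="NWIS Time Series Instantaneous Values">
        <values count="2876">
            <value dateTime="2009-09-24T15:30:00.000-04:00" qualifiers="P">550</value>
            <value dateTime="2009-09-24T16:00:00.000-04:00" qualifiers="P">419</value>
            <value dateTime="2009-09-24T16:30:00.000-04:00" qualifiers="P">370</value>
        </values>
    </timeSeries>
</timeSeriesResponse>
"""

doc = ET.fromstring(SAMPLE)  # fromstring() returns the root element directly

# findall() returns a list of every element matching the path.
all_values = doc.findall('timeSeries/values/value')

# find() returns only the first match, or None if nothing matches.
first_value = doc.find('timeSeries/values/value')
values = doc.find('timeSeries/values')

# Iterating over an element yields its direct children, so both
# approaches visit the same <value> elements here.
pairs_from_findall = [(v.get('dateTime'), v.text) for v in all_values]
pairs_from_children = [(v.get('dateTime'), v.text) for v in values]

for date_time, reading in pairs_from_findall:
    print(date_time, reading)
```

If values could ever contain children other than value, the findall() form is the safer of the two, since it filters by tag name.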

I don't see where the problem with not finding timeSeries comes from, however. Maybe you just forgot the getroot() call? (Note that you don't really need it, because you can search from the ElementTree object itself too, for example with the path expression timeSeries/values or .//timeSeries/values.)
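
A small sketch of searching from the tree object versus the root element with relative paths, assuming a tiny inline document in place of sample.xml (io.StringIO stands in for the file here):

```python
import io
import xml.etree.ElementTree as ET

# Tiny stand-in for sample.xml (illustrative data only).
XML = (
    "<timeSeriesResponse><timeSeries><values>"
    "<value dateTime='2009-09-24T15:30:00.000-04:00'>550</value>"
    "</values></timeSeries></timeSeriesResponse>"
)

tree = ET.parse(io.StringIO(XML))  # parse() accepts a filename or a file object
root = tree.getroot()

# ElementTree.find() searches relative to the root element,
# so these two calls locate the same element.
from_tree = tree.find('timeSeries/values')
from_root = root.find('timeSeries/values')

# './/' searches all descendants, at any depth below the root.
anywhere = tree.find('.//values')

print(from_tree.tag, from_root.tag, anywhere.tag)
```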

