Parsing XML in Python using ElementTree example

Question

I'm having a hard time finding a good, basic example of how to parse XML in Python using ElementTree. From what I can find, this appears to be the easiest library to use for parsing XML. Here is a sample of the XML I'm working with:

    <timeSeriesResponse>
        <queryInfo>
            <locationParam>01474500</locationParam>
            <variableParam>99988</variableParam>
            <timeParam>
                <beginDateTime>2009-09-24T15:15:55.271</beginDateTime>
                <endDateTime>2009-11-23T15:15:55.271</endDateTime>
            </timeParam>
        </queryInfo>
        <timeSeries name="NWIS Time Series Instantaneous Values">
            <values count="2876">
                <value dateTime="2009-09-24T15:30:00.000-04:00" qualifiers="P">550</value>
                <value dateTime="2009-09-24T16:00:00.000-04:00" qualifiers="P">419</value>
                <value dateTime="2009-09-24T16:30:00.000-04:00" qualifiers="P">370</value>
                ...
            </values>
        </timeSeries>
    </timeSeriesResponse>

I am able to do what I need using a hard-coded method, but I need my code to be a bit more dynamic. Here is what worked:

    tree = ET.parse('sample.xml')
    doc = tree.getroot()

    timeseries = doc[1]
    values = timeseries[2]

    for child in values:
        print child.attrib['dateTime'], child.text
    # prints 2009-09-24T15:30:00.000-04:00 550

Here are a couple of things I've tried. None of them worked; they reported that they couldn't find timeSeries (or anything else I tried):

    tree = ET.parse('sample.xml')
    tree.find('timeSeries')

    tree = ET.parse('sample.xml')
    doc = tree.getroot()
    doc.find('timeSeries')

Basically, I want to load the XML file, search for the timeSeries tag, and iterate through the value tags, returning the dateTime and the value of the tag itself: everything I'm doing in the above example, but without hard-coding the sections of XML I'm interested in. Can anyone point me to some examples, or give me some suggestions on how to work through this?

Thanks for all the help. Both of the suggestions below worked on the sample file I provided; however, they didn't work on the full file. Here is the error I get from the real file when I use Ed Carrel's method:

    (<type 'exceptions.AttributeError'>, AttributeError("'NoneType' object has no attribute 'attrib'",), <traceback object at 0x011EFB70>)

I figured there was something in the real file it didn't like, so I incrementally removed things until it worked. Here are the lines that I changed:

    originally: <timeSeriesResponse xsi:schemaLocation="a URL I removed" xmlns="a URL I removed" xmlns:xsi="a URL I removed">
    changed to: <timeSeriesResponse>

    originally: <sourceInfo xsi:type="SiteInfoType">
    changed to: <sourceInfo>

    originally: <geogLocation xsi:type="LatLonPointType" srs="EPSG:4326">
    changed to: <geogLocation>

Removing the attributes that have 'xsi:...' fixed the problem. Is the 'xsi:...' not valid XML? It will be hard for me to remove these programmatically. Any suggested workarounds?

Here is the full XML file: http://www.sendspace.com/file/lofcpt

When I originally asked this question, I was unaware of namespaces in XML. Now that I know what's going on, I don't have to remove the "xsi" attributes, which are the namespace declarations. I just include them in my XPath searches. See this page for more info on namespaces in lxml.
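To illustrate what including the namespace in a search can look like, here is a minimal sketch using the standard library's `xml.etree.ElementTree` on Python 3. The namespace URI `http://example.com/waterservices` and the prefix `ws` are placeholders invented for this example, since the real URLs were removed from the question. It assumes the lookups failed because the default `xmlns` declaration on the root places every element in that namespace.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URI and prefix: the real URLs were removed from the question.
NS = {"ws": "http://example.com/waterservices"}

SAMPLE = """\
<timeSeriesResponse xmlns="http://example.com/waterservices">
  <timeSeries name="NWIS Time Series Instantaneous Values">
    <values count="2">
      <value dateTime="2009-09-24T15:30:00.000-04:00" qualifiers="P">550</value>
      <value dateTime="2009-09-24T16:00:00.000-04:00" qualifiers="P">419</value>
    </values>
  </timeSeries>
</timeSeriesResponse>
"""

root = ET.fromstring(SAMPLE)

# An unqualified path matches nothing here, because every element
# inherits the default namespace declared on the root.
unqualified = root.find("timeSeries")

# Qualifying each path step with a prefix mapped in NS matches the elements.
readings = [
    (value.get("dateTime"), value.text)
    for value in root.findall("ws:timeSeries/ws:values/ws:value", NS)
]

for date_time, text in readings:
    print(date_time, text)
```

The prefix used in the search only has to map to the same URI as in the document; it does not need to match any prefix that appears in the file itself.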

User · Accepted Answer

So I have ElementTree 1.2.6 on my box now, and ran the following code against the XML chunk you posted:

    import elementtree.ElementTree as ET

    tree = ET.parse("test.xml")
    doc = tree.getroot()
    thingy = doc.find('timeSeries')

    print thingy.attrib

and got the following back:

    {'name': 'NWIS Time Series Instantaneous Values'}

It appears to have found the timeSeries element without needing to use numerical indices.

What would be useful now is knowing what you mean when you say "it doesn't work." Since it works for me given the same input, it is unlikely that ElementTree is broken in some obvious way. Update your question with any error messages, backtraces, or anything you can provide to help us help you.
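For readers on Python 3, the same lookup can be sketched with the standard library module `xml.etree.ElementTree`. The helper `describe` below is a hypothetical name introduced for this example. It also guards against `find()` returning `None`, which is what produces an error like "'NoneType' object has no attribute 'attrib'" when nothing matches.

```python
import xml.etree.ElementTree as ET

SAMPLE = """\
<timeSeriesResponse>
  <queryInfo>
    <locationParam>01474500</locationParam>
  </queryInfo>
  <timeSeries name="NWIS Time Series Instantaneous Values">
    <values count="1">
      <value dateTime="2009-09-24T15:30:00.000-04:00" qualifiers="P">550</value>
    </values>
  </timeSeries>
</timeSeriesResponse>
"""

doc = ET.fromstring(SAMPLE)


def describe(parent, tag):
    """Return the attributes of the first child named `tag`, or None if absent.

    Hypothetical helper for this sketch, not part of ElementTree.
    """
    child = parent.find(tag)
    # Compare against None explicitly: find() returns None when nothing matches.
    if child is None:
        return None
    return child.attrib


found = describe(doc, "timeSeries")
missing = describe(doc, "noSuchTag")

print(found)
print(missing)
```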

User · Answer

If I understand your question correctly:

    for elem in doc.findall('timeSeries/values/value'):
        print elem.get('dateTime'), elem.text

or, if you prefer (and if there is only one occurrence of timeSeries/values):

    values = doc.find('timeSeries/values')
    for value in values:
        print value.get('dateTime'), value.text

The findall() method returns a list of all matching elements, whereas find() returns only the first matching element. The first example loops over all the found elements; the second loops over the child elements of the values element, in this case leading to the same result.

I don't see where the problem with not finding timeSeries comes from, however. Maybe you just forgot the getroot() call? (Note that you don't really need it, because you can work from the ElementTree itself too if you change the path expression to, for example, /timeSeriesResponse/timeSeries/values or //timeSeries/values.)
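The difference between find() and findall() can be checked with a small sketch, written for Python 3 and the standard library's `xml.etree.ElementTree`. The XML is a trimmed copy of the sample from the question. The search from the tree object uses the relative form `.//timeSeries/values/value`, which is a choice made for this example rather than the exact path given above.

```python
import xml.etree.ElementTree as ET

SAMPLE = """\
<timeSeriesResponse>
  <timeSeries name="NWIS Time Series Instantaneous Values">
    <values count="3">
      <value dateTime="2009-09-24T15:30:00.000-04:00" qualifiers="P">550</value>
      <value dateTime="2009-09-24T16:00:00.000-04:00" qualifiers="P">419</value>
      <value dateTime="2009-09-24T16:30:00.000-04:00" qualifiers="P">370</value>
    </values>
  </timeSeries>
</timeSeriesResponse>
"""

doc = ET.fromstring(SAMPLE)   # the root element
tree = ET.ElementTree(doc)    # a tree object wrapping the same root

# findall() returns a list of every match; find() returns the first match or None.
all_values = doc.findall("timeSeries/values/value")
first_value = doc.find("timeSeries/values/value")

# Iterating over the <values> element visits its children in document order.
children = list(doc.find("timeSeries/values"))

# Searching from the tree object with a descendant path reaches the same elements.
from_tree = tree.findall(".//timeSeries/values/value")

pairs = [(value.get("dateTime"), value.text) for value in all_values]
for date_time, text in pairs:
    print(date_time, text)
```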
