[python] How to use XPath in Python?

What are the libraries that support XPath? Is there a full implementation? How is the library used? Where is its website?

This question is related to python xml dom xpath python-2.x

The answer is


The latest version of elementtree supports XPath pretty well. Not being an XPath expert I can't say for sure if the implementation is full but it has satisfied most of my needs when working in Python. I've also use lxml and PyXML and I find etree nice because it's a standard module.

NOTE: I've since found lxml and for me it's definitely the best XML lib out there for Python. It does XPath nicely as well (though again perhaps not a full implementation).


You can use:

PyXML:

from xml.dom.ext.reader import Sax2
from xml import xpath
doc = Sax2.FromXmlFile('foo.xml').documentElement
for url in xpath.Evaluate('//@Url', doc):
  print url.value

libxml2:

import libxml2
doc = libxml2.parseFile('foo.xml')
for url in doc.xpathEval('//@Url'):
  print url.content

Sounds like an lxml advertisement in here. ;) ElementTree is included in the std library. Under 2.6 and below its xpath is pretty weak, but in 2.7+ much improved:

import xml.etree.ElementTree as ET
root = ET.parse(filename)
result = ''

for elem in root.findall('.//child/grandchild'):
    # How to make decisions based on attributes even in 2.6:
    if elem.attrib.get('name') == 'foo':
        result = elem.text
        break

Another library is 4Suite: http://sourceforge.net/projects/foursuite/

I do not know how spec-compliant it is. But it has worked very well for my use. It looks abandoned.


If you are going to need it for html:

import lxml.html as html
root  = html.fromstring(string)
root.xpath('//meta')

The lxml package supports xpath. It seems to work pretty well, although I've had some trouble with the self:: axis. There's also Amara, but I haven't used it personally.


Use LXML. LXML uses the full power of libxml2 and libxslt, but wraps them in more "Pythonic" bindings than the Python bindings that are native to those libraries. As such, it gets the full XPath 1.0 implementation. Native ElemenTree supports a limited subset of XPath, although it may be good enough for your needs.


You can use the simple soupparser from lxml

Example:

from lxml.html.soupparser import fromstring

tree = fromstring("<a>Find me!</a>")
print tree.xpath("//a/text()")

If you want to have the power of XPATH combined with the ability to also use CSS at any point you can use parsel:

>>> from parsel import Selector
>>> sel = Selector(text=u"""<html>
        <body>
            <h1>Hello, Parsel!</h1>
            <ul>
                <li><a href="http://example.com">Link 1</a></li>
                <li><a href="http://scrapy.org">Link 2</a></li>
            </ul
        </body>
        </html>""")
>>>
>>> sel.css('h1::text').extract_first()
'Hello, Parsel!'
>>> sel.xpath('//h1/text()').extract_first()
'Hello, Parsel!'

libxml2 has a number of advantages:

  1. Compliance to the spec
  2. Active development and a community participation
  3. Speed. This is really a python wrapper around a C implementation.
  4. Ubiquity. The libxml2 library is pervasive and thus well tested.

Downsides include:

  1. Compliance to the spec. It's strict. Things like default namespace handling are easier in other libraries.
  2. Use of native code. This can be a pain depending on your how your application is distributed / deployed. RPMs are available that ease some of this pain.
  3. Manual resource handling. Note in the sample below the calls to freeDoc() and xpathFreeContext(). This is not very Pythonic.

If you are doing simple path selection, stick with ElementTree ( which is included in Python 2.5 ). If you need full spec compliance or raw speed and can cope with the distribution of native code, go with libxml2.

Sample of libxml2 XPath Use


import libxml2

doc = libxml2.parseFile("tst.xml")
ctxt = doc.xpathNewContext()
res = ctxt.xpathEval("//*")
if len(res) != 2:
    print "xpath query: wrong node set size"
    sys.exit(1)
if res[0].name != "doc" or res[1].name != "foo":
    print "xpath query: wrong node set value"
    sys.exit(1)
doc.freeDoc()
ctxt.xpathFreeContext()

Sample of ElementTree XPath Use


from elementtree.ElementTree import ElementTree
mydoc = ElementTree(file='tst.xml')
for e in mydoc.findall('/foo/bar'):
    print e.get('title').text


PyXML works well.

You didn't say what platform you're using, however if you're on Ubuntu you can get it with sudo apt-get install python-xml. I'm sure other Linux distros have it as well.

If you're on a Mac, xpath is already installed but not immediately accessible. You can set PY_USE_XMLPLUS in your environment or do it the Python way before you import xml.xpath:

if sys.platform.startswith('darwin'):
    os.environ['PY_USE_XMLPLUS'] = '1'

In the worst case you may have to build it yourself. This package is no longer maintained but still builds fine and works with modern 2.x Pythons. Basic docs are here.


Another option is py-dom-xpath, it works seamlessly with minidom and is pure Python so works on appengine.

import xpath
xpath.find('//item', doc)

Examples related to python

programming a servo thru a barometer Is there a way to view two blocks of code from the same file simultaneously in Sublime Text? python variable NameError Why my regexp for hyphenated words doesn't work? Comparing a variable with a string python not working when redirecting from bash script is it possible to add colors to python output? Get Public URL for File - Google Cloud Storage - App Engine (Python) Real time face detection OpenCV, Python xlrd.biffh.XLRDError: Excel xlsx file; not supported Could not load dynamic library 'cudart64_101.dll' on tensorflow CPU-only installation

Examples related to xml

strange error in my Animation Drawable How do I POST XML data to a webservice with Postman? PHP XML Extension: Not installed How to add a Hint in spinner in XML Generating Request/Response XML from a WSDL Manifest Merger failed with multiple errors in Android Studio How to set menu to Toolbar in Android How to add colored border on cardview? Android: ScrollView vs NestedScrollView WARNING: Exception encountered during context initialization - cancelling refresh attempt

Examples related to dom

How do you set the document title in React? How to find if element with specific id exists or not Cannot read property 'style' of undefined -- Uncaught Type Error adding text to an existing text element in javascript via DOM Violation Long running JavaScript task took xx ms How to get `DOM Element` in Angular 2? Angular2, what is the correct way to disable an anchor element? React.js: How to append a component on click? Detect click outside React component DOM element to corresponding vue.js component

Examples related to xpath

Xpath: select div that contains class AND whose specific child element contains text XPath: difference between dot and text() How to set "value" to input web element using selenium? How to click a href link using Selenium XPath: Get parent node from child node What is the difference between absolute and relative xpaths? Which is preferred in Selenium automation testing? How to use XPath preceding-sibling correctly Selenium and xPath - locating a link by containing text How to verify an XPath expression in Chrome Developers tool or Firefox's Firebug? Concatenate multiple node values in xpath

Examples related to python-2.x

Combine several images horizontally with Python How to print variables without spaces between values How to return dictionary keys as a list in Python? How to add an element to the beginning of an OrderedDict? How to read a CSV file from a URL with Python? Malformed String ValueError ast.literal_eval() with String representation of Tuple Relative imports for the billionth time How do you use subprocess.check_output() in Python? write() versus writelines() and concatenated strings How to select a directory and store the location using tkinter in Python