How to use XPath in Python

Question

What are the libraries that support XPath  Is there a full implementation  How is the library used  Where is its website

User · Answer

Another option is py-dom-xpath  it works seamlessly with minidom and is pure Python so works on appengine   import xpath xpath find    item   doc

User · Answer

The lxml package supports xpath   It seems to work pretty well  although I ve had some trouble with the self   axis   There s also Amara  but I haven t used it personally

User · Answer

libxml2 has a number of advantages    Compliance to the spec Active development and a community participation  Speed  This is really a python wrapper around a C implementation   Ubiquity  The libxml2 library is pervasive and thus well tested    Downsides include    Compliance to the spec  It s strict  Things like default namespace handling are easier in other libraries  Use of native code  This can be a pain depending on your how your application is distributed   deployed  RPMs are available that ease some of this pain  Manual resource handling  Note in the sample below the calls to freeDoc   and xpathFreeContext    This is not very Pythonic    If you are doing simple path selection  stick with ElementTree   which is included in Python 2 5    If you need full spec compliance or raw speed and can cope with the distribution of native code  go with libxml2   Sample of libxml2 XPath Use    import libxml2  doc   libxml2 parseFile  tst xml   ctxt   doc xpathNewContext   res   ctxt xpathEval        if len res     2      print  xpath query  wrong node set size      sys exit 1  if res 0  name     doc  or res 1  name     foo       print  xpath query  wrong node set value      sys exit 1  doc freeDoc   ctxt xpathFreeContext     Sample of ElementTree XPath Use    from elementtree ElementTree import ElementTree mydoc   ElementTree file  tst xml   for e in mydoc findall   foo bar        print e get  title   text

User · Answer

The latest version of elementtree supports XPath pretty well   Not being an XPath expert I can t say for sure if the implementation is full but it has satisfied most of my needs when working in Python   I ve also use lxml and PyXML and I find etree nice because it s a standard module   NOTE  I ve since found lxml and for me it s definitely the best XML lib out there for Python   It does XPath nicely as well  though again perhaps not a full implementation

User · Answer

If you want to have the power of XPATH combined with the ability to also use CSS at any point you can use parsel    gt  gt  gt  from parsel import Selector  gt  gt  gt  sel   Selector text u    lt html gt           lt body gt               lt h1 gt Hello  Parsel  lt  h1 gt               lt ul gt                   lt li gt  lt a href  http   example com  gt Link 1 lt  a gt  lt  li gt                   lt li gt  lt a href  http   scrapy org  gt Link 2 lt  a gt  lt  li gt               lt  ul          lt  body gt           lt  html gt       gt  gt  gt   gt  gt  gt  sel css  h1  text   extract first    Hello  Parsel    gt  gt  gt  sel xpath    h1 text     extract first    Hello  Parsel

User · Answer

Another library is 4Suite  http   sourceforge net projects foursuite   I do not know how spec-compliant it is  But it has worked very well for my use  It looks abandoned

User · Answer

PyXML works well     You didn t say what platform you re using  however if you re on Ubuntu you can get it with sudo apt-get install python-xml   I m sure other Linux distros have it as well     If you re on a Mac  xpath is already installed but not immediately accessible   You can set PY USE XMLPLUS in your environment or do it the Python way before you import xml xpath   if sys platform startswith  darwin        os environ  PY USE XMLPLUS      1    In the worst case you may have to build it yourself   This package is no longer maintained but still builds fine and works with modern 2 x Pythons   Basic docs are here

User · Answer

Sounds like an lxml advertisement in here      ElementTree is included in the std library   Under 2 6 and below its xpath is pretty weak  but in 2 7  much improved   import xml etree ElementTree as ET root   ET parse filename  result       for elem in root findall     child grandchild          How to make decisions based on attributes even in 2 6      if elem attrib get  name       foo           result   elem text         break

User · Answer

If you are going to need it for html   import lxml html as html root    html fromstring string  root xpath    meta

User · Answer

You can use   PyXML   from xml dom ext reader import Sax2 from xml import xpath doc   Sax2 FromXmlFile  foo xml   documentElement for url in xpath Evaluate     Url   doc     print url value   libxml2   import libxml2 doc   libxml2 parseFile  foo xml   for url in doc xpathEval     Url      print url content

User · Answer

You can use the simple soupparser from lxml  Example   from lxml html soupparser import fromstring  tree   fromstring   lt a gt Find me  lt  a gt    print tree xpath    a text

User · Answer

Use LXML  LXML uses the full power of libxml2 and libxslt  but wraps them in more  Pythonic  bindings than the Python bindings that are native to those libraries  As such  it gets the full XPath 1 0 implementation  Native ElemenTree supports a limited subset of XPath  although it may be good enough for your needs

[python] How to use XPath in Python?

Examples related to python

Examples related to xml

Examples related to dom

Examples related to xpath

Examples related to python-2.x