How to execute XPath one-liners from shell?

Question

Is there a package out there, for Ubuntu and/or CentOS, that has a command-line tool that can execute an XPath one-liner like foo //element/@attribute filename.xml or foo //element/@attribute < filename.xml and return the results line by line?

I'm looking for something that would allow me to just apt-get install foo or yum install foo and then just works out of the box, no wrappers or other adaptation necessary.

Here are some examples of things that come close:

Nokogiri. If I write this wrapper I could call the wrapper in the way described above:

    #!/usr/bin/ruby

    require 'nokogiri'

    Nokogiri::XML(STDIN).xpath(ARGV[0]).each do |row|
      puts row
    end

XML::XPath. Would work with this wrapper:

    #!/usr/bin/perl

    use strict;
    use warnings;
    use XML::XPath;

    my $root = XML::XPath->new(ioref => 'STDIN');
    for my $node ($root->find($ARGV[0])->get_nodelist) {
      print($node->getData, "\n");
    }

xpath from XML::XPath returns too much noise: -- NODE -- and attribute = "value".

xml_grep from XML::Twig cannot handle expressions that do not return elements, so it cannot be used to extract attribute values without further processing.

EDIT:

echo cat //element/@attribute | xmllint --shell filename.xml returns noise similar to xpath.

xmllint --xpath //element/@attribute filename.xml returns attribute = "value".

xmllint --xpath 'string(//element/@attribute)' filename.xml returns what I want, but only for the first match.

For another solution almost satisfying the question, here is an XSLT that can be used to evaluate arbitrary XPath expressions (requires dyn:evaluate support in the XSLT processor):

    <?xml version="1.0"?>
    <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0"
        xmlns:dyn="http://exslt.org/dynamic" extension-element-prefixes="dyn">
      <xsl:output omit-xml-declaration="yes" indent="no" method="text"/>
      <xsl:template match="/">
        <xsl:for-each select="dyn:evaluate($pattern)">
          <xsl:value-of select="dyn:evaluate($value)"/>
          <xsl:value-of select="'&#10;'"/>
        </xsl:for-each>
      </xsl:template>
    </xsl:stylesheet>

Run with xsltproc --stringparam pattern //element/@attribute --stringparam value . arbitrary-xpath.xslt filename.xml.

User · Answer

Since this project is apparently fairly new, check out https://github.com/jeffbr13/xq. It seems to be a wrapper around lxml, but that is all you really need (ad hoc solutions using lxml have been posted in other answers as well).

User · Answer

It bears mentioning that nokogiri itself ships with a command-line tool, which should be installed with gem install nokogiri. You might find this blog post useful.

User · Answer

One package that is very likely to be installed on a system already is python-lxml. If so, this is possible without installing any extra package:

    python -c "from lxml.etree import parse; from sys import stdin; print('\n'.join(parse(stdin).xpath('//element/@attribute')))"
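Where lxml is not available, a similar result can be approximated with the standard library alone. The following is only a sketch under that assumption: xml.etree.ElementTree cannot select attributes through XPath, so the element name and attribute name are passed separately, and the function name attribute_values and the script name in the comment are made up for illustration.

```python
import sys
import xml.etree.ElementTree as ET


def attribute_values(xml_text, tag, attr):
    """Return the values of `attr` on every `tag` element, in document order."""
    root = ET.fromstring(xml_text)
    # root.iter(tag) also visits the root itself, which mirrors //tag.
    return [el.get(attr) for el in root.iter(tag) if el.get(attr) is not None]


if __name__ == "__main__":
    # Hypothetical usage: python attrs.py element attribute < filename.xml
    if len(sys.argv) == 3:
        print("\n".join(attribute_values(sys.stdin.read(), sys.argv[1], sys.argv[2])))
```

This only covers the //element/@attribute shape from the question; arbitrary XPath expressions still need lxml or one of the dedicated tools.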

User · Answer

I've tried a couple of command-line XPath utilities, and when I realized I was spending too much time googling and figuring out how they work, I wrote the simplest possible XPath parser in Python, which did what I needed.

The script below shows the string value if the XPath expression evaluates to a string, or shows the entire XML subnode if the result is a node:

    #!/usr/bin/env python
    import sys
    from lxml import etree

    tree = etree.parse(sys.argv[1])
    xpath = sys.argv[2]

    for e in tree.xpath(xpath):
        if isinstance(e, str):
            # String results (attribute values, text nodes) are printed as-is.
            print(e)
        else:
            # Element results: print the text, or the serialized node if it has none.
            print((e.text and e.text.strip()) or etree.tostring(e, encoding="unicode"))

It uses lxml, a fast XML parser written in C which is not included in the standard Python library. Install it with pip install lxml. On Linux/OSX it might need prefixing with sudo.

Usage:

    python xmlcat.py file.xml "//mynode"

lxml can also accept a URL as input:

    python xmlcat.py http://example.com/file.xml "//mynode"

Extract the url attribute under an enclosure node, i.e. <enclosure url="http:..." ...>:

    python xmlcat.py file.xml "//enclosure/@url"

XPath in Google Chrome

As an unrelated side note: if by chance you want to run an XPath expression against the markup of a web page, you can do it straight from the Chrome DevTools. Right-click the page in Chrome, select Inspect, and then in the DevTools console paste your XPath expression as $x("//spam/eggs").

Get all authors on this page:

    $x("//*[@class='user-details']/a/text()")

User · Answer

You should try these tools:

- xmlstarlet: can edit, select, transform... Not installed by default, xpath1
- xmllint: often installed by default with libxml2-utils, xpath1 (check my wrapper to have the --xpath switch on very old releases and newline-delimited output (v < 2.9.9))
- xpath: installed via perl's module XML::XPath, xpath1
- xml_grep: installed via perl's module XML::Twig, xpath1 (limited xpath usage)
- xidel: xpath3
- saxon-lint: my own project, wrapper over Michael Kay's Saxon-HE Java library, xpath3

Where they come from:

- xmllint comes with libxml2-utils (can be used as an interactive shell with the --shell switch)
- xmlstarlet is xmlstarlet
- xpath comes with perl's module XML::XPath
- xml_grep comes with perl's module XML::Twig
- xidel is xidel
- saxon-lint uses SaxonHE 9.6, XPath 3.x (+ retro compatibility)

Ex:

    xmllint --xpath '//element/@attribute' file.xml
    xmlstarlet sel -t -v "//element/@attribute" file.xml
    xpath -q -e '//element/@attribute' file.xml
    xidel -se '//element/@attribute' file.xml
    saxon-lint --xpath '//element/@attribute' file.xml

User · Answer

My Python script xgrep.py does exactly this. In order to search for all attributes attribute of elements element in files filename.xml ..., you would run it as follows:

    xgrep.py "//element/@attribute" filename.xml ...

There are various switches for controlling the output, such as -c for counting matches, -i for indenting the matching parts, and -l for outputting filenames only.

The script is not available as a Debian or Ubuntu package, but all of its dependencies are.

User · Answer

You might also be interested in xsh. It features an interactive mode where you can do whatever you like with the document:

    open 1.xml ;
    ls //element/@id ;
    for //p[@class="first"] echo text() ;

User · Answer

Similar to Mike's and clacke's answers, here is the Python one-liner (using python >= 2.5) to get the build version from a pom.xml file. It gets around the fact that pom.xml files don't normally have a DTD or default namespace, so they don't appear well-formed to libxml:

    python -c "import xml.etree.ElementTree as ET; print(ET.parse(open('pom.xml')).getroot().find('{http://maven.apache.org/POM/4.0.0}version').text)"

Tested on Mac and Linux, and it doesn't require any extra packages to be installed.
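The same lookup can also be written with a prefix map instead of the {namespace}tag notation, which some find easier to read once several elements are involved. This is a sketch only; the names POM_NS and pom_version are illustrative, and it assumes the POM declares the usual Maven 4.0.0 namespace.

```python
import xml.etree.ElementTree as ET

# Prefix map passed to find(); the prefix "m" is an arbitrary local choice.
POM_NS = {"m": "http://maven.apache.org/POM/4.0.0"}


def pom_version(pom_text):
    """Return the text of the top-level <version> element, or None if absent."""
    root = ET.fromstring(pom_text)
    node = root.find("m:version", POM_NS)
    if node is None or node.text is None:
        return None
    return node.text.strip()
```

Returning None for a missing element avoids the AttributeError the one-liner would raise on a POM that inherits its version from a parent.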

User · Answer

Here's one xmlstarlet use case: extracting data from nested elements elem1, elem2 to one line of text from this type of XML (also showing how to handle namespaces):

    <?xml version="1.0" encoding="UTF-8" standalone="yes" ?>
    <mydoctype xmlns="http://xml-namespace-uri" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://xml-namespace-uri http://xsd-uri" format="20171221A" date="2018-05-15">
      <elem1 time="0.586" length="10.586">
          <elem2 value="cue-in" type="outro" />
      </elem1>
    </mydoctype>

The output will be:

    0.586 10.586 cue-in outro

In this snippet, -m matches the nested elem2, -v outputs attribute values (with expressions and relative addressing), -o outputs literal text, and -n adds a newline:

    xml sel -N ns="http://xml-namespace-uri" -t -m '//ns:elem1/ns:elem2' \
     -v ../@time -o " " -v '../@time + ../@length' -o " " -v @value -o " " -v @type -n file.xml

If more attributes are needed from elem1, one can do it like this (also showing the concat() function):

    xml sel -N ns="http://xml-namespace-uri" -t -m '//ns:elem1/ns:elem2/..' \
     -v 'concat(@time, " ", @time + @length, " ", ns:elem2/@value, " ", ns:elem2/@type)' -n file.xml

Note the (IMO unnecessary) complication with namespaces (ns, declared with -N), which had me almost giving up on XPath and xmlstarlet and writing a quick ad hoc converter.
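For comparison, roughly the same extraction can be sketched with Python's standard library, where the namespace is handled through a prefix map much like -N above. This is an illustrative sketch, not a replacement for the xmlstarlet command: the function name cue_lines is made up, and it joins the four raw attribute values (time, length, value, type) without evaluating any arithmetic expression.

```python
import xml.etree.ElementTree as ET

# Same role as `-N ns=...` in xmlstarlet: bind a local prefix to the namespace URI.
NS = {"ns": "http://xml-namespace-uri"}


def cue_lines(xml_text):
    """Return one space-separated line per elem2 nested inside an elem1."""
    root = ET.fromstring(xml_text)
    lines = []
    for elem1 in root.iterfind(".//ns:elem1", NS):
        for elem2 in elem1.iterfind("ns:elem2", NS):
            # Attributes without a prefix are not in the default namespace,
            # so they are read by their plain names.
            fields = [elem1.get("time"), elem1.get("length"),
                      elem2.get("value"), elem2.get("type")]
            lines.append(" ".join(field or "" for field in fields))
    return lines
```

Only the elements need the namespace prefix here; the attributes are looked up unqualified, which is the same distinction the xmlstarlet expressions make between ns:elem2 and @value.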

User · Answer

Install the BaseX database, then use its "standalone command-line mode" like this:

    basex -i - //element/@attribute < filename.xml

or

    basex -i filename.xml //element/@attribute

The query language is actually XQuery (3.0), not XPath, but since XQuery is a superset of XPath, you can use XPath queries without ever noticing.

User · Answer

You can also try my Xidel. It is not in a package in the repository, but you can just download it from the webpage (it has no dependencies).

It has a simple syntax for this task:

    xidel filename.xml -e '//element/@attribute'

And it is one of the few tools here that supports XPath 2.

User · Answer

In my search for a way to query Maven pom.xml files I ran across this question. However, I had the following limitations:

- must run cross-platform
- must exist on all major Linux distributions without any additional module installation
- must handle complex XML files such as Maven pom.xml files
- simple syntax

I have tried many of the above without success:

- python lxml.etree is not part of the standard Python distribution
- xml.etree is, but does not handle complex Maven pom.xml files well (I have not dug deep enough)
- python xml.etree does not handle Maven pom.xml files, for unknown reasons
- xmllint does not work either; it often core dumps on Ubuntu 12.04 ("xmllint: using libxml version 20708")

The solution I have come across that is stable, short, mature and works on many platforms is the rexml lib built into Ruby:

    ruby -r rexml/document -e 'include REXML;
         puts XPath.first(Document.new($stdin), "/project/version/text()")' < pom.xml

What inspired me to find this one were the following articles:

- Ruby/XML, XSLT and XPath Tutorial
- IBM: Ruby on Rails and XML

User · Answer

I wasn't happy with Python one-liners for HTML XPath queries, so I wrote my own. It assumes that you installed the python-lxml package or ran pip install --user lxml:

    function htmlxpath() { python -c 'for x in __import__("lxml.html").html.fromstring(__import__("sys").stdin.read()).xpath(__import__("sys").argv[1]): print(x)' "$1"; }

Once you have it, you can use it like in this example:

    > curl -s https://slashdot.org | htmlxpath '//title/text()'
    Slashdot: News for nerds, stuff that matters

User · Answer

Saxon will do this not only for XPath 2.0, but also for XQuery 1.0 and (in the commercial version) 3.0. It doesn't come as a Linux package, but as a jar file. The syntax (which you can easily wrap in a simple script) is:

    java net.sf.saxon.Query -s:source.xml -qs://element/attribute

2020 UPDATE

Saxon 10.0 includes the Gizmo tool, which can be used interactively or in batch from the command line. For example:

    java net.sf.saxon.Gizmo -s:source.xml
    />show //element/@attribute
    />quit

User · Answer

Sorry to be yet another voice in the fray. I tried all the tools in this thread and found none of them to be satisfactory for my needs, so I wrote my own. You can find it here: https://github.com/charmparticle/xpe

It's been uploaded to PyPI, so you can easily install it with pip3 like so:

    sudo pip3 install xpe

Once installed, you can use it to run XPath expressions against various kinds of input with the same level of flexibility you would get from using XPaths in Selenium or JavaScript. Yes, you can use XPaths against HTML with this.

One caveat: if you run it against XML that has the encoding specified on the first line, it will fail. The solution for now is to pipe it through sed to remove the first line, like so:

    cat specified_encoding.xml | sed '1d' | xpe '//text()'

I may post an update at some point to address this issue and avoid the need for sed.

User · Answer

clacke's answer is great, but I think it only works if your source is well-formed XML, not normal HTML.

So to do the same for normal Web content (HTML docs that aren't necessarily well-formed XML):

    echo "<p>foo<div>bar</div><p>baz" | python -c "from sys import stdin; \
    from lxml import html; \
    print('\n'.join(html.tostring(node, encoding='unicode') for node in html.parse(stdin).xpath('//p')))"

And to instead use html5lib (to ensure you get the same parsing behavior as Web browsers, because like browser parsers, html5lib conforms to the parsing requirements in the HTML spec):

    echo "<p>foo<div>bar</div><p>baz" | python -c "from sys import stdin; \
    import html5lib; from lxml import html; \
    doc = html5lib.parse(stdin, treebuilder='lxml', namespaceHTMLElements=False); \
    print('\n'.join(html.tostring(node, encoding='unicode') for node in doc.xpath('//p')))"

User · Answer

In addition to XML::XSH and XML::XSH2 there are some grep-like utilities such as App::xml_grep2 and XML::Twig (which includes xml_grep rather than xml_grep2). These can be quite useful when working on large or numerous XML files, for quick one-liners or Makefile targets. XML::Twig is especially nice to work with for a Perl scripting approach when you want to do a bit more processing than your $SHELL and xmllint/xsltproc offer.

The numbering scheme in the application names indicates that the "2" versions are newer/later versions of essentially the same tool, which may require later versions of other modules (or of Perl itself).
