I wasn't happy with Python one-liners for HTML XPath queries, so I wrote my own. Assumes that you installed python-lxml
package or ran pip install --user lxml
:
function htmlxpath() { python -c 'for x in __import__("lxml.html").html.fromstring(__import__("sys").stdin.read()).xpath(__import__("sys").argv[1]): print(x)' $1 }
Once you have it, you can use it like in this example:
> curl -s https://slashdot.org | htmlxpath '//title/text()'
Slashdot: News for nerds, stuff that matters