[python] can we use xpath with BeautifulSoup?

Nope, BeautifulSoup, by itself, does not support XPath expressions.

An alternative library, lxml, does support XPath 1.0. It has a BeautifulSoup compatible mode where it'll try and parse broken HTML the way Soup does. However, the default lxml HTML parser does just as good a job of parsing broken HTML, and I believe is faster.

Once you've parsed your document into an lxml tree, you can use the .xpath() method to search for elements.

try:
    # Python 2
    from urllib2 import urlopen
except ImportError:
    from urllib.request import urlopen
from lxml import etree

url =  "http://www.example.com/servlet/av/ResultTemplate=AVResult.html"
response = urlopen(url)
htmlparser = etree.HTMLParser()
tree = etree.parse(response, htmlparser)
tree.xpath(xpathselector)

There is also a dedicated lxml.html() module with additional functionality.

Note that in the above example I passed the response object directly to lxml, as having the parser read directly from the stream is more efficient than reading the response into a large string first. To do the same with the requests library, you want to set stream=True and pass in the response.raw object after enabling transparent transport decompression:

import lxml.html
import requests

url =  "http://www.example.com/servlet/av/ResultTemplate=AVResult.html"
response = requests.get(url, stream=True)
response.raw.decode_content = True
tree = lxml.html.parse(response.raw)

Of possible interest to you is the CSS Selector support; the CSSSelector class translates CSS statements into XPath expressions, making your search for td.empformbody that much easier:

from lxml.cssselect import CSSSelector

td_empformbody = CSSSelector('td.empformbody')
for elem in td_empformbody(tree):
    # Do something with these table cells.

Coming full circle: BeautifulSoup itself does have very complete CSS selector support:

for cell in soup.select('table#foobar td.empformbody'):
    # Do something with these table cells.

Examples related to python

programming a servo thru a barometer Is there a way to view two blocks of code from the same file simultaneously in Sublime Text? python variable NameError Why my regexp for hyphenated words doesn't work? Comparing a variable with a string python not working when redirecting from bash script is it possible to add colors to python output? Get Public URL for File - Google Cloud Storage - App Engine (Python) Real time face detection OpenCV, Python xlrd.biffh.XLRDError: Excel xlsx file; not supported Could not load dynamic library 'cudart64_101.dll' on tensorflow CPU-only installation

Examples related to xpath

Xpath: select div that contains class AND whose specific child element contains text XPath: difference between dot and text() How to set "value" to input web element using selenium? How to click a href link using Selenium XPath: Get parent node from child node What is the difference between absolute and relative xpaths? Which is preferred in Selenium automation testing? How to use XPath preceding-sibling correctly Selenium and xPath - locating a link by containing text How to verify an XPath expression in Chrome Developers tool or Firefox's Firebug? Concatenate multiple node values in xpath

Examples related to beautifulsoup

Scraping: SSL: CERTIFICATE_VERIFY_FAILED error for http://en.wikipedia.org What should I use to open a url instead of urlopen in urllib3 TypeError: a bytes-like object is required, not 'str' in python and CSV UnicodeEncodeError: 'ascii' codec can't encode character at special name UnicodeEncodeError: 'charmap' codec can't encode characters bs4.FeatureNotFound: Couldn't find a tree builder with the features you requested: lxml. Do you need to install a parser library? Using BeautifulSoup to extract text without tags python BeautifulSoup parsing table install beautiful soup using pip Python BeautifulSoup extract text between element

Examples related to urllib

installing urllib in Python3.6 SSL: CERTIFICATE_VERIFY_FAILED with Python3 Python: Importing urllib.quote Python 2.7.10 error "from urllib.request import urlopen" no module named request python save image from url no module named urllib.parse (How should I install it?) 'module' has no attribute 'urlencode' urllib and "SSL: CERTIFICATE_VERIFY_FAILED" Error UnicodeEncodeError: 'charmap' codec can't encode characters replace special characters in a string python