How to find tag with particular text with Beautiful Soup

Question

I have the following HTML (line breaks marked with \n):

...
<tr>
    <td class="pos">\n
        "Some text:"\n
        <br>\n
        <strong>some value</strong>\n
    </td>
</tr>
<tr>
    <td class="pos">\n
        "Fixed text:"\n
        <br>\n
        <strong>text I am looking for</strong>\n
    </td>
</tr>
<tr>
    <td class="pos">\n
        "Some other text:"\n
        <br>\n
        <strong>some other value</strong>\n
    </td>
</tr>
...

How do I find "text I am looking for"? The code below returns the first value found, so I need to filter by "Fixed text" somehow.

result = soup.find('td', {'class': 'pos'}).find('strong').text

Update: if I use the following code:

title = soup.find('td', text=re.compile(ur'Fixed text:(.*)', re.DOTALL), attrs={'class': 'pos'})
self.response.out.write(str(title.string).decode('utf8'))

then it returns just "Fixed text:".

User · Answer

You could solve this with some simple gazpacho parsing:

from gazpacho import Soup

soup = Soup(html)
tds = soup.find("td", {"class": "pos"})
tds[1].find("strong").text

Which will output:

text I am looking for

User · Answer

This post got me to my answer even though the answer is missing from this post. I felt I should give back.

The challenge here is the inconsistent behavior of BeautifulSoup.find when searching with and without text.

Note: If you have BeautifulSoup, you can test this locally via:

curl https://gist.githubusercontent.com/RichardBronosky/4060082/raw/test.py | python

Code: https://gist.github.com/4060082

# Taken from https://gist.github.com/4060082
from BeautifulSoup import BeautifulSoup
from urllib2 import urlopen
from pprint import pprint
import re

soup = BeautifulSoup(urlopen('https://gist.githubusercontent.com/RichardBronosky/4060082/raw/test.html').read())
# I'm going to assume that Peter knew that re.compile is meant to cache a computation result for a performance benefit. However, I'm going to do that explicitly here to be very clear.
pattern = re.compile('Fixed text')

# Peter's suggestion here returns a list of what appear to be strings
columns = soup.findAll('td', text=pattern, attrs={'class': 'pos'})
# ...but it is actually a BeautifulSoup.NavigableString
print type(columns[0])
#>> <class 'BeautifulSoup.NavigableString'>

# you can reach the tag using one of the convenience attributes seen here
pprint(columns[0].__dict__)
#>> {'next': <br />,
#>>  'nextSibling': <br />,
#>>  'parent': <td class="pos">\n
#>>       "Fixed text:"\n
#>>       <br />\n
#>>       <strong>text I am looking for</strong>\n
#>>   </td>,
#>>  'previous': <td class="pos">\n
#>>       "Fixed text:"\n
#>>       <br />\n
#>>       <strong>text I am looking for</strong>\n
#>>   </td>,
#>>  'previousSibling': None}

# I feel that 'parent' is safer to use than 'previous' based on http://www.crummy.com/software/BeautifulSoup/bs4/doc/#method-names
# So, if you want to find the 'text' in the 'strong' element...
pprint([t.parent.find('strong').text for t in soup.findAll('td', text=pattern, attrs={'class': 'pos'})])
#>> [u'text I am looking for']

# Here is what we have learned:
print soup.find('strong')
#>> <strong>some value</strong>
print soup.find('strong', text='some value')
#>> u'some value'
print soup.find('strong', text='some value').parent
#>> <strong>some value</strong>
print soup.find('strong', text='some value') == soup.find('strong')
#>> False
print soup.find('strong', text='some value') == soup.find('strong').text
#>> True
print soup.find('strong', text='some value').parent == soup.find('strong')
#>> True

Though it is most certainly too late to help the OP, I hope they will mark this as the answer, since it does satisfy all quandaries around finding by text.
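The answer above targets Python 2 and the old BeautifulSoup 3 package. With the newer bs4 package, the same "find the string, then walk up to the tag" idea might look roughly like the sketch below. It is an illustration rather than the author's code: the inline HTML stands in for the gist's test.html, html.parser is chosen only to avoid an extra dependency, and find_strong_near is a hypothetical helper name.

```python
import re
from bs4 import BeautifulSoup

# Inline stand-in for the question's markup (assumption: rows live in a <table>).
HTML = """
<table>
<tr><td class="pos">
    "Some text:"
    <br>
    <strong>some value</strong>
</td></tr>
<tr><td class="pos">
    "Fixed text:"
    <br>
    <strong>text I am looking for</strong>
</td></tr>
<tr><td class="pos">
    "Some other text:"
    <br>
    <strong>some other value</strong>
</td></tr>
</table>
"""


def find_strong_near(soup, pattern):
    """Return the <strong> text of the td.pos whose label matches pattern, or None."""
    # Searching by string alone yields a NavigableString, not a Tag.
    label = soup.find(string=pattern)
    if label is None:
        return None
    # Walk up from the string to the enclosing cell.
    cell = label.find_parent("td", class_="pos")
    if cell is None:
        return None
    strong = cell.find("strong")
    return strong.get_text(strip=True) if strong is not None else None


soup = BeautifulSoup(HTML, "html.parser")
result = find_strong_near(soup, re.compile("Fixed text"))
print(result)
```

Using find_parent instead of previous follows the same reasoning as the answer: the parent relationship does not depend on where the string happens to sit in document order.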

User · Answer

A solution for finding an anchor tag that contains a particular keyword would be the following:

from bs4 import BeautifulSoup
from urllib.request import urlopen, Request
from urllib.parse import urljoin, urlparse

# soup is an already-parsed BeautifulSoup document; keyword is the search term
rawLinks = soup.findAll('a', href=True)
for link in rawLinks:
    innercontent = link.text
    if keyword.lower() in innercontent.lower():
        print(link)
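A self-contained version of the keyword filter could look like the following sketch. The sample markup and the helper name links_containing are made up for illustration, and html.parser is assumed as the parser.

```python
from bs4 import BeautifulSoup

# Hypothetical sample page; the third <a> has no href and should be skipped.
PAGE = """
<ul>
  <li><a href="/docs">Documentation</a></li>
  <li><a href="/blog">Latest BLOG posts</a></li>
  <li><a name="top">Blog anchor without href</a></li>
</ul>
"""


def links_containing(soup, keyword):
    """Return <a href=...> tags whose visible text contains keyword, ignoring case."""
    needle = keyword.lower()
    return [
        link
        for link in soup.find_all("a", href=True)
        if needle in link.get_text().lower()
    ]


soup = BeautifulSoup(PAGE, "html.parser")
matches = links_containing(soup, "blog")
print([link["href"] for link in matches])
```

Passing href=True keeps only anchors that actually carry an href attribute, so named anchors are filtered out before the text comparison.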

User · Answer

result = soup.find('strong', text='text I am looking for').text
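Two things are worth keeping in mind with this one-liner: a plain string has to equal the element's whole string, and find returns None when nothing matches, so chaining .text directly can raise AttributeError. A guarded sketch, assuming bs4 4.4.0 or later (where the argument is spelled string) and a hypothetical helper name strong_text:

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup(
    "<td><strong>text I am looking for</strong></td>", "html.parser"
)


def strong_text(soup, wanted):
    """Return the text of the <strong> whose string equals wanted, or None."""
    tag = soup.find("strong", string=wanted)
    return tag.get_text() if tag is not None else None


exact = strong_text(soup, "text I am looking for")  # full string: matches
partial = strong_text(soup, "text I am")            # substring only: no match
print(exact, partial)
```

For substring matching, a compiled regular expression can be passed as string instead of a plain str.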

User · Answer

With bs4 4.7.1 or later you can use the :contains pseudo-class to specify the td containing your search string:

from bs4 import BeautifulSoup as bs

html = '''
<tr>
    <td class="pos">\n
        "Some text:"\n
        <br>\n
        <strong>some value</strong>\n
    </td>
</tr>
<tr>
    <td class="pos">\n
        "Fixed text:"\n
        <br>\n
        <strong>text I am looking for</strong>\n
    </td>
</tr>
<tr>
    <td class="pos">\n
        "Some other text:"\n
        <br>\n
        <strong>some other value</strong>\n
    </td>
</tr>'''

soup = bs(html, 'lxml')
print(soup.select_one('td:contains("Fixed text:")'))
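Newer releases of Soup Sieve (the selector engine behind select, version 2.1 onward) spell this pseudo-class :-soup-contains and treat the bare :contains form as deprecated. The sketch below assumes such a version is installed, uses html.parser rather than lxml to avoid the extra dependency, and extends the selector to reach the strong element directly.

```python
from bs4 import BeautifulSoup

HTML = """
<table>
<tr><td class="pos">
    "Some text:"
    <br>
    <strong>some value</strong>
</td></tr>
<tr><td class="pos">
    "Fixed text:"
    <br>
    <strong>text I am looking for</strong>
</td></tr>
</table>
"""

soup = BeautifulSoup(HTML, "html.parser")

# Select the <strong> that is a direct child of the td.pos containing the label.
strong = soup.select_one('td.pos:-soup-contains("Fixed text") > strong')
found = strong.get_text(strip=True) if strong is not None else None
print(found)
```

Keeping the whole lookup in one selector avoids a second find call on the matched cell.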

User · Answer

Since Beautiful Soup 4.4.0, a parameter called string does the work that text used to do in previous versions.

string is for finding strings; you can combine it with arguments that find tags. Beautiful Soup will find all tags whose .string matches your value for string. This code finds the <td> tags whose .string is "Elsie":

soup.find_all("td", string="Elsie")

For more information about string, see this section: https://www.crummy.com/software/BeautifulSoup/bs4/doc/#the-string-argument
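One consequence of matching against .string is relevant to the question's markup: a tag with several children has .string equal to None, so combining "td" with string finds nothing there, while the single-child strong element matches as expected. A small sketch of that behavior, assuming bs4 4.4.0 or later and html.parser:

```python
import re
from bs4 import BeautifulSoup

HTML = """<table><tr><td class="pos">
    "Fixed text:"
    <br>
    <strong>text I am looking for</strong>
</td></tr></table>"""

soup = BeautifulSoup(HTML, "html.parser")
pattern = re.compile("Fixed text")

# The cell holds text, <br> and <strong>, so its .string is None.
cell_string = soup.td.string

# Tag name + string: only tags whose .string matches are returned.
cells = soup.find_all("td", string=pattern)
strongs = soup.find_all("strong", string="text I am looking for")

# string alone: the matching NavigableString objects are returned instead.
labels = soup.find_all(string=pattern)

print(cell_string, cells, strongs, len(labels))
```

From a matched NavigableString, find_parent("td") leads back to the enclosing cell.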

User · Answer

You can pass a regular expression to the text parameter of findAll, like so:

import BeautifulSoup
import re

columns = soup.findAll('td', text=re.compile('your regex here'), attrs={'class': 'pos'})
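The call above is written for the old BeautifulSoup 3 package. In bs4, one way to get a comparable result is to pass a function to find_all and apply the regular expression to each cell's full text. This is a sketch under that assumption, not a drop-in equivalent; is_matching_cell is a hypothetical name and the pattern is a placeholder.

```python
import re
from bs4 import BeautifulSoup

HTML = """
<table>
<tr><td class="pos">
    "Some text:"
    <br>
    <strong>some value</strong>
</td></tr>
<tr><td class="pos">
    "Fixed text:"
    <br>
    <strong>text I am looking for</strong>
</td></tr>
</table>
"""

pattern = re.compile(r"Fixed text")


def is_matching_cell(tag):
    """True for td.pos elements whose combined text matches the pattern."""
    return (
        tag.name == "td"
        and "pos" in tag.get("class", [])
        and pattern.search(tag.get_text()) is not None
    )


soup = BeautifulSoup(HTML, "html.parser")
values = [
    td.find("strong").get_text(strip=True)
    for td in soup.find_all(is_matching_cell)
    if td.find("strong") is not None
]
print(values)
```

Because the function receives whole tags, the result is a list of td elements rather than the NavigableString objects the BeautifulSoup 3 call returns.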
