Using BeautifulSoup to search HTML for string

Question

I am using BeautifulSoup to look for user-entered strings on a specific page. For example, I want to see if the string 'Python' is located on the page http://python.org.

When I used:

    find_string = soup.body.findAll(text='Python')

find_string returned [].

But when I used:

    find_string = soup.body.findAll(text=re.compile('Python'), limit=1)

find_string returned [u'Python Jobs'], as expected.

What is the difference between these two statements that makes the second statement work when there is more than one instance of the word to be searched?

User · Accepted Answer

The following line is looking for the exact NavigableString 'Python':

    >>> soup.body.findAll(text='Python')
    []

Note that the following NavigableString is found:

    >>> soup.body.findAll(text='Python Jobs')
    [u'Python Jobs']

Note this behaviour:

    >>> import re
    >>> soup.body.findAll(text=re.compile('^Python$'))
    []

So your regexp is looking for an occurrence of 'Python', not an exact match to the NavigableString 'Python'.
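To reproduce this without fetching python.org, here is a minimal sketch using bs4 (Beautiful Soup 4) with the built-in "html.parser". The HTML snippet is a hypothetical stand-in for the real page, and the code uses the string= keyword, which bs4 4.4+ accepts as the newer name for text= (an assumption about your installed version).

```python
import re

from bs4 import BeautifulSoup

# Hypothetical stand-in for the python.org markup.
html = "<ul><li>Python Jobs</li><li>Learn Python</li></ul>"
soup = BeautifulSoup(html, "html.parser")

# A plain string must equal the whole text node, so nothing matches here.
exact = soup.find_all(string="Python")

# A compiled regex is tested with .search(), so any node containing "Python" matches.
substring = soup.find_all(string=re.compile("Python"))

# Anchoring the pattern brings back whole-node matching.
anchored = soup.find_all(string=re.compile(r"^Python$"))

# find() returns the first match (or None), much like limit=1 without the list.
first = soup.find(string=re.compile("Python"))

print(exact)      # []
print(substring)  # ['Python Jobs', 'Learn Python']
print(anchored)   # []
print(first)      # Python Jobs
```

The number of occurrences on the page is not what changes the result; a plain string asks "is this text node exactly equal to X?", while a regex asks "does X occur somewhere in this text node?".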

User · Answer

In addition to the accepted answer, you can use a lambda instead of a regex:

    from bs4 import BeautifulSoup

    html = """<p>test python</p>"""

    soup = BeautifulSoup(html, "html.parser")

    print(soup(text="python"))
    print(soup(text=lambda t: "python" in t))

Output:

    []
    ['test python']
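Since the strings are user-entered, a case-insensitive match may be what you actually want. A minimal sketch of two ways to get it, assuming bs4 4.4+ (for the string= keyword) and a hypothetical HTML snippet: lowercase inside the callable, or pass re.IGNORECASE to the compiled pattern.

```python
import re

from bs4 import BeautifulSoup

# Hypothetical markup: only the first paragraph mentions Python, in upper case.
html = "<p>I like PYTHON</p><p>Ruby only</p>"
soup = BeautifulSoup(html, "html.parser")

# The callable receives each text node; lowercasing it makes the test case-insensitive.
by_lambda = soup.find_all(string=lambda t: "python" in t.lower())

# The same filter expressed as a case-insensitive regular expression.
by_regex = soup.find_all(string=re.compile("python", re.IGNORECASE))

print(by_lambda)  # ['I like PYTHON']
print(by_regex)   # ['I like PYTHON']
```

The lambda avoids regex syntax entirely, which is convenient when the search term comes straight from a user.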

User · Answer

I have not used BeautifulSoup, but maybe the following can help in some tiny way:

    import re
    import urllib2

    stuff = urllib2.urlopen(your_url_goes_here).read()  # stuff will contain the *entire* page

    # Replace the string Python with your desired regex
    results = re.findall('Python', stuff)

    for i in results:
        print i

I'm not suggesting this is a replacement, but maybe you can glean some value in the concept until a direct answer comes along.
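For Python 3, urllib2 became urllib.request. Here is a sketch of the same idea that also reports where each match starts. The helper names find_term and fetch are hypothetical, and decoding the response as UTF-8 is an assumption about the page. re.escape() is used so that user-entered text such as "C++" is matched literally instead of being read as regex syntax.

```python
import re
from urllib.request import urlopen


def find_term(page: str, term: str) -> list:
    """Return (offset, matched text) for every case-sensitive occurrence of term."""
    # re.escape() turns the user's input into a literal pattern.
    pattern = re.compile(re.escape(term))
    return [(m.start(), m.group()) for m in pattern.finditer(page)]


def fetch(url: str) -> str:
    # Python 3 counterpart of urllib2.urlopen(url).read(); assumes UTF-8 content.
    with urlopen(url) as response:
        return response.read().decode("utf-8", errors="replace")


# Offline example instead of a live fetch (hypothetical markup).
page = "<title>Python</title><a href='/jobs/'>Python Jobs</a>"
print(find_term(page, "Python"))  # [(7, 'Python'), (38, 'Python')]
```

Note that this scans the raw markup, so it will also hit tag names and attribute values, not just visible text.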

User · Answer

text="Python" searches for elements that have the exact text you provided:

    import re
    from BeautifulSoup import BeautifulSoup

    html = """<p>exact text</p>
       <p>almost exact text</p>"""
    soup = BeautifulSoup(html)
    print soup(text='exact text')
    print soup(text=re.compile('exact text'))

Output:

    [u'exact text']
    [u'exact text', u'almost exact text']

To see if the string 'Python' is located on the page http://python.org:

    import urllib2
    html = urllib2.urlopen('http://python.org').read()
    print 'Python' in html  # -> True

If you need to find the position of a substring within a string, you could do html.find('Python').
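One caveat with the plain 'Python' in html check is that it also matches text inside tags and attributes. A small sketch comparing it with a check on the visible text only, using bs4's get_text() and a hypothetical snippet where "python" appears only in an href:

```python
from bs4 import BeautifulSoup

# Hypothetical markup where "python" occurs only inside an attribute value.
html = '<a href="/python/">Home</a><p>Welcome</p>'

# Substring test on the raw markup also sees tag names and attribute values.
in_markup = "python" in html

# get_text() concatenates only the text nodes, i.e. what a visitor actually reads.
soup = BeautifulSoup(html, "html.parser")
in_text = "python" in soup.get_text()

# str.find() gives the offset of the first occurrence, or -1 if it is absent.
position = html.find("python")

print(in_markup)  # True
print(in_text)    # False
print(position)   # 10
```

Which check is right depends on whether "located on the page" means anywhere in the source or only in the rendered text.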
