BeautifulSoup: extract text from anchor tag

Question

I want to extract the src of the image tag and the text of the anchor tag that is inside the div with class "data". I successfully managed to extract the img src, but I am having trouble extracting the text from the anchor tag.

    <a class="title" href="http://www.amazon.com/Nikon-COOLPIX-Digital-Camera-NIKKOR/dp/B0073HSK0K/ref=sr_1_1?s=electronics&amp;ie=UTF8&amp;qid=1343628292&amp;sr=1-1&amp;keywords=digital+camera">Nikon COOLPIX L26 16.1 MP Digital Camera with 5x Zoom NIKKOR Glass Lens and 3-inch LCD (Red)</a>

Here is the link for the entire HTML page.

Here is my code:

    for div in soup.findAll('div', attrs={'class':'image'}):
        print "\n"
        for data in div.findNextSibling('div', attrs={'class':'data'}):
            for a in data.findAll('a', attrs={'class':'title'}):
                print a.text
        for img in div.findAll('img'):
            print img['src']

What I am trying to do is extract the image src (link) and the title inside the div with class "data". So, for example:

    <a class="title" href="http://www.amazon.com/Nikon-COOLPIX-Digital-Camera-NIKKOR/dp/B0073HSK0K/ref=sr_1_1?s=electronics&amp;ie=UTF8&amp;qid=1343628292&amp;sr=1-1&amp;keywords=digital+camera">Nikon COOLPIX L26 16.1 MP Digital Camera with 5x Zoom NIKKOR Glass Lens and 3-inch LCD (Red)</a>

should extract:

    Nikon COOLPIX L26 16.1 MP Digital Camera with 5x Zoom NIKKOR Glass Lens and 3-inch LCD (Red)

User · Answer

    print(link_addres.contents[0])

It will print the content of the anchor tag. Example:

    statement_title = statement.find('h2', class_='briefing-statement__title')
    statement_title_text = statement_title.a.contents[0]

User · Answer

All the above answers really helped me construct my answer, which is why I voted for all the answers that other users posted. But I finally put together my own answer to the exact problem I was dealing with.

As the question clearly defines, I had to access some of the siblings and their children in a DOM structure. This solution iterates over the images in the DOM structure, constructs an image name from the product title, and saves the image to a local directory.

    import os
    import urlparse
    from urllib2 import urlopen
    from urllib import urlretrieve
    from BeautifulSoup import BeautifulSoup as bs
    import requests

    def getImages(url):
        # Download the images (Python 2, BeautifulSoup 3)
        r = requests.get(url)
        html = r.text
        soup = bs(html)
        output_folder = 'amazon'
        # extract the images that are inside div(s) with class "image"
        for div in soup.findAll('div', attrs={'class':'image'}):
            try:
                # get the data div using findNext
                nextDiv = div.findNext('div', attrs={'class':'data'})
                # use findNext again on the previous object to get to the anchor tag
                fileName = nextDiv.findNext('a').text
                modified_file_name = fileName.replace(' ', '-') + '.jpg'
            except (TypeError, AttributeError):
                # no data div or anchor was found, so there is no file name to use
                print 'skip'
                continue
            imageUrl = div.find('img')['src']
            outputPath = os.path.join(output_folder, modified_file_name)
            urlretrieve(imageUrl, outputPath)

    if __name__ == '__main__':
        url = r'http://www.amazon.com/s/ref=sr_pg_1?rh=n%3A172282%2Ck%3Adigital+camera&keywords=digital+camera&ie=UTF8&qid=1343600585'
        getImages(url)
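For readers on Python 3 with the bs4 package, the sibling navigation described in this answer can be sketched roughly as follows. This is only a minimal illustration, not the answerer's code: the HTML string and the `extract_products` helper are hypothetical stand-ins for the real page, and the "html.parser" backend is chosen only to avoid extra dependencies. Note that `find_next_sibling` returns a single tag (or None), so there is no need to loop over its result.

```python
from bs4 import BeautifulSoup

# Hypothetical markup imitating the layout described in the question.
HTML = """
<div class="result">
  <div class="image"><a href="/p/1"><img src="http://image.example.com/img1.jpg"></a></div>
  <div class="data"><h3><a class="title" href="/p/1">Camera One (Red)</a></h3></div>
</div>
<div class="result">
  <div class="image"><a href="/p/2"><img src="http://image.example.com/img2.jpg"></a></div>
  <div class="data"><h3><a class="title" href="/p/2">Camera Two</a></h3></div>
</div>
"""


def extract_products(html):
    """Return a list of (title, image_src) pairs, skipping incomplete entries."""
    soup = BeautifulSoup(html, "html.parser")
    products = []
    for image_div in soup.find_all("div", class_="image"):
        img = image_div.find("img")
        # find_next_sibling returns one Tag (or None), not a list of tags.
        data_div = image_div.find_next_sibling("div", class_="data")
        if img is None or data_div is None:
            continue
        anchor = data_div.find("a", class_="title")
        if anchor is None:
            continue
        products.append((anchor.get_text(strip=True), img["src"]))
    return products


products = extract_products(HTML)
for title, src in products:
    print(title, src)
```

Checking for None before going deeper avoids the try/except and makes it explicit which entries are skipped.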

User · Answer

This will help:

    from bs4 import BeautifulSoup

    data = '''<div class="image">
            <a href="http://www.example.com/eg1">Content1<img
            src="http://image.example.com/img1.jpg" /></a>
            </div>
            <div class="image">
            <a href="http://www.example.com/eg2">Content2<img
            src="http://image.example.com/img2.jpg" /> </a>
            </div>'''

    soup = BeautifulSoup(data)

    for div in soup.findAll('div', attrs={'class':'image'}):
        print div.find('a')['href']
        print div.find('a').contents[0]
        print div.find('img')['src']

If you are looking into Amazon products, then you should be using the official API. There is at least one Python package that will ease your scraping issues and keep your activity within the terms of use.

User · Answer

I would suggest going the lxml route and using XPath.

    from lxml import etree
    # data is the variable containing the html
    data = etree.HTML(data)
    anchor = data.xpath('//a[@class="title"]/text()')

User · Answer

    >>> import bs4
    >>> txt = '<a class="title" href="http://rads.stackoverflow.com/amzn/click/B0073HSK0K">Nikon COOLPIX L26 16.1 MP Digital Camera with 5x Zoom NIKKOR Glass Lens and 3-inch LCD (Red)</a> '
    >>> fragment = bs4.BeautifulSoup(txt)
    >>> fragment
    <a class="title" href="http://rads.stackoverflow.com/amzn/click/B0073HSK0K">Nikon COOLPIX L26 16.1 MP Digital Camera with 5x Zoom NIKKOR Glass Lens and 3-inch LCD (Red)</a> 
    >>> fragment.find('a', {'class': 'title'})
    <a class="title" href="http://rads.stackoverflow.com/amzn/click/B0073HSK0K">Nikon COOLPIX L26 16.1 MP Digital Camera with 5x Zoom NIKKOR Glass Lens and 3-inch LCD (Red)</a>
    >>> fragment.find('a', {'class': 'title'}).string
    u'Nikon COOLPIX L26 16.1 MP Digital Camera with 5x Zoom NIKKOR Glass Lens and 3-inch LCD (Red)'
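The answers in this thread use `.string`, `.contents[0]`, and `.text` more or less interchangeably. They agree when the anchor holds a single piece of text, but they differ once the anchor contains nested tags. The sketch below (Python 3 with bs4, using made-up anchors rather than the Amazon markup) shows the difference.

```python
from bs4 import BeautifulSoup

# Hypothetical anchors: one with plain text, one mixing text and a nested tag.
html = (
    '<a class="title" href="/p/1">Plain title</a>'
    '<a class="title" href="/p/2">Title with <b>bold</b> part</a>'
)
soup = BeautifulSoup(html, "html.parser")
plain, mixed = soup.find_all("a", class_="title")

plain_string = plain.string      # the anchor's single text child
mixed_string = mixed.string      # None, because the anchor has several children
mixed_text = mixed.get_text()    # text of all descendants, concatenated
first_child = mixed.contents[0]  # only the first child node

print(repr(plain_string), repr(mixed_string), repr(mixed_text), repr(first_child))
```

If the anchors might contain extra markup, `get_text()` (or its `.text` shorthand) is usually the safer choice for pulling out a title.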

User · Answer

In my case, it worked like this:

    from BeautifulSoup import BeautifulSoup as bs
    import urllib

    url = "http://blabla.com"

    soup = bs(urllib.urlopen(url))
    for link in soup.findAll('a'):
        print link.string

Hope it helps!
