Extracting an attribute value with beautifulsoup

Question

I am trying to extract the content of a single "value" attribute in a specific "input" tag on a webpage. I use the following code:

    import urllib
    f = urllib.urlopen("http://58.68.130.147")
    s = f.read()
    f.close()

    from BeautifulSoup import BeautifulStoneSoup
    soup = BeautifulStoneSoup(s)

    inputTag = soup.findAll(attrs={"name" : "stainfo"})
    output = inputTag['value']

    print str(output)

I get a TypeError: list indices must be integers, not str

even though from the BeautifulSoup documentation I understand that strings should not be a problem here... but I am no specialist and I may have misunderstood.

Any suggestion is greatly appreciated. Thanks in advance.

User · Answer

I would suggest a time-saving approach, assuming you know which kind of tag carries the attribute.

Suppose a tag xyz has an attribute named "staininfo":

    full_tag = soup.findAll("xyz")

Note that full_tag is a list, so iterate over it:

    for each_tag in full_tag:
        staininfo_attrb_value = each_tag["staininfo"]
        print(staininfo_attrb_value)

This gives you the value of the staininfo attribute for every xyz tag.

User · Answer

For me, the tag looked like this:

    <input id="color" value="Blue"/>

The value can be fetched with the snippet below:

    page = requests.get("https://www.abcd.com")
    soup = BeautifulSoup(page.content, 'html.parser')
    colorName = soup.find(id='color')
    print(colorName['value'])

User · Answer

.find_all() returns a list of all found elements, so:

    input_tag = soup.find_all(attrs={"name" : "stainfo"})

input_tag is a list (probably containing only one element). Depending on what you want exactly, you should either do:

    output = input_tag[0]['value']

or use the .find() method, which returns only one (the first) found element:

    input_tag = soup.find(attrs={"name": "stainfo"})
    output = input_tag['value']
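The difference between the two calls can be checked without a network connection. The following is a minimal sketch, assuming the bs4 package (BeautifulSoup 4) is installed; the HTML string is made up for illustration and only mimics the input tag from the question.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for the page from the question.
html = (
    '<form>'
    '<input name="stainfo" value="example-value">'
    '<input name="other" value="x">'
    '</form>'
)
soup = BeautifulSoup(html, "html.parser")

# find_all() returns a list-like result, so indexing it with a string fails.
matches = soup.find_all(attrs={"name": "stainfo"})
try:
    matches["value"]
except TypeError as exc:
    print("TypeError:", exc)

# Either index into the list first ...
first_value = matches[0]["value"]
# ... or ask for a single tag with find().
single_value = soup.find(attrs={"name": "stainfo"})["value"]

print(first_value, single_value)
```

Note that find() returns None when nothing matches, so real code would normally check the result before subscripting it.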

User · Answer

I am using this with BeautifulSoup 4.8.1 to get the value of all class attributes of certain elements:

    from bs4 import BeautifulSoup

    html = "<td class='val1'/><td col='1'/><td class='val2' />"

    bsoup = BeautifulSoup(html, 'html.parser')

    for td in bsoup.find_all('td'):
        if td.has_attr('class'):
            print(td['class'][0])

It's important to note that for a multi-valued attribute such as class, the attribute key retrieves a list even when the attribute has only a single value.
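To illustrate the note about lists: BeautifulSoup 4 treats class as a multi-valued attribute, so it comes back as a list, whereas most other attributes come back as plain strings. A small sketch, assuming bs4 is installed and using made-up markup:

```python
from bs4 import BeautifulSoup

# Hypothetical table cell with two class names and one ordinary attribute.
html = '<table><tr><td class="val1 highlight" col="1">cell</td></tr></table>'
td = BeautifulSoup(html, "html.parser").find("td")

class_values = td["class"]              # list with one entry per class name
col_value = td["col"]                   # plain string
class_string = " ".join(class_values)   # rebuild the original attribute text

print(class_values, col_value, class_string)
```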

User · Answer

In Python 3.x, simply use get(attr_name) on the tag object that you get using find_all:

    from bs4 import BeautifulSoup

    xmlData = None

    with open('conf//test1.xml', 'r') as xmlFile:
        xmlData = xmlFile.read()

    xmlDecoded = xmlData

    xmlSoup = BeautifulSoup(xmlData, 'html.parser')

    # html.parser lowercases tag names, hence the lowercase search term
    repElemList = xmlSoup.find_all('repeatingelement')

    for repElem in repElemList:
        print("Processing repElem...")
        repElemID = repElem.get('id')
        repElemName = repElem.get('name')

        print("Attribute id = %s" % repElemID)
        print("Attribute name = %s" % repElemName)

against the XML file conf//test1.xml, which looks like:

    <?xml version="1.0" encoding="UTF-8" standalone="yes"?>
    <root>
        <singleElement>
            <subElementX>XYZ</subElementX>
        </singleElement>
        <repeatingElement id="11" name="Joe"/>
        <repeatingElement id="12" name="Mary"/>
    </root>

prints:

    Processing repElem...
    Attribute id = 11
    Attribute name = Joe
    Processing repElem...
    Attribute id = 12
    Attribute name = Mary
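One reason to prefer get() over square brackets is that it does not raise when the attribute is absent. A minimal sketch, assuming bs4 is installed; the markup is a shortened, hypothetical variant of the file above in which the second element has no name attribute:

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the second element deliberately lacks "name".
markup = (
    '<root>'
    '<repeatingElement id="11" name="Joe"/>'
    '<repeatingElement id="12"/>'
    '</root>'
)
soup = BeautifulSoup(markup, "html.parser")

names = []
# html.parser lowercases tag names, hence the lowercase search term.
for elem in soup.find_all("repeatingelement"):
    # get() returns the given default for a missing attribute,
    # whereas elem["name"] would raise KeyError.
    names.append(elem.get("name", "unknown"))

print(names)
```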

User · Answer

You can also use this:

    import requests
    from bs4 import BeautifulSoup
    import csv

    url = "http://58.68.130.147/"
    r = requests.get(url)
    data = r.text

    soup = BeautifulSoup(data, "html.parser")
    get_details = soup.find_all("input", attrs={"name": "stainfo"})

    for val in get_details:
        get_val = val["value"]
        print(get_val)

User · Answer

You could try the newer package called requests_html:

    from requests_html import HTMLSession
    session = HTMLSession()

    r = session.get("https://www.bbc.co.uk/news/technology-54448223")
    date = r.html.find('time', first=True)  # finding a "tag" called "time"
    print(date)  # you will have: <Element 'time' datetime='2020-10-07T11:41:22.000Z'>
    # To get the text inside the "datetime" attribute use:
    print(date.attrs['datetime'])  # you will get '2020-10-07T11:41:22.000Z'

User · Answer

Here is an example of how to extract the href attributes of all a tags:

    import requests as rq
    from bs4 import BeautifulSoup as bs

    url = "http://www.cde.ca.gov/ds/sp/ai/"
    page = rq.get(url)
    html = bs(page.text, 'lxml')

    hrefs = html.find_all("a")
    all_hrefs = []
    for href in hrefs:
        # print(href.get("href"))
        links = href.get("href")
        all_hrefs.append(links)

    print(all_hrefs)
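Anchors without an href make get("href") return None, which then ends up in the list. One way to avoid that, sketched here on a made-up HTML string and assuming bs4 is installed, is to pass href=True so that only tags carrying the attribute are matched:

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the middle anchor has no href attribute.
html = (
    '<p>'
    '<a href="/ds/">data</a> '
    '<a name="top">anchor only</a> '
    '<a href="/sp/">reports</a>'
    '</p>'
)
soup = BeautifulSoup(html, "html.parser")

# href=True matches only <a> tags that actually have an href attribute.
all_hrefs = [a["href"] for a in soup.find_all("a", href=True)]
print(all_hrefs)
```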

User · Answer

You can try gazpacho.

Install it using pip install gazpacho

Get the HTML and make the Soup using:

    from gazpacho import get, Soup

    soup = Soup(get("http://ip.add.ress.here/"))  # get directly returns the html

    inputs = soup.find('input', attrs={'name': 'stainfo'})  # Find all the input tags

    if inputs:
        if type(inputs) is list:
            for input in inputs:
                print(input.attrs.get('value'))
        else:
            print(inputs.attrs.get('value'))
    else:
        print('No <input> tag found with the attribute name="stainfo"')

User · Answer

If you want to retrieve multiple attribute values from the source above, you can use findAll and a list comprehension to get everything you need:

    import urllib
    f = urllib.urlopen("http://58.68.130.147")
    s = f.read()
    f.close()

    from BeautifulSoup import BeautifulStoneSoup
    soup = BeautifulStoneSoup(s)

    inputTags = soup.findAll(attrs={"name" : "stainfo"})
    ### You may be able to do findAll("input", attrs={"name" : "stainfo"})

    # Read the "value" attribute of each matching tag
    output = [x["value"] for x in inputTags]

    print output
    ### This will print a list of the values.
