How can I use the Python HTMLParser library to extract data from a specific div tag?

Question

I am trying to get a value out of an HTML page using the Python HTMLParser library. The value I want to get hold of is within this HTML element:

    <div id="remository">20</div>

This is my HTMLParser class so far:

    class LinksParser(HTMLParser.HTMLParser):
        def __init__(self):
            HTMLParser.HTMLParser.__init__(self)
            self.seen = {}

        def handle_starttag(self, tag, attributes):
            if tag != 'div': return
            for name, value in attributes:
                if name == 'id' and value == 'remository':
                    # print value
                    return

        def handle_data(self, data):
            print data

    p = LinksParser()
    f = urllib.urlopen("http://domain.com/somepage.html")
    html = f.read()
    p.feed(html)
    p.close()

Can someone point me in the right direction? I want the class functionality to get the value 20.

User · Answer

Have you tried BeautifulSoup?

    from bs4 import BeautifulSoup

    # Name the parser explicitly so bs4 does not have to guess one.
    soup = BeautifulSoup('<div id="remository">20</div>', 'html.parser')
    tag = soup.div
    print(tag.string)

This gives you 20 on output.

User · Answer

A small correction at line 3: HTMLParser.HTMLParser.__init__(self) should be HTMLParser.__init__(self) when the class is imported directly with "from HTMLParser import HTMLParser", as it is here. The following worked for me (Python 2):

    import urllib2
    from HTMLParser import HTMLParser

    class MyHTMLParser(HTMLParser):

        def __init__(self):
            HTMLParser.__init__(self)
            self.recording = 0   # non-zero while inside the wanted tag
            self.data = []       # text chunks collected so far

        def handle_starttag(self, tag, attrs):
            if tag == 'required_tag':
                for name, value in attrs:
                    if name == 'somename' and value == 'somevalue':
                        print name, value
                        print "Encountered the beginning of a %s tag" % tag
                        self.recording = 1

        def handle_endtag(self, tag):
            # Only decrement while recording, so the counter never goes negative.
            if tag == 'required_tag' and self.recording:
                self.recording -= 1
                print "Encountered the end of a %s tag" % tag

        def handle_data(self, data):
            if self.recording:
                self.data.append(data)

    p = MyHTMLParser()
    f = urllib2.urlopen('http://www.someurl.com')
    html = f.read()
    p.feed(html)
    print p.data
    p.close()
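The correction above comes down to how the base class was imported. Here is a minimal sketch of both spellings, assuming Python 3, where the module is named html.parser instead of HTMLParser. The class names ParserA and ParserB are hypothetical and only serve the illustration.

```python
import html.parser
from html.parser import HTMLParser


class ParserA(html.parser.HTMLParser):
    """Module imported: the base class is reached through the module name."""

    def __init__(self):
        html.parser.HTMLParser.__init__(self)
        self.data = []

    def handle_data(self, data):
        self.data.append(data)


class ParserB(HTMLParser):
    """Class imported directly: no module prefix is needed."""

    def __init__(self):
        super().__init__()  # same effect as HTMLParser.__init__(self)
        self.data = []

    def handle_data(self, data):
        self.data.append(data)


a, b = ParserA(), ParserB()
for p in (a, b):
    p.feed('<div id="remository">20</div>')
    p.close()
print(a.data, b.data)
```

Both parsers collect the same text; only the way the base class is named differs.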

User · Answer

    class LinksParser(HTMLParser.HTMLParser):
        def __init__(self):
            HTMLParser.HTMLParser.__init__(self)
            self.recording = 0
            self.data = []

        def handle_starttag(self, tag, attributes):
            if tag != 'div':
                return
            if self.recording:
                self.recording += 1
                return
            for name, value in attributes:
                if name == 'id' and value == 'remository':
                    break
            else:
                return
            self.recording = 1

        def handle_endtag(self, tag):
            if tag == 'div' and self.recording:
                self.recording -= 1

        def handle_data(self, data):
            if self.recording:
                self.data.append(data)

self.recording counts the number of nested div tags starting from a "triggering" one. When we're in the sub-tree rooted in a triggering tag, we accumulate the data in self.data.

The data at the end of the parse are left in self.data (a list of strings, possibly empty if no triggering tag was met). Your code from outside the class can access the list directly from the instance at the end of the parse, or you can add appropriate accessor methods for the purpose, depending on what exactly your goal is.

The class could easily be made a bit more general by using, in lieu of the constant literal strings seen in the code above ('div', 'id', and 'remository'), instance attributes self.tag, self.attname and self.attvalue, set by __init__ from arguments passed to it. I avoided that cheap generalization step in the code above to avoid obscuring the core points: keep track of a count of nested tags, and accumulate data into a list when the recording state is active.
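The generalization described above could look roughly like this. It is a sketch, assuming Python 3 (html.parser) and not the answer's own code: the class name TagTextParser, the text() accessor and the sample markup are hypothetical additions, while self.tag, self.attname and self.attvalue follow the names suggested in the answer.

```python
from html.parser import HTMLParser


class TagTextParser(HTMLParser):
    """Collect the text found inside a tag that carries a given attribute."""

    def __init__(self, tag, attname, attvalue):
        super().__init__()
        self.tag = tag
        self.attname = attname
        self.attvalue = attvalue
        self.recording = 0  # nesting depth inside a triggering tag
        self.data = []      # text chunks seen while recording

    def handle_starttag(self, tag, attributes):
        if tag != self.tag:
            return
        if self.recording:
            # Nested tag of the same kind: go one level deeper.
            self.recording += 1
            return
        # attributes is a list of (name, value) pairs.
        if (self.attname, self.attvalue) in attributes:
            self.recording = 1

    def handle_endtag(self, tag):
        if tag == self.tag and self.recording:
            self.recording -= 1

    def handle_data(self, data):
        if self.recording:
            self.data.append(data)

    def text(self):
        """Hypothetical accessor: the collected chunks joined and stripped."""
        return ''.join(self.data).strip()


sample = ('<div>skip</div>'
          '<div id="remository">20<div> nested</div></div>'
          '<div>after</div>')
parser = TagTextParser('div', 'id', 'remository')
parser.feed(sample)
parser.close()
print(parser.data)    # chunks from the triggering div and its nested div
print(parser.text())
```

For the element in the question, TagTextParser('div', 'id', 'remository') fed with the page source would leave '20' in parser.data. The nested div in the sample is there only to exercise the depth counter.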

User · Answer

This works perfectly:

    print(soup.find('the tag').text)
