python BeautifulSoup parsing table

Question

I'm learning python requests and BeautifulSoup. For an exercise, I've chosen to write a quick NYC parking ticket parser. I am able to get an html response which is quite ugly. I need to grab the lineItemsTable and parse all the tickets.

You can reproduce the page by going here: https://paydirect.link2gov.com/NYCParking-Plate/ItemSearch and entering a NY plate T630134C

soup = BeautifulSoup(plateRequest.text)
print(soup.prettify())
print(soup.find_all('tr'))

table = soup.find("table", { "class" : "lineItemsTable" })
for row in table.findAll("tr"):
    cells = row.findAll("td")
    print(cells)

Can someone please help me out? Simply looking for all tr does not get me anywhere.

User · Answer

Updated Answer

If you only need to parse a table from a webpage, you can use the pandas function pandas.read_html.

Let's say we want to extract the GDP data table from the website: https://worldpopulationreview.com/countries/countries-by-gdp/#worldCountries

The following code does the job (no need to call BeautifulSoup or walk the HTML yourself, although read_html does need a parser such as lxml, or bs4 with html5lib, installed):

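from io import StringIO

import pandas as pd
import requests

url = "https://worldpopulationreview.com/countries/countries-by-gdp/#worldCountries"

r = requests.get(url)
# read_html parses every <table> on the page into a list of DataFrames;
# recent pandas versions expect literal HTML to be wrapped in StringIO
df_list = pd.read_html(StringIO(r.text))
df = df_list[0]
df.head()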

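When a page holds several tables, read_html can be pointed at one of them instead of indexing into the list by position. A minimal sketch, assuming a small hand-written HTML snippet (the markup and values below are invented for illustration and are not the real page):

```python
from io import StringIO

import pandas as pd

# Hypothetical markup, invented for this example.
html = """
<table class="other"><tr><th>x</th></tr><tr><td>1</td></tr></table>
<table class="lineItemsTable">
  <tr><th>Summons</th><th>Amount</th></tr>
  <tr><td>1359711259</td><td>125.00</td></tr>
  <tr><td>7086775850</td><td>145.00</td></tr>
</table>
"""

# attrs narrows the search to tables whose attributes match;
# read_html still returns a list, here holding a single DataFrame.
tables = pd.read_html(StringIO(html), attrs={"class": "lineItemsTable"})
df = tables[0]
print(df)
```

The attrs argument is matched against the table tag itself, so this only helps when the target table carries a distinguishing attribute such as a class or id.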

User · Answer

Here is a working example for a generic <table> (the links in the question are broken).

Extracting the table from here: countries by GDP (Gross Domestic Product).

htmltable = soup.find('table', { 'class' : 'table table-striped' })
# where the dictionary specifies unique attributes for the 'table' tag

The tableDataText function parses an html segment that starts with the tag <table>, followed by multiple <tr> (table rows) and inner <td> (table data) tags. It returns a list of rows with inner columns. It accepts only one <th> (table header/data) row, in the first row.

def tableDataText(table):
    rows = []
    trs = table.find_all('tr')
    headerow = [td.get_text(strip=True) for td in trs[0].find_all('th')]  # header row
    if headerow:  # if there is a header row include first
        rows.append(headerow)
        trs = trs[1:]
    for tr in trs:  # for every table row
        rows.append([td.get_text(strip=True) for td in tr.find_all('td')])  # data row
    return rows

Using it we get (first two rows):

list_table = tableDataText(htmltable)
list_table[:2]

[['Rank',
  'Name',
  "GDP (IMF '19)",
  "GDP (UN '16)",
  'GDP Per Capita',
  '2019 Population'],
 ['1',
  'United States',
  '21.41 trillion',
  '18.62 trillion',
  '$65,064',
  '329,064,917']]

That can be easily transformed into a pandas.DataFrame for more advanced tools.

import pandas as pd
dftable = pd.DataFrame(list_table[1:], columns=list_table[0])
dftable.head(4)
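To try the function without fetching the live page, it can be run against a small inline table. This is only a sketch: the HTML below is invented for illustration, and the final step, turning the rows into dictionaries keyed by the header, is an addition that is not part of the answer.

```python
from bs4 import BeautifulSoup

# Hypothetical markup, invented for this example.
html = """
<table class="table table-striped">
  <tr><th>Rank</th><th>Name</th></tr>
  <tr><td>1</td><td> United States </td></tr>
  <tr><td>2</td><td>China</td></tr>
</table>
"""


def tableDataText(table):
    """Return the table as a list of rows, header row first if present."""
    rows = []
    trs = table.find_all("tr")
    headerow = [th.get_text(strip=True) for th in trs[0].find_all("th")]
    if headerow:
        rows.append(headerow)
        trs = trs[1:]
    for tr in trs:
        rows.append([td.get_text(strip=True) for td in tr.find_all("td")])
    return rows


soup = BeautifulSoup(html, "html.parser")
# class_ with a single name matches any table that carries that class
htmltable = soup.find("table", class_="table-striped")
list_table = tableDataText(htmltable)

# Assumed extra step: pair each data row with the header names.
records = [dict(zip(list_table[0], row)) for row in list_table[1:]]
print(list_table)
print(records)
```

Because get_text(strip=True) trims whitespace, the padded " United States " cell comes back clean.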

User · Answer

Solved, this is how you parse their html results:

table = soup.find("table", { "class" : "lineItemsTable" })
for row in table.findAll("tr"):
    cells = row.findAll("td")
    if len(cells) == 9:
        summons = cells[1].find(text=True)
        plateType = cells[2].find(text=True)
        vDate = cells[3].find(text=True)
        location = cells[4].find(text=True)
        borough = cells[5].find(text=True)
        vCode = cells[6].find(text=True)
        amount = cells[7].find(text=True)
        print(amount)
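The positional indexing above can also be written so that each ticket becomes a dictionary with named fields. A minimal sketch, assuming ticket rows have nine cells as in the answer; the HTML snippet and the FIELDS list are made up for illustration:

```python
from bs4 import BeautifulSoup

# Hypothetical markup: ticket rows are assumed to have 9 cells each.
html = """
<table class="lineItemsTable">
  <tr><th colspan="9">Tickets</th></tr>
  <tr><td></td><td>1359711259</td><td>SRF</td><td>08/05/2013</td>
      <td>5310 4 AVE</td><td>K</td><td>19</td><td>125.00</td><td></td></tr>
</table>
"""

# Names taken from the variables in the answer, in column order.
FIELDS = ["summons", "plateType", "vDate", "location", "borough", "vCode", "amount"]

soup = BeautifulSoup(html, "html.parser")
table = soup.find("table", {"class": "lineItemsTable"})

tickets = []
for row in table.find_all("tr"):
    cells = row.find_all("td")
    if len(cells) == 9:  # skip header and summary rows
        # cells[1:8] are the seven data columns, in FIELDS order
        values = [cell.get_text(strip=True) for cell in cells[1:8]]
        tickets.append(dict(zip(FIELDS, values)))

print(tickets)
```

Collecting dictionaries rather than printing inside the loop keeps the parsed tickets available for later filtering or export.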

User · Answer

from behave import *
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.wait import WebDriverWait
from selenium.webdriver.support import expected_conditions as ec
import pandas as pd
import requests
from bs4 import BeautifulSoup
from tabulate import tabulate

# Selenium 4 removed find_element(s)_by_xpath, so the calls below use
# find_element(s)(By.XPATH, ...) instead.

class readTableDataFromDB:
    # Look up a cell by one column header (columnName) and one row key (rowName).
    def LookupValueFromColumnSingleKey(context, tablexpath, rowName, columnName):
        print("element present readData From Table")
        element = context.driver.find_elements(By.XPATH, tablexpath+"/descendant::th")
        indexrow = 1
        indexcolumn = 1
        # Scan the header cells for the wanted column.
        for values in element:
            valuepresent = values.text
            print("text present here::"+valuepresent+"rowName::"+rowName)
            if valuepresent.find(columnName) != -1:
                print("current row"+str(indexrow)+"value"+valuepresent)
                break
            else:
                indexrow = indexrow+1

        # Scan the first cell of every row for the wanted row key.
        indexvalue = context.driver.find_elements(
            By.XPATH, tablexpath+"/descendant::tr/td[1]")
        for valuescolumn in indexvalue:
            valuepresentcolumn = valuescolumn.text
            print("Team text present here::" +
                  valuepresentcolumn+"columnName::"+rowName)
            print(indexcolumn)
            if valuepresentcolumn.find(rowName) != -1:
                print("current column"+str(indexcolumn) +
                      "value"+valuepresentcolumn)
                break
            else:
                indexcolumn = indexcolumn+1

        print("index column"+str(indexcolumn))
        print(tablexpath+"//descendant::tr["+str(indexcolumn)+"]/td["+str(indexrow)+"]")
        lookupelement = context.driver.find_element(
            By.XPATH, tablexpath+"//descendant::tr["+str(indexcolumn)+"]/td["+str(indexrow)+"]")
        print(lookupelement.text)
        return context.driver.find_elements(
            By.XPATH, tablexpath+"//descendant::tr["+str(indexcolumn)+"]/td["+str(indexrow)+"]")

    # Same idea, with two column headers (columnName, columnName1).
    def LookupValueFromColumnTwoKeyssss(context, tablexpath, rowName, columnName, columnName1):
        print("element present readData From Table")
        element = context.driver.find_elements(
            By.XPATH, tablexpath+"/descendant::th")
        indexrow = 1
        indexcolumn = 1
        indexcolumn1 = 1
        for values in element:
            valuepresent = values.text
            print("text present here::"+valuepresent)
            indexrow = indexrow+1
            if valuepresent == columnName:
                print("current row value"+str(indexrow)+"value"+valuepresent)
                break

        for values in element:
            valuepresent = values.text
            print("text present here::"+valuepresent)
            indexrow = indexrow+1
            if valuepresent.find(columnName1) != -1:
                print("current row value"+str(indexrow)+"value"+valuepresent)
                break

        indexvalue = context.driver.find_elements(
            By.XPATH, tablexpath+"/descendant::tr/td[1]")
        for valuescolumn in indexvalue:
            valuepresentcolumn = valuescolumn.text
            print("Team text present here::"+valuepresentcolumn)
            print(indexcolumn)
            indexcolumn = indexcolumn+1
            if valuepresent.find(rowName) != -1:
                print("current column"+str(indexcolumn) +
                      "value"+valuepresentcolumn)
                break
        print("indexrow"+str(indexrow))
        print("index column"+str(indexcolumn))
        lookupelement = context.driver.find_element(
            By.XPATH, tablexpath+"//descendant::tr["+str(indexcolumn)+"]/td["+str(indexrow)+"]")
        print(tablexpath +
              "//descendant::tr["+str(indexcolumn)+"]/td["+str(indexrow)+"]")
        print(lookupelement.text)
        return context.driver.find_element(
            By.XPATH, tablexpath+"//descendant::tr["+str(indexrow)+"]/td["+str(indexcolumn)+"]")
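The lookup idea in the code above (find the column position from the header cells, find the row from its first cell, then read the cell where they meet) does not need a browser when the HTML is already in hand. A rough sketch with BeautifulSoup on static markup; the lookup_cell helper, the table id, and the sample data are all hypothetical:

```python
from bs4 import BeautifulSoup

# Hypothetical markup, invented for this example.
html = """
<table id="standings">
  <tr><th>Team</th><th>Wins</th><th>Losses</th></tr>
  <tr><td>Lions</td><td>10</td><td>2</td></tr>
  <tr><td>Tigers</td><td>7</td><td>5</td></tr>
</table>
"""


def lookup_cell(table, row_name, column_name):
    """Return the text of the cell under the header containing column_name,
    in the first row whose first cell contains row_name; None if not found."""
    headers = [th.get_text(strip=True) for th in table.find_all("th")]
    col = next((i for i, h in enumerate(headers) if column_name in h), None)
    if col is None:
        return None
    for tr in table.find_all("tr"):
        cells = tr.find_all("td")
        if cells and row_name in cells[0].get_text(strip=True):
            return cells[col].get_text(strip=True) if col < len(cells) else None
    return None


soup = BeautifulSoup(html, "html.parser")
table = soup.find("table", id="standings")
print(lookup_cell(table, "Tigers", "Losses"))
```

Like the Selenium version, this matches by substring, so a header such as "Wins" would also match "Total Wins"; an equality test is stricter if that matters.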

User · Answer

Here you go:

data = []
table = soup.find('table', attrs={'class':'lineItemsTable'})
table_body = table.find('tbody')

rows = table_body.find_all('tr')
for row in rows:
    cols = row.find_all('td')
    cols = [ele.text.strip() for ele in cols]
    data.append([ele for ele in cols if ele])  # Get rid of empty values

This gives you:

[ [u'1359711259', u'SRF', u'08/05/2013', u'5310 4 AVE', u'K', u'19', u'125.00', u'$'],
  [u'7086775850', u'PAS', u'12/14/2013', u'3908 6th Ave', u'K', u'40', u'125.00', u'$'],
  [u'7355010165', u'OMT', u'12/14/2013', u'3908 6th Ave', u'K', u'40', u'145.00', u'$'],
  [u'4002488755', u'OMT', u'02/12/2014', u'NB 1ST AVE @ E 23RD ST', u'5', u'115.00', u'$'],
  [u'7913806837', u'OMT', u'03/03/2014', u'5015 4th Ave', u'K', u'46', u'115.00', u'$'],
  [u'5080015366', u'OMT', u'03/10/2014', u'EB 65TH ST @ 16TH AV E', u'7', u'50.00', u'$'],
  [u'7208770670', u'OMT', u'04/08/2014', u'333 15th St', u'K', u'70', u'65.00', u'$'],
  [u'$0.00\n\n\nPayment Amount:']
]

Couple of things to note:

The last row in the output above, the Payment Amount, is not a part of the table, but that is how the table is laid out. You can filter it out by checking if the length of the list is less than 7.

The last column of every row will have to be handled separately since it is an input text box.
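Both notes can be handled inside the loop. The following is a sketch under assumptions: the markup is a simplified stand-in for the real page, the check counts td cells rather than the cleaned list, and the input's name and value attributes are invented for illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: one ticket row plus the trailing summary row.
html = """
<table class="lineItemsTable">
  <tbody>
    <tr><td>1359711259</td><td>SRF</td><td>08/05/2013</td><td>5310 4 AVE</td>
        <td>K</td><td>19</td><td>125.00</td>
        <td>$<input type="text" name="amt0" value="125.00"></td></tr>
    <tr><td>$0.00 Payment Amount:</td></tr>
  </tbody>
</table>
"""

soup = BeautifulSoup(html, "html.parser")
table_body = soup.find("table", attrs={"class": "lineItemsTable"}).find("tbody")

data = []
for row in table_body.find_all("tr"):
    cols = row.find_all("td")
    if len(cols) < 7:  # the summary row has too few cells, skip it
        continue
    values = [td.get_text(strip=True) for td in cols[:-1]]
    # The last cell holds a text box, so read its value attribute
    # instead of its text; fall back to the text if no input is found.
    box = cols[-1].find("input")
    values.append(box.get("value", "") if box else cols[-1].get_text(strip=True))
    data.append(values)

print(data)
```

Reading the value attribute gives whatever amount the server pre-filled in the box; anything typed by a user in a browser would not be present in the downloaded HTML.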
