Wait until page is loaded with Selenium WebDriver for Python

Question

I want to scrape all the data from a page implemented with infinite scroll. The following Python code works:

    for i in range(100):
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        time.sleep(5)

This means that every time I scroll down to the bottom, I need to wait 5 seconds, which is generally enough for the page to finish loading the newly generated content. But this may not be time efficient: the page may finish loading the new content in well under 5 seconds.

How can I detect whether the page has finished loading the new content every time I scroll down? If I can detect this, I can scroll down again as soon as the page has finished loading, which is more time efficient.

User · Answer

Solution for AJAX pages that continuously load data. The previously stated methods do not work. What we can do instead is grab the page DOM, hash it, and compare the old and new hash values over a time delta.

    import time
    from selenium import webdriver
    from selenium.webdriver.common.by import By

    def page_has_loaded(driver, sleep_time=2):
        '''
        Waits for page to completely load by comparing current page hash values.
        '''

        def get_page_hash(driver):
            '''
            Returns html dom hash
            '''
            # can find element by either 'html' tag or by the html 'root' id
            dom = driver.find_element(By.TAG_NAME, 'html').get_attribute('innerHTML')
            # dom = driver.find_element(By.ID, 'root').get_attribute('innerHTML')
            dom_hash = hash(dom.encode('utf-8'))
            return dom_hash

        page_hash = 'empty'
        page_hash_new = ''

        # comparing old and new page DOM hash together to verify the page is fully loaded
        while page_hash != page_hash_new:
            page_hash = get_page_hash(driver)
            time.sleep(sleep_time)
            page_hash_new = get_page_hash(driver)
            print('<page_has_loaded> - page not loaded')

        print('<page_has_loaded> - page loaded: {}'.format(driver.current_url))
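The loop above keeps polling for as long as the DOM keeps changing, so a page with a ticking clock or a rotating banner would never be reported as loaded. A minimal sketch of the same compare-two-snapshots idea with an upper bound on the wait follows. The names wait_for_stable_dom, get_source, interval and max_wait are hypothetical, and get_source is assumed to be any zero-argument callable returning the page markup, for example lambda: driver.page_source.

```python
import hashlib
import time


def wait_for_stable_dom(get_source, interval=2.0, max_wait=30.0):
    """Poll get_source() until two consecutive snapshots hash the same.

    Returns True if the markup stopped changing within max_wait seconds,
    False if the deadline passed first. All names here are illustrative.
    """
    def digest():
        # hashlib gives a digest that is stable across processes,
        # unlike the built-in hash().
        return hashlib.sha256(get_source().encode("utf-8")).hexdigest()

    deadline = time.monotonic() + max_wait
    previous = digest()
    while time.monotonic() < deadline:
        time.sleep(interval)
        current = digest()
        if current == previous:
            return True
        previous = current
    return False
```

With a real driver this might be called as wait_for_stable_dom(lambda: driver.page_source) after each scroll; a False result means the page was still changing when the time ran out.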

User · Answer

Use this in your code:

    from selenium import webdriver

    driver = webdriver.Firefox()  # or Chrome()
    driver.implicitly_wait(10)  # seconds
    driver.get("http://www........")

Or you can use this code if you are looking for a specific tag:

    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.support import expected_conditions as EC

    driver = webdriver.Firefox()  # or Chrome()
    driver.get("http://www........")
    try:
        element = WebDriverWait(driver, 10).until(
            EC.presence_of_element_located((By.ID, "tag_id"))
        )
    finally:
        driver.quit()

User · Answer

How about putting WebDriverWait in a while loop and catching the exceptions?

    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.common.exceptions import TimeoutException

    browser = webdriver.Firefox()
    browser.get('url')
    delay = 3  # seconds
    while True:
        try:
            # presence_of_element_located takes a (By, value) locator tuple, not an element
            WebDriverWait(browser, delay).until(
                EC.presence_of_element_located((By.ID, 'IdOfMyElement')))
            print("Page is ready!")
            break  # it will break from the loop once the specific element is present
        except TimeoutException:
            print("Loading took too much time! - Try again")

User · Answer

Find below 3 methods:

readyState

Checking page readyState (not reliable):

    def page_has_loaded(self):
        self.log.info("Checking if {} page is loaded.".format(self.driver.current_url))
        page_state = self.driver.execute_script('return document.readyState;')
        return page_state == 'complete'

The wait_for helper function is good, but unfortunately click_through_to_new_page is open to the race condition where we manage to execute the script in the old page, before the browser has started processing the click, and page_has_loaded just returns true straight away.

id

Comparing new page ids with the old one:

    from selenium.common.exceptions import NoSuchElementException
    from selenium.webdriver.common.by import By

    def page_has_loaded_id(self):
        self.log.info("Checking if {} page is loaded.".format(self.driver.current_url))
        try:
            new_page = self.driver.find_element(By.TAG_NAME, 'html')
            # old_page is assumed to be the <html> element captured before navigating
            return new_page.id != old_page.id
        except NoSuchElementException:
            return False

It's possible that comparing ids is not as effective as waiting for stale reference exceptions.

staleness_of

Using the staleness_of method:

    import contextlib
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.support.expected_conditions import staleness_of

    @contextlib.contextmanager
    def wait_for_page_load(self, timeout=10):
        self.log.debug("Waiting for page to load at {}.".format(self.driver.current_url))
        old_page = self.driver.find_element(By.TAG_NAME, 'html')
        yield
        WebDriverWait(self.driver, timeout).until(staleness_of(old_page))

For more details, check Harry's blog.
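The wait_for helper is mentioned above but not shown. A minimal sketch of what such a polling helper might look like follows; it is not necessarily the blog's version, and the timeout and poll parameters are assumptions made for illustration.

```python
import time


def wait_for(condition, timeout=10.0, poll=0.1):
    """Call condition() repeatedly until it returns a truthy value.

    Returns that value, or raises TimeoutError once `timeout` seconds
    have passed without success. Parameter names are illustrative.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError("condition not met within {} s".format(timeout))
        time.sleep(poll)
```

Any zero-argument callable can be passed in, so a bound method such as page_has_loaded from the first snippet would work. The race condition described above still applies: the helper only retries the check and cannot tell the old page from the new one.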

User · Answer

On a side note, instead of scrolling down 100 times, you can check whether there are no more modifications to the DOM (we are in the case of the bottom of the page being AJAX lazy-loaded):

    import logging
    import time

    def scrollDown(driver, value):
        driver.execute_script("window.scrollBy(0," + str(value) + ")")

    # Scroll down the page
    def scrollDownAllTheWay(driver):
        old_page = driver.page_source
        while True:
            logging.debug("Scrolling loop")
            for i in range(2):
                scrollDown(driver, 500)
                time.sleep(2)
            new_page = driver.page_source
            if new_page != old_page:
                old_page = new_page
            else:
                break
        return True
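Comparing the full page_source on every round can be slow on large pages. A lighter variant of the same idea, sketched below, watches document.body.scrollHeight instead and stops once it no longer grows. The function name scroll_to_end and the parameters pause, settle_checks and max_rounds are hypothetical; the only driver method used is execute_script.

```python
import time


def scroll_to_end(driver, pause=0.5, settle_checks=3, max_rounds=1000):
    """Scroll until document.body.scrollHeight stops growing.

    Returns the number of scrolls that produced new content.
    """
    loads = 0
    last_height = driver.execute_script("return document.body.scrollHeight")
    for _round in range(max_rounds):
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        # Re-check a few times so a slow response is not mistaken for the end.
        for _attempt in range(settle_checks):
            time.sleep(pause)
            height = driver.execute_script("return document.body.scrollHeight")
            if height > last_height:
                break
        else:
            return loads  # height never grew: assume the bottom was reached
        last_height = height
        loads += 1
    return loads
```

The total wait before giving up on a round is pause * settle_checks, so short pauses keep fast pages fast while slow responses still get several chances. This assumes new content always increases the page height, which may not hold for pages that recycle DOM nodes.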

User · Answer

Have you tried driver.implicitly_wait? It is like a setting for the driver, so you only call it once in the session, and it tells the driver to keep retrying each element lookup for up to the given amount of time before giving up.

    driver = webdriver.Chrome()
    driver.implicitly_wait(10)

So if you set a wait time of 10 seconds, it will execute the command as soon as possible, waiting at most 10 seconds before it gives up. I've used this in similar scroll-down scenarios, so I don't see why it wouldn't work in your case. Hope this is helpful.

Be sure to use a lower-case 'w' in implicitly_wait.

User · Answer

The webdriver will wait for a page to load by default via the .get() method.

As you may be looking for some specific element, as @user227215 said, you should use WebDriverWait to wait for an element located in your page:

    from selenium import webdriver
    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.common.by import By
    from selenium.common.exceptions import TimeoutException

    browser = webdriver.Firefox()
    browser.get("url")
    delay = 3  # seconds
    try:
        myElem = WebDriverWait(browser, delay).until(
            EC.presence_of_element_located((By.ID, 'IdOfMyElement')))
        print("Page is ready!")
    except TimeoutException:
        print("Loading took too much time!")

I have used it for checking alerts. You can use any other type of method to find the locator.

EDIT 1:

I should mention that the webdriver will wait for a page to load by default. It does not wait for loading inside frames or for AJAX requests. This means that when you use .get('url'), your browser will wait until the page is completely loaded and then go on to the next command in the code. But when you post an AJAX request, webdriver does not wait, and it is your responsibility to wait an appropriate amount of time for the page, or a part of the page, to load. That is what the expected_conditions module is for.
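For the infinite-scroll case in the question, the AJAX response usually adds more elements rather than one element with a known id. WebDriverWait.until accepts any callable that takes the driver and returns a truthy value when done, so a custom condition can wait for the element count to grow. The sketch below is one way to write it; the class name element_count_exceeds and the selector "div.item" are hypothetical.

```python
class element_count_exceeds:
    """Wait condition: truthy once more than `count` elements match `css`."""

    def __init__(self, css, count):
        self.css = css
        self.count = count

    def __call__(self, driver):
        # "css selector" is the string value of By.CSS_SELECTOR.
        found = driver.find_elements("css selector", self.css)
        return found if len(found) > self.count else False


# Intended use with a real driver (placeholder selector, not run here):
#   before = len(driver.find_elements("css selector", "div.item"))
#   driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
#   WebDriverWait(driver, 10).until(element_count_exceeds("div.item", before))
```

If no new items arrive within the timeout, until raises TimeoutException, which is a reasonable signal that the bottom of the page has been reached.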

User · Answer

You can do that very simply with this function:

    import time

    def page_is_loading(driver):
        while True:
            x = driver.execute_script("return document.readyState")
            if x == "complete":
                return True
            else:
                # Poll again; a yield here would turn the function into a
                # generator, which is always truthy and never actually waits.
                time.sleep(0.1)

And when you want to do something after the page has finished loading, you can use:

    Driver = webdriver.Firefox(options=Options, executable_path='geckodriver.exe')
    Driver.get("https://www.google.com/")

    while not page_is_loading(Driver):
        continue

    Driver.execute_script("alert('page is loaded')")

User · Answer

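Here I did it using a rather simple form:

    from selenium import webdriver
    from selenium.common.exceptions import NoSuchElementException
    from selenium.webdriver.common.by import By

    browser = webdriver.Firefox()
    browser.get("url")
    searchTxt = ''
    while not searchTxt:
        try:
            searchTxt = browser.find_element(By.NAME, 'NAME OF ELEMENT')
            searchTxt.send_keys("USERNAME")
        except NoSuchElementException:
            # element not on the page yet: try again
            continue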

User · Answer

As mentioned in the answer from David Cullen, I've always seen recommendations to use a line like the following one:

    element_present = EC.presence_of_element_located((By.ID, 'element_id'))
    WebDriverWait(driver, timeout).until(element_present)

It was difficult for me to find a list of all the possible locators that can be used with By, so I thought it would be useful to provide the list here. According to Web Scraping with Python by Ryan Mitchell:

ID
    Used in the example; finds elements by their HTML id attribute.

CLASS_NAME
    Used to find elements by their HTML class attribute. Why is this function CLASS_NAME not simply CLASS? Using the form object.CLASS would create problems for Selenium's Java library, where .class is a reserved method. In order to keep the Selenium syntax consistent between different languages, CLASS_NAME was used instead.

CSS_SELECTOR
    Finds elements by their class, id, or tag name, using the #idName, .className, tagName convention.

LINK_TEXT
    Finds HTML <a> tags by the text they contain. For example, a link that says "Next" can be selected using (By.LINK_TEXT, "Next").

PARTIAL_LINK_TEXT
    Similar to LINK_TEXT, but matches on a partial string.

NAME
    Finds HTML tags by their name attribute. This is handy for HTML forms.

TAG_NAME
    Finds HTML tags by their tag name.

XPATH
    Uses an XPath expression to select matching elements.

User · Answer

From selenium/webdriver/support/wait.py:

    driver = ...
    from selenium.webdriver.support.wait import WebDriverWait
    element = WebDriverWait(driver, 10).until(
        lambda x: x.find_element_by_id("someId"))

User · Answer

Trying to pass find_element_by_id to the constructor for presence_of_element_located, as shown in the accepted answer, caused NoSuchElementException to be raised. I had to use the syntax in fragles' comment:

    from selenium import webdriver
    from selenium.common.exceptions import TimeoutException
    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.common.by import By

    driver = webdriver.Firefox()
    driver.get('url')
    timeout = 5
    try:
        element_present = EC.presence_of_element_located((By.ID, 'element_id'))
        WebDriverWait(driver, timeout).until(element_present)
    except TimeoutException:
        print("Timed out waiting for page to load")

This matches the example in the documentation. Here is a link to the documentation for By.

User · Answer

Very good answers here. Quick example of waiting for an XPATH:

    # wait for sizes to load - 2s timeout
    try:
        WebDriverWait(driver, 2).until(expected_conditions.presence_of_element_located(
            (By.XPATH, "//div[@id='stockSizes']//a")))
    except TimeoutException:
        pass
