[python] Python Selenium accessing HTML source

How can I get the HTML source in a variable using the Selenium module with Python?

I wanted to do something like this:

from selenium import webdriver

browser = webdriver.Firefox()
browser.get("http://example.com")
if "whatever" in html_source:
    # Do something
else:
    # Do something else

How can I do this? I don't know how to access the HTML source.

This question is related to python selenium selenium-webdriver

The answer is


from bs4 import BeautifulSoup
from selenium import webdriver

driver = webdriver.Chrome()
html_source_code = driver.execute_script("return document.body.innerHTML;")
html_soup: BeautifulSoup = BeautifulSoup(html_source_code, 'html.parser')

Now you can apply BeautifulSoup function to extract data...


driver.page_source will help you get the page source code. You can check if the text is present in the page source or not.

from selenium import webdriver
driver = webdriver.Firefox()
driver.get("some url")
if "your text here" in driver.page_source:
    print('Found it!')
else:
    print('Did not find it.')

If you want to store the page source in a variable, add below line after driver.get:

var_pgsource=driver.page_source

and change the if condition to:

if "your text here" in var_pgsource:

By using the page source you will get the whole HTML code.
So first decide the block of code or tag in which you require to retrieve the data or to click the element..

options = driver.find_elements_by_name_("XXX")
for option in options:
    if option.text == "XXXXXX":
        print(option.text)
        option.click()

You can find the elements by name, XPath, id, link and CSS path.


You need to access the page_source property:

from selenium import webdriver

browser = webdriver.Firefox()
browser.get("http://example.com")

html_source = browser.page_source
if "whatever" in html_source:
    # do something
else:
    # do something else

With Selenium2Library you can use get_source()

import Selenium2Library
s = Selenium2Library.Selenium2Library()
s.open_browser("localhost:7080", "firefox")
source = s.get_source()

I'd recommend getting the source with urllib and, if you're going to parse, use something like Beautiful Soup.

import urllib

url = urllib.urlopen("http://example.com") # Open the URL.
content = url.readlines() # Read the source and save it to a variable.

To answer your question about getting the URL to use for urllib, just execute this JavaScript code:

url = browser.execute_script("return window.location;")

You can simply use the WebDriver object, and access to the page source code via its @property field page_source...

Try this code snippet :-)

from selenium import webdriver
driver = webdriver.Firefox('path/to/executable')
driver.get('https://some-domain.com')
source = driver.page_source
if 'stuff' in source:
    print('found...')
else:
    print('not in source...')

Examples related to python

programming a servo thru a barometer Is there a way to view two blocks of code from the same file simultaneously in Sublime Text? python variable NameError Why my regexp for hyphenated words doesn't work? Comparing a variable with a string python not working when redirecting from bash script is it possible to add colors to python output? Get Public URL for File - Google Cloud Storage - App Engine (Python) Real time face detection OpenCV, Python xlrd.biffh.XLRDError: Excel xlsx file; not supported Could not load dynamic library 'cudart64_101.dll' on tensorflow CPU-only installation

Examples related to selenium

SessionNotCreatedException: Message: session not created: This version of ChromeDriver only supports Chrome version 81 session not created: This version of ChromeDriver only supports Chrome version 74 error with ChromeDriver Chrome using Selenium Selenium: WebDriverException:Chrome failed to start: crashed as google-chrome is no longer running so ChromeDriver is assuming that Chrome has crashed WebDriverException: unknown error: DevToolsActivePort file doesn't exist while trying to initiate Chrome Browser Class has been compiled by a more recent version of the Java Environment How to configure ChromeDriver to initiate Chrome browser in Headless mode through Selenium? How to make Firefox headless programmatically in Selenium with Python? element not interactable exception in selenium web automation Selenium Web Driver & Java. Element is not clickable at point (x, y). Other element would receive the click How do you fix the "element not interactable" exception?

Examples related to selenium-webdriver

SessionNotCreatedException: Message: session not created: This version of ChromeDriver only supports Chrome version 81 Selenium: WebDriverException:Chrome failed to start: crashed as google-chrome is no longer running so ChromeDriver is assuming that Chrome has crashed WebDriverException: unknown error: DevToolsActivePort file doesn't exist while trying to initiate Chrome Browser How to configure ChromeDriver to initiate Chrome browser in Headless mode through Selenium? How to make Firefox headless programmatically in Selenium with Python? Selenium Web Driver & Java. Element is not clickable at point (x, y). Other element would receive the click How do you fix the "element not interactable" exception? Scrolling to element using webdriver? Only local connections are allowed Chrome and Selenium webdriver Check if element is clickable in Selenium Java