A lot has been analyzed and discussed about websites detecting that they are being driven by a Selenium-controlled ChromeDriver. Here are my two cents:
According to the article Browser detection using the user agent, serving different webpages or services to different browsers is usually not among the best of ideas. The web is meant to be accessible to everyone, regardless of which browser or device a user is using. There are best practices for developing a website that progressively enhances itself based on feature availability rather than by targeting specific browsers.
However, browsers and standards are not perfect, and there are still some edge cases where websites detect the browser and whether it is driven by a Selenium-controlled WebDriver. Browsers can be detected in different ways, and some commonly used mechanisms are as follows:
You can find a relevant detailed discussion in How does recaptcha 3 know I'm using selenium/chromedriver?
You can find a relevant detailed discussion in Access Denied page with headless Chrome on Linux while headed Chrome works on windows using Selenium through Python
You can find a relevant detailed discussion in Unable to use Selenium to automate Chase site login
You can find a relevant detailed discussion in Dynamic dropdown doesn't populate with auto suggestions on https://www.nseindia.com/ when values are passed using Selenium and Python
You can find a relevant detailed discussion in Website using DataDome gets captcha blocked while scraping using Selenium and Python
However, using the user agent to detect the browser looks simple, but doing it well is in fact a bit tougher.
Note: At this point it's worth mentioning that it's very rarely a good idea to use user agent sniffing. There are always better and more broadly compatible ways to address a given issue.
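To illustrate why user agent sniffing is harder than it looks, here is a minimal Python sketch. It assumes the common token conventions (Edge appends Edg/, Opera appends OPR/, old headless Chrome reports HeadlessChrome/) and is not a complete parser. The point is the pitfall: a naive substring check for Chrome also matches Edge, Opera and headless Chrome, because their user agents embed the Chrome/ token too, so the checks only work when the most specific tokens are tested first.

```python
def naive_is_chrome(ua: str) -> bool:
    # Naive sniffing: matches any UA that merely contains "Chrome"
    return "Chrome" in ua


def classify_user_agent(ua: str) -> str:
    """Return a coarse browser family for a User-Agent string (illustrative only)."""
    # Order matters: Edge and Opera UAs also carry "Chrome/" and "Safari/",
    # "HeadlessChrome/" contains "Chrome/" as a substring, and Chrome UAs
    # carry "Safari/", so the most specific tokens are checked first.
    if "Edg/" in ua or "Edge/" in ua:
        return "edge"
    if "OPR/" in ua:
        return "opera"
    if "HeadlessChrome/" in ua:
        return "headless-chrome"
    if "Chrome/" in ua:
        return "chrome"
    if "Firefox/" in ua:
        return "firefox"
    if "Safari/" in ua:
        return "safari"
    return "unknown"


EDGE_UA = ("Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
           "(KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36 Edg/91.0.864.59")
HEADLESS_UA = ("Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
               "(KHTML, like Gecko) HeadlessChrome/83.0.4103.97 Safari/537.36")

print(naive_is_chrome(EDGE_UA))          # True, although this is Edge
print(classify_user_agent(EDGE_UA))      # edge
print(classify_user_agent(HEADLESS_UA))  # headless-chrome
```

Every token checked here is just text sent by the client, so it can be spoofed, which is exactly what the user agent rotation approaches in this answer rely on.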
The idea behind detecting the browser can be either of the following:
Some of the alternatives to browser detection are as follows:
To prevent the Selenium-driven WebDriver from being detected, a niche approach would include any or all of the approaches mentioned below:
Rotating the UserAgent in every execution of your Test Suite using the fake_useragent module as follows:
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service
from fake_useragent import UserAgent

options = Options()
# Pick a random real-world User-Agent string for this run
ua = UserAgent()
userAgent = ua.random
print(userAgent)
options.add_argument(f'--user-agent={userAgent}')
# Selenium 4 deprecated (and later removed) chrome_options/executable_path; use options= and a Service
driver = webdriver.Chrome(service=Service(r'C:\WebDrivers\ChromeDriver\chromedriver_win32\chromedriver.exe'), options=options)
driver.get("https://www.google.co.in")
driver.quit()
You can find a relevant detailed discussion in Way to change Google Chrome user agent in Selenium?
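If you would rather not depend on fake_useragent, the same per-run rotation can be sketched with the standard library alone. The pool below is a hypothetical, hand-maintained list (fake_useragent's ua.random normally supplies these strings), and random_user_agent_argument is an illustrative helper, not part of Selenium. Passing a seeded random.Random makes a particular run reproducible while debugging.

```python
import random
from typing import Optional, Sequence

# Hypothetical hand-maintained pool; keep it in sync with real, current browser versions.
USER_AGENT_POOL = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/83.0.4103.97 Safari/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_5) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/83.0.4103.97 Safari/537.36",
    "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/83.0.4103.97 Safari/537.36",
]


def random_user_agent_argument(pool: Sequence[str],
                               rng: Optional[random.Random] = None) -> str:
    """Pick one UA from the pool and format it as a Chrome command-line switch."""
    if not pool:
        raise ValueError("user agent pool is empty")
    rng = rng if rng is not None else random.Random()
    return f"--user-agent={rng.choice(pool)}"


arg = random_user_agent_argument(USER_AGENT_POOL, random.Random(42))
print(arg)
# In the Selenium script: options.add_argument(arg)
```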
Rotating the UserAgent in each of your Tests using Network.setUserAgentOverride through execute_cdp_cmd() as follows:
from selenium import webdriver
from selenium.webdriver.chrome.service import Service

driver = webdriver.Chrome(service=Service(r'C:\WebDrivers\chromedriver.exe'))
# User agent before the override
print(driver.execute_script("return navigator.userAgent;"))
# Setting user agent as Chrome/83.0.4103.97
driver.execute_cdp_cmd('Network.setUserAgentOverride', {"userAgent": 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.97 Safari/537.36'})
# User agent after the override
print(driver.execute_script("return navigator.userAgent;"))
driver.quit()
You can find a relevant detailed discussion in How to change the User Agent using Selenium and Python
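Network.setUserAgentOverride also accepts optional acceptLanguage (browser language to emulate) and platform (the value navigator.platform should return) parameters, so the overridden user agent can be kept consistent with other values a site may cross-check. A minimal sketch, where user_agent_override_params and infer_platform are hypothetical helpers and the platform mapping is an assumption covering only three common desktop cases:

```python
from typing import Dict, Optional


def infer_platform(user_agent: str) -> Optional[str]:
    """Guess the navigator.platform value matching a desktop UA (assumed mapping)."""
    if "Windows NT" in user_agent:
        return "Win32"          # reported by 64-bit Windows browsers as well
    if "Macintosh" in user_agent:
        return "MacIntel"
    if "Linux x86_64" in user_agent:
        return "Linux x86_64"
    return None


def user_agent_override_params(user_agent: str,
                               accept_language: Optional[str] = None) -> Dict[str, str]:
    """Build the parameter dict for the CDP Network.setUserAgentOverride command."""
    params = {"userAgent": user_agent}
    if accept_language is not None:
        params["acceptLanguage"] = accept_language
    platform = infer_platform(user_agent)
    if platform is not None:
        params["platform"] = platform
    return params


UA = ("Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
      "(KHTML, like Gecko) Chrome/83.0.4103.97 Safari/537.36")
params = user_agent_override_params(UA, accept_language="en-US,en")
print(params["platform"])  # Win32
# In the Selenium script:
# driver.execute_cdp_cmd('Network.setUserAgentOverride', params)
```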
Changing the value of the navigator.webdriver property to undefined as follows:
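from selenium import webdriver

# Selenium 4.6+ can locate chromedriver on its own
driver = webdriver.Chrome()
# Register a script that runs in every new document before the page's own scripts
driver.execute_cdp_cmd("Page.addScriptToEvaluateOnNewDocument", {
    "source": """
        Object.defineProperty(navigator, 'webdriver', {
            get: () => undefined
        })
    """
})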
You can find a relevant detailed discussion in Selenium webdriver: Modifying navigator.webdriver flag to prevent selenium detection
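Because the override is registered through Page.addScriptToEvaluateOnNewDocument, it runs in every new document before the page's own scripts, which a plain execute_script() call after driver.get() cannot guarantee. The CDP command returns an identifier that Page.removeScriptToEvaluateOnNewDocument accepts, which lets a test undo the override for documents loaded afterwards. A minimal sketch; hide_webdriver_flag, restore_webdriver_flag and the FakeDriver stand-in (used only so the sketch runs without a browser) are hypothetical names:

```python
from typing import Any, Dict, List, Tuple

HIDE_WEBDRIVER_JS = """
Object.defineProperty(navigator, 'webdriver', {
    get: () => undefined
})
"""


def hide_webdriver_flag(driver: Any) -> str:
    """Register the override for every new document and return its CDP identifier."""
    result = driver.execute_cdp_cmd("Page.addScriptToEvaluateOnNewDocument",
                                    {"source": HIDE_WEBDRIVER_JS})
    return result["identifier"]


def restore_webdriver_flag(driver: Any, identifier: str) -> None:
    """Unregister the override; documents loaded afterwards see the real flag again."""
    driver.execute_cdp_cmd("Page.removeScriptToEvaluateOnNewDocument",
                           {"identifier": identifier})


class FakeDriver:
    """Stand-in for a Chrome WebDriver that records CDP calls instead of sending them."""

    def __init__(self) -> None:
        self.calls: List[Tuple[str, Dict[str, Any]]] = []

    def execute_cdp_cmd(self, cmd: str, cmd_args: Dict[str, Any]) -> Dict[str, Any]:
        self.calls.append((cmd, cmd_args))
        return {"identifier": "1"} if cmd == "Page.addScriptToEvaluateOnNewDocument" else {}


fake = FakeDriver()
script_id = hide_webdriver_flag(fake)
restore_webdriver_flag(fake, script_id)
print([cmd for cmd, _ in fake.calls])
```

With a real session you would pass the webdriver.Chrome() instance instead of FakeDriver.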
Changing the values of navigator.plugins, navigator.languages, WebGL, the hairline feature, missing image, etc. You can find a relevant detailed discussion in Is there a version of selenium webdriver that is not detectable?
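The same Page.addScriptToEvaluateOnNewDocument technique extends to other navigator properties. A minimal sketch that generates the override script from a Python dict; navigator_override_script is a hypothetical helper, and it only suits properties whose value is a plain JSON value such as navigator.languages. navigator.plugins is a PluginArray, so replacing it with a plain array is itself detectable, and WebGL or hairline checks are not covered by this helper.

```python
import json
from typing import Any, Dict


def navigator_override_script(overrides: Dict[str, Any]) -> str:
    """Build JavaScript that redefines each navigator property as a getter."""
    statements = []
    for name, value in overrides.items():
        # json.dumps yields valid JavaScript literals for str, int, bool, list, dict, None;
        # the parentheses stop an object literal from being parsed as an arrow-function body
        statements.append(
            "Object.defineProperty(navigator, %s, {get: () => (%s)});"
            % (json.dumps(name), json.dumps(value))
        )
    return "\n".join(statements)


script = navigator_override_script({"languages": ["en-US", "en"], "hardwareConcurrency": 8})
print(script)
# In the Selenium script:
# driver.execute_cdp_cmd("Page.addScriptToEvaluateOnNewDocument", {"source": script})
```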
You can find a relevant detailed discussion in How to bypass Google captcha with Selenium and python?
While dealing with 2captcha and recaptcha-v3, rather than clicking on the checkbox associated with the text I'm not a robot, it may be easier to get authenticated by extracting and using the data-sitekey.
You can find a relevant detailed discussion in How to identify the 32 bit data-sitekey of ReCaptcha V2 to obtain a valid response programmatically using Selenium and Python Requests?
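Extracting the data-sitekey does not require clicking anything: for the reCAPTCHA v2 widget it is usually an attribute on the container element, so with Selenium driver.find_element(By.CSS_SELECTOR, '[data-sitekey]').get_attribute('data-sitekey') is typically enough. The sketch below does the same on raw HTML (for example driver.page_source) using only the standard library; extract_site_keys is a hypothetical helper and the key in the sample markup is a made-up placeholder.

```python
from html.parser import HTMLParser
from typing import List, Optional, Tuple


class SiteKeyParser(HTMLParser):
    """Collect the value of every data-sitekey attribute in a document."""

    def __init__(self) -> None:
        super().__init__()
        self.site_keys: List[str] = []

    def handle_starttag(self, tag: str, attrs: List[Tuple[str, Optional[str]]]) -> None:
        # HTMLParser lowercases attribute names, so the match is case-insensitive
        for name, value in attrs:
            if name == "data-sitekey" and value:
                self.site_keys.append(value)


def extract_site_keys(html: str) -> List[str]:
    parser = SiteKeyParser()
    parser.feed(html)
    parser.close()
    return parser.site_keys


page = '<form><div class="g-recaptcha" data-sitekey="6Lc_EXAMPLE_KEY"></div></form>'
print(extract_site_keys(page))  # ['6Lc_EXAMPLE_KEY']
```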