[python] How to get text of an element in Selenium WebDriver, without including child element text?

<div id="a">This is some
   <div id="b">text</div>
</div>

Getting "This is some" is non-trivial. For instance, this returns "This is some text":

driver.find_element_by_id('a').text

How does one, in a general way, get the text of a specific element without including the text of it's children?

(I'm providing an answer below but will leave the question open in case someone can come up with a less hideous solution).

This question is related to python html selenium selenium-webdriver

The answer is


Here's a general solution:

def get_text_excluding_children(driver, element):
    return driver.execute_script("""
    return jQuery(arguments[0]).contents().filter(function() {
        return this.nodeType == Node.TEXT_NODE;
    }).text();
    """, element)

The element passed to the function can be something obtained from the find_element...() methods (i.e. it can be a WebElement object).

Or if you don't have jQuery or don't want to use it you can replace the body of the function above above with this:

return self.driver.execute_script("""
var parent = arguments[0];
var child = parent.firstChild;
var ret = "";
while(child) {
    if (child.nodeType === Node.TEXT_NODE)
        ret += child.textContent;
    child = child.nextSibling;
}
return ret;
""", element) 

I'm actually using this code in a test suite.


In the HTML which you have shared:

<div id="a">This is some
   <div id="b">text</div>
</div>

The text This is some is within a text node. To depict the text node in a structured way:

<div id="a">
    This is some
   <div id="b">text</div>
</div>

This Usecase

To extract and print the text This is some from the text node using Selenium's client you have 2 ways as follows:

  • Using splitlines(): You can identify the parent element i.e. <div id="a">, extract the innerHTML and then use splitlines() as follows:

    • using xpath:

      print(driver.find_element_by_xpath("//div[@id='a']").get_attribute("innerHTML").splitlines()[0])
      
    • using xpath:

      print(driver.find_element_by_css_selector("div#a").get_attribute("innerHTML").splitlines()[0])
      
  • Using execute_script(): You can also use the execute_script() method which can synchronously execute JavaScript in the current window/frame as follows:

    • using xpath and firstChild:

      parent_element = driver.find_element_by_xpath("//div[@id='a']")
      print(driver.execute_script('return arguments[0].firstChild.textContent;', parent_element).strip())
      
    • using xpath and childNodes[n]:

      parent_element = driver.find_element_by_xpath("//div[@id='a']")
      print(driver.execute_script('return arguments[0].childNodes[1].textContent;', parent_element).strip())
      

Unfortunately, Selenium was only built to work with Elements, not Text nodes.

If you try to use a function like get_element_by_xpath to target the text nodes, Selenium will throw an InvalidSelectorException.

One workaround is to grab the relevant HTML with Selenium and then use an HTML parsing library like BeautifulSoup that can handle text nodes more elegantly.

import bs4
from bs4 import BeautifulSoup

inner_html = driver.find_elements_by_css_selector('#a')[0].get_attribute("innerHTML")
inner_soup = BeautifulSoup(inner_html, 'html.parser')

outer_html = driver.find_elements_by_css_selector('#a')[0].get_attribute("outerHTML")
outer_soup = BeautifulSoup(outer_html, 'html.parser')

From there, there are several ways to search for the Text content. You'll have to experiment to see what works best for your use case.

Here's a simple one-liner that may be sufficient:

inner_soup.find(text=True)

If that doesn't work, then you can loop through the element's child nodes with .contents() and check their object type.

BeautifulSoup has four types of elements, and the one that you'll be interested in is the NavigableString type, which is produced by Text nodes. By contrast, Elements will have a type of Tag.

contents = inner_soup.contents

for bs4_object in contents:

    if (type(bs4_object) == bs4.Tag):
        print("This object is an Element.")

    elif (type(bs4_object) == bs4.NavigableString):
        print("This object is a Text node.")

Note that BeautifulSoup doesn't support Xpath expressions. If you need those, then you can use some of the workarounds in this thread.


def get_true_text(tag):
    children = tag.find_elements_by_xpath('*')
    original_text = tag.text
    for child in children:
        original_text = original_text.replace(child.text, '', 1)
    return original_text

You don't have to do a replace, you can get the length of the children text and subtract that from the overall length, and slice into the original text. That should be substantially faster.


Examples related to python

programming a servo thru a barometer Is there a way to view two blocks of code from the same file simultaneously in Sublime Text? python variable NameError Why my regexp for hyphenated words doesn't work? Comparing a variable with a string python not working when redirecting from bash script is it possible to add colors to python output? Get Public URL for File - Google Cloud Storage - App Engine (Python) Real time face detection OpenCV, Python xlrd.biffh.XLRDError: Excel xlsx file; not supported Could not load dynamic library 'cudart64_101.dll' on tensorflow CPU-only installation

Examples related to html

Embed ruby within URL : Middleman Blog Please help me convert this script to a simple image slider Generating a list of pages (not posts) without the index file Why there is this "clear" class before footer? Is it possible to change the content HTML5 alert messages? Getting all files in directory with ajax DevTools failed to load SourceMap: Could not load content for chrome-extension How to set width of mat-table column in angular? How to open a link in new tab using angular? ERROR Error: Uncaught (in promise), Cannot match any routes. URL Segment

Examples related to selenium

SessionNotCreatedException: Message: session not created: This version of ChromeDriver only supports Chrome version 81 session not created: This version of ChromeDriver only supports Chrome version 74 error with ChromeDriver Chrome using Selenium Selenium: WebDriverException:Chrome failed to start: crashed as google-chrome is no longer running so ChromeDriver is assuming that Chrome has crashed WebDriverException: unknown error: DevToolsActivePort file doesn't exist while trying to initiate Chrome Browser Class has been compiled by a more recent version of the Java Environment How to configure ChromeDriver to initiate Chrome browser in Headless mode through Selenium? How to make Firefox headless programmatically in Selenium with Python? element not interactable exception in selenium web automation Selenium Web Driver & Java. Element is not clickable at point (x, y). Other element would receive the click How do you fix the "element not interactable" exception?

Examples related to selenium-webdriver

SessionNotCreatedException: Message: session not created: This version of ChromeDriver only supports Chrome version 81 Selenium: WebDriverException:Chrome failed to start: crashed as google-chrome is no longer running so ChromeDriver is assuming that Chrome has crashed WebDriverException: unknown error: DevToolsActivePort file doesn't exist while trying to initiate Chrome Browser How to configure ChromeDriver to initiate Chrome browser in Headless mode through Selenium? How to make Firefox headless programmatically in Selenium with Python? Selenium Web Driver & Java. Element is not clickable at point (x, y). Other element would receive the click How do you fix the "element not interactable" exception? Scrolling to element using webdriver? Only local connections are allowed Chrome and Selenium webdriver Check if element is clickable in Selenium Java