How to get text of an element in Selenium WebDriver without including child element text

Question

    <div id="a">This is some
       <div id="b">text</div>
    </div>

Getting "This is some" is non-trivial. For instance, this returns "This is some text":

    driver.find_element_by_id('a').text

How does one, in a general way, get the text of a specific element without including the text of its children?

(I'm providing an answer below but will leave the question open in case someone can come up with a less hideous solution.)

User · Answer

In the HTML which you have shared:

    <div id="a">This is some
       <div id="b">text</div>
    </div>

The text "This is some" is within a text node. To depict the text node in a structured way:

    <div id="a">
        This is some
       <div id="b">text</div>
    </div>

This use case: to extract and print the text "This is some" from the text node using Selenium's Python client, you have two ways, as follows.

Using splitlines(): you can identify the parent element, i.e. <div id="a">, extract the innerHTML and then use splitlines() as follows.

Using xpath:

    print(driver.find_element_by_xpath("//div[@id='a']").get_attribute("innerHTML").splitlines()[0])

Using css_selector:

    print(driver.find_element_by_css_selector("div#a").get_attribute("innerHTML").splitlines()[0])

Using execute_script(): you can also use the execute_script() method, which synchronously executes JavaScript in the current window/frame, as follows.

Using xpath and firstChild:

    parent_element = driver.find_element_by_xpath("//div[@id='a']")
    print(driver.execute_script('return arguments[0].firstChild.textContent;', parent_element).strip())

Using xpath and childNodes[n], where n is the index of the text node among the parent's child nodes (0 for the HTML above):

    parent_element = driver.find_element_by_xpath("//div[@id='a']")
    print(driver.execute_script('return arguments[0].childNodes[0].textContent;', parent_element).strip())
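The splitlines() approach relies on a line break sitting between the parent's text and its first child tag. The sketch below illustrates that on plain strings, with no browser involved, and shows one possible alternative that cuts at the first "<" instead. The helper name `leading_text` and the two sample strings are made up for illustration, and the helper assumes the wanted text comes before the first child element.

```python
import html


def leading_text(inner_html):
    """Return the text that precedes the first tag in an innerHTML string."""
    # partition() splits at the first "<". Everything before it is the
    # leading text node, because innerHTML serializes a literal "<" inside
    # text as "&lt;".
    before_first_tag, _, _ = inner_html.partition("<")
    # Turn entities such as "&amp;" back into plain characters.
    return html.unescape(before_first_tag).strip()


# Hypothetical innerHTML values for <div id="a">, with and without a line break.
multi_line = 'This is some\n   <div id="b">text</div>\n'
single_line = 'This is some <div id="b">text</div>'

print(multi_line.splitlines()[0])   # only the parent's own text
print(single_line.splitlines()[0])  # the child markup is still included
print(leading_text(multi_line))
print(leading_text(single_line))
```

With Selenium, the string passed in would be the value returned by get_attribute("innerHTML") on the parent element.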

User · Answer

Unfortunately, Selenium was only built to work with Elements, not Text nodes.

If you try to use a function like find_element_by_xpath to target the text nodes, Selenium will throw an InvalidSelectorException.

One workaround is to grab the relevant HTML with Selenium and then use an HTML parsing library like BeautifulSoup that can handle text nodes more elegantly.

    import bs4
    from bs4 import BeautifulSoup

    inner_html = driver.find_elements_by_css_selector('#a')[0].get_attribute("innerHTML")
    inner_soup = BeautifulSoup(inner_html, 'html.parser')

    outer_html = driver.find_elements_by_css_selector('#a')[0].get_attribute("outerHTML")
    outer_soup = BeautifulSoup(outer_html, 'html.parser')

From there, there are several ways to search for the Text content. You'll have to experiment to see what works best for your use case.

Here's a simple one-liner that may be sufficient:

    inner_soup.find(text=True)

If that doesn't work, then you can loop through the element's child nodes with .contents and check their object type.

BeautifulSoup has four kinds of objects, and the one that you'll be interested in is the NavigableString type, which is produced by Text nodes. By contrast, Elements will have a type of Tag.

    contents = inner_soup.contents

    for bs4_object in contents:
        if type(bs4_object) == bs4.Tag:
            print("This object is an Element.")
        elif type(bs4_object) == bs4.NavigableString:
            print("This object is a Text node.")

Note that BeautifulSoup doesn't support XPath expressions. If you need those, then you can use some of the workarounds in this thread.
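If adding BeautifulSoup is not an option, the same idea (parse the innerHTML and keep only the top-level text nodes) can be sketched with the standard library's html.parser. This swaps in a different parser from the one used in this answer. The names `DirectTextCollector`, `direct_text` and `VOID_TAGS` are made up for illustration, and the sketch assumes reasonably well-formed markup.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so they must not change the depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class DirectTextCollector(HTMLParser):
    """Collect the text that sits at the top level of an HTML fragment."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.depth = 0    # number of child elements we are currently inside
        self.chunks = []  # text seen while depth == 0

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_TAGS:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag not in VOID_TAGS and self.depth > 0:
            self.depth -= 1

    def handle_data(self, data):
        # Text at depth 0 belongs to the parent itself, not to a child.
        if self.depth == 0:
            self.chunks.append(data)


def direct_text(inner_html):
    """Return the concatenated top-level text of an innerHTML string."""
    parser = DirectTextCollector()
    parser.feed(inner_html)
    parser.close()  # flush any text the parser is still buffering
    return "".join(parser.chunks).strip()


print(direct_text('This is some\n   <div id="b">text</div>\n'))
```

In a Selenium session the argument would be the inner_html string obtained above. Unlike find(text=True), this collects every top-level text node, not only the first one.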

User · Answer

You don't have to do a replace: you can get the length of the children's text, subtract that from the overall length, and slice into the original text. That should be substantially faster.
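A minimal sketch of that slicing idea, written against plain strings so it can be tried without a browser. The function name `own_text_by_slicing` is made up for illustration. It assumes the children's text forms the tail of the parent's text with nothing extra (such as whitespace between several children) mixed in; where that does not hold, the slice will be off.

```python
def own_text_by_slicing(parent_text, child_texts):
    """Drop trailing child text from parent_text by length, not by search."""
    # Total number of characters contributed by the children.
    children_length = sum(len(text) for text in child_texts)
    # Keep only the leading part of the parent's text, then trim the
    # whitespace that separated it from the first child.
    return parent_text[:len(parent_text) - children_length].strip()


# "This is some text" is what .text returns for <div id="a"> in the question,
# and "text" is what it returns for the child <div id="b">.
print(own_text_by_slicing("This is some text", ["text"]))
```

With Selenium, `parent_text` would be the parent's .text and `child_texts` the .text of each child element.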

User · Answer

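    def get_true_text(tag):
        # Direct child elements of the given WebElement.
        children = tag.find_elements_by_xpath('*')
        original_text = tag.text
        for child in children:
            # Remove the first occurrence of each child's text.
            original_text = original_text.replace(child.text, '', 1)
        return original_text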

User · Answer

Here's a general solution:

    def get_text_excluding_children(driver, element):
        return driver.execute_script("""
        return jQuery(arguments[0]).contents().filter(function() {
            return this.nodeType == Node.TEXT_NODE;
        }).text();
        """, element)

The element passed to the function can be something obtained from the find_element...() methods (i.e. it can be a WebElement object).

Or, if you don't have jQuery or don't want to use it, you can replace the body of the function above with this:

        return driver.execute_script("""
        var parent = arguments[0];
        var child = parent.firstChild;
        var ret = "";
        while (child) {
            if (child.nodeType === Node.TEXT_NODE)
                ret += child.textContent;
            child = child.nextSibling;
        }
        return ret;
        """, element)

I'm actually using this code in a test suite.
