[xml] Using XPATH to search text containing &nbsp;

I use XPather Browser to check my XPATH expressions on an HTML page.

My end goal is to use these expressions in Selenium for the testing of my user interfaces.

I have an HTML file with content similar to this:

<tr>
  <td>abc</td>
  <td>&nbsp;</td>
</tr>

I want to select a node with a text containing the string "&nbsp;".

With a normal string like "abc" there is no problem. I use an XPATH similar to //td[text()="abc"].

When I try an XPATH like //td[text()="&nbsp;"], it returns nothing. Is there a special rule concerning texts with "&"?

This question is related to xml search xpath selenium

The answer is


I found I can make the match when I input a hard-coded non-breaking space (U+00A0) by typing Alt+0160 on Windows between the two quotes...

//table[@id='TableID']//td[text()=' ']

worked for me with the special char.

From what I understand, the XPath 1.0 standard has no syntax for escaping Unicode characters. There seem to be functions for that in XPath 2.0, but it looks like Firefox doesn't support it (or I misunderstood something). So you have to make do with the local codepage. Ugly, I know.

Actually, it looks like the standard relies on the programming language that hosts the XPath expression to provide the correct Unicode escape sequence... So, somehow, I did the right thing.
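
The point about the host language supplying the escape can be sketched in Python. This is only an illustration, assuming the standard-library xml.etree.ElementTree module as a stand-in for the browser's XPath engine. ElementTree supports a subset of XPath, so the predicate is written [.='...'] instead of [text()='...'], and the sample row is hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample row; &#160; is the numeric reference for U+00A0.
row = ET.fromstring("<tr><td>abc</td><td>&#160;</td></tr>")

# Python expands \u00a0 before the XPath engine ever sees the string,
# so the expression holds a real no-break space between the quotes.
xpath = ".//td[.='\u00a0']"

# Cells whose whole text content is exactly U+00A0.
matches = row.findall(xpath)
print(len(matches))
print(repr(matches[0].text))
```

With Selenium's Python binding, the same kind of string literal (for example "//td[text()='\u00a0']") could be passed as the XPath locator, since the escape is resolved by Python in the same way.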


As per the HTML you have provided:

<tr>
  <td>abc</td>
  <td>&nbsp;</td>
</tr>

To locate the node with the string &nbsp; you can use either of the following XPath-based solutions:

  • Using text():

    "//td[text()='\u00A0']"
    
  • Using contains():

    "//td[contains(., '\u00A0')]"
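
The difference between the two predicates shows up when a cell holds more than the no-break space. A rough Python sketch, assuming ElementTree (which has no contains() function, so a plain substring test stands in for it) and a hypothetical three-cell row:

```python
import xml.etree.ElementTree as ET

NBSP = "\u00a0"

# Hypothetical row: one cell is only a no-break space, one uses it as padding.
row = ET.fromstring("<tr><td>abc</td><td>&#160;</td><td>x&#160;y</td></tr>")


def cell_text(td):
    """Full text content of a cell, like the XPath string value of '.'."""
    return "".join(td.itertext())


# Rough counterpart of //td[text()='\u00A0']: text must equal U+00A0 exactly.
exact = [td for td in row.iter("td") if cell_text(td) == NBSP]

# Rough counterpart of //td[contains(., '\u00A0')]: U+00A0 anywhere in the text.
containing = [td for td in row.iter("td") if NBSP in cell_text(td)]

print(len(exact), len(containing))
```

The exact comparison picks up only the cell that is nothing but U+00A0, while the substring test also picks up the padded cell.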
    

However, ideally you may want to avoid the NO-BREAK SPACE character and use one of the following locator strategies:

  • Using the parent <tr> node and following-sibling:

    "//tr//following-sibling::td[2]"
    
  • Using last():

    "//tr//td[last()]"
    
  • Using the preceding <td> node and following:

    "//td[text()='abc']//following::td[1]"
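
Position-based lookups like these can be tried out without a browser. The sketch below assumes ElementTree and a hypothetical two-cell row. ElementTree has no following axis, so the "cell after abc" lookup is done in plain Python instead.

```python
import xml.etree.ElementTree as ET

# Hypothetical row matching the HTML in the question.
row = ET.fromstring("<tr><td>abc</td><td>&#160;</td></tr>")

# Last cell of the row, the counterpart of //tr//td[last()].
last_cell = row.find("td[last()]")

# Second cell by position (XPath positions start at 1).
second_cell = row.find("td[2]")

# Cell right after the one whose text is 'abc'.
cells = list(row)
after_abc = next(
    cells[i + 1] for i, td in enumerate(cells[:-1]) if td.text == "abc"
)

print(last_cell is second_cell, after_abc is second_cell)
```

All three lookups reach the same cell without ever naming the no-break space, which is the appeal of this approach.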
    

Reference

You can find a relevant detailed discussion in:


tl; dr

Unicode Character 'NO-BREAK SPACE' (U+00A0)


Try using the decimal entity &#160; instead of the named entity. If that doesn't work, you should be able to simply use the Unicode character for a non-breaking space instead of the &nbsp; entity.

(Note: I did not try this in XPather, but I did try it in Oxygen.)


Bear in mind that a standards-compliant XML processor will have replaced any entity references other than XML's five standard ones (&amp;, &gt;, &lt;, &apos;, &quot;) with the corresponding character in the target encoding by the time XPath expressions are evaluated. Given that behavior, PhiLho's and jsulak's suggestions are the way to go if you want to work with XML tools. When you enter &#160; in the XPath expression, it should be converted to the corresponding byte sequence before the XPath expression is applied.
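
The replacement described above is easy to observe. A small sketch, assuming Python's standard-library ElementTree parser as the XML processor:

```python
import xml.etree.ElementTree as ET

# The numeric reference is expanded during parsing: the tree holds U+00A0,
# not the six characters "&#160;".
cell = ET.fromstring("<td>&#160;</td>")
print(repr(cell.text), len(cell.text))

# &nbsp; is not one of XML's five predefined entities, so a plain XML
# parser rejects it unless a DTD declares it.
try:
    ET.fromstring("<td>&nbsp;</td>")
    nbsp_accepted = True
except ET.ParseError:
    nbsp_accepted = False
print(nbsp_accepted)
```

Because the tree only ever contains the expanded character, an XPath predicate that compares against the literal text "&nbsp;" has nothing to match.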


Search for &nbsp; or only nbsp - did you try this?


I cannot get a match using XPather, but the following worked for me with plain XML and XSL files in Microsoft's XML Notepad:

<xsl:value-of select="count(//td[text()='&nbsp;'])" />

The value returned is 1, which is the correct value in my test case.

However, I did have to declare nbsp as an entity within my XML and XSL using the following:

<!DOCTYPE xsl:stylesheet [ <!ENTITY nbsp "&#160;"> ]>

I'm not sure if that helps you, but I was able to actually find nbsp using an XPath expression.

Edit: My code sample actually contains the characters '&nbsp;' but the JavaScript syntax highlighting converts it to the space character. Don't be misled!
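
The effect of declaring the entity can also be reproduced outside XML Notepad. A minimal sketch, assuming Python's ElementTree and a hypothetical document that carries the same ENTITY declaration in its internal DTD subset:

```python
import xml.etree.ElementTree as ET

# Internal DTD subset declaring nbsp, mirroring the declaration quoted above.
doc = """<!DOCTYPE tr [ <!ENTITY nbsp "&#160;"> ]>
<tr><td>abc</td><td>&nbsp;</td></tr>"""

# With the declaration in place the parser accepts &nbsp; and expands it.
row = ET.fromstring(doc)

# ElementTree's XPath subset uses [.='...'] where full XPath has text()='...'.
matches = row.findall(".//td[.='\u00a0']")
print(len(matches))
```

Once declared, &nbsp; is expanded to U+00A0 just like &#160;, so the same predicate finds the cell.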

