I use XPather Browser to check my XPath expressions on an HTML page.
My end goal is to use these expressions in Selenium to test my user interfaces.
I have an HTML file with content similar to this:
<tr> <td>abc</td> <td>&nbsp;</td> </tr>
I want to select a node whose text contains the string "&nbsp;".
With a normal string like "abc" there is no problem; I use an XPath similar to //td[text()="abc"].
When I try an XPath like //td[text()="&nbsp;"], it returns nothing. Is there a special rule for text containing "&"?
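To see why //td[text()="&nbsp;"] finds nothing, it helps to look at what a parser actually stores for that cell. The sketch below uses Python's standard html.parser module as a stand-in for the browser's HTML parser (an assumption: XPather and Selenium work on the browser's own DOM, not on this module), and CellTextCollector is a hypothetical helper. The entity is decoded into the single character U+00A0 before any XPath expression could run against it:

```python
from html.parser import HTMLParser


class CellTextCollector(HTMLParser):
    """Hypothetical helper: records the decoded text of every <td> element."""

    def __init__(self):
        # convert_charrefs=True decodes references such as &nbsp; in text data
        super().__init__(convert_charrefs=True)
        self.cells = []
        self._in_td = False

    def handle_starttag(self, tag, attrs):
        if tag == "td":
            self._in_td = True
            self.cells.append("")

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_td = False

    def handle_data(self, data):
        # Only keep text that appears inside a <td>
        if self._in_td:
            self.cells[-1] += data


collector = CellTextCollector()
collector.feed("<tr> <td>abc</td> <td>&nbsp;</td> </tr>")
collector.close()

print(collector.cells[1] == "&nbsp;")  # False: the entity text is gone
print(collector.cells[1] == "\u00a0")  # True: a single NO-BREAK SPACE
```

So an XPath comparison has to target the U+00A0 character itself, not the five characters of the entity name.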
I found I can make the match when I input a hard-coded non-breaking space (U+00A0) by typing Alt+0160 on Windows between the two quotes:
//table[@id='TableID']//td[text()=' ']
This worked for me with the special character (it looks like an ordinary space above, but it is U+00A0).
From what I understood, XPath 1.0 provides no way to escape Unicode characters. XPath 2.0 has functions for that, such as codepoints-to-string(), but it looks like Firefox doesn't support XPath 2.0 (or I misunderstood something). So you have to make do with the local code page. Ugly, I know.
Actually, it looks like the standard relies on the programming language hosting XPath to provide the correct Unicode escape sequence... So, somehow, I did the right thing.
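A minimal sketch of the point about the host language, assuming Python and its standard xml.etree.ElementTree module. ElementTree implements only a limited XPath subset with no text() function, so its [.='...'] predicate stands in for text()='...' here; the one-row table is a hypothetical stand-in for the page in the question:

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for the page; &#160; is how plain XML spells U+00A0.
doc = ET.fromstring("<table><tr><td>abc</td><td>&#160;</td></tr></table>")

# The Python escape \u00a0 is resolved by Python itself, so the path
# string already holds the real NO-BREAK SPACE character.
nbsp = "\u00a0"
path = ".//td[.='" + nbsp + "']"  # ElementTree's stand-in for text()='...'

matches = doc.findall(path)
print(len(matches))                      # 1
print(len(doc.findall(".//td[.=' ']")))  # 0: an ordinary space does not match
```

The XPath engine never sees an escape sequence; it only receives the character that the host language produced.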
As per the HTML you have provided:
<tr>
<td>abc</td>
<td>&nbsp;</td>
</tr>
To locate the node with the string &nbsp; you can use either of the following XPath-based solutions:
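Using text():
"//td[text()='\u00A0']"
Using contains():
"//td[contains(., '\u00A0')]"
Note that \u00A0 is not an XPath escape: it is a string-literal escape in the host language (e.g. Java or Python), which turns it into the NO-BREAK SPACE character before the expression reaches the XPath engine.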
However, ideally you may want to avoid the NO-BREAK SPACE character altogether and use one of the following locator strategies:
Using the parent <tr> node and following-sibling:
"//tr/td[1]/following-sibling::td[1]"
Using last():
"//tr//td[last()]"
Using the preceding <td> node and following-sibling:
"//td[text()='abc']/following-sibling::td[1]"
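The position-based strategies can be tried without a browser. The sketch below again assumes Python's xml.etree.ElementTree, whose XPath subset supports [2] and [last()] but has no following-sibling axis, so that strategy is emulated in plain Python through the parent row. It is only an illustration, not how Selenium evaluates the locators:

```python
import xml.etree.ElementTree as ET

table = ET.fromstring("<table><tr><td>abc</td><td>&#160;</td></tr></table>")

# Position-based locators never mention the NO-BREAK SPACE at all.
last_cell = table.find(".//tr/td[last()]")
second_cell = table.find(".//tr/td[2]")

# ElementTree has no following-sibling axis, so "the <td> right after the
# one whose text is 'abc'" is emulated through the parent row instead.
cells = list(table.find(".//tr"))
abc_index = next(i for i, td in enumerate(cells) if td.text == "abc")
after_abc = cells[abc_index + 1]

print(last_cell is second_cell is after_abc)  # True
print(repr(last_cell.text))                   # '\xa0'
```

All three approaches land on the same cell, and the cell's content only shows up in the final check, never in the locator.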
Try using the decimal entity &#160; instead of the named entity. If that doesn't work, you should be able to simply use the Unicode character for a non-breaking space instead of the &nbsp; entity.
(Note: I did not try this in XPather, but I did try it in Oxygen.)
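One way to see the difference between the two entities, assuming the XPath expression is stored in an XML document (as it is in an XSLT stylesheet) and using Python's xml.etree.ElementTree as the XML parser. The value-of element below is a hypothetical, namespace-free stand-in for xsl:value-of:

```python
import xml.etree.ElementTree as ET

# An XPath expression stored in an attribute, as in <xsl:value-of select="...">.
# The XML parser resolves &#160; before any XPath engine sees the string.
decimal = ET.fromstring("<value-of select=\"//td[text()='&#160;']\"/>")
print(decimal.get("select") == "//td[text()='\u00a0']")  # True

# &nbsp; is not predefined in XML, so without a DTD declaring it the
# document is rejected before any XPath expression is evaluated.
named_entity_error = None
try:
    ET.fromstring("<value-of select=\"//td[text()='&nbsp;']\"/>")
except ET.ParseError as err:
    named_entity_error = err
print(named_entity_error is not None)  # True
```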
Bear in mind that a standards-compliant XML processor will have replaced any entity references other than XML's five standard ones (&amp;, &gt;, &lt;, &apos;, &quot;) with the corresponding character in the target encoding by the time XPath expressions are evaluated. Given that behavior, PhiLho's and jsulak's suggestions are the way to go if you want to work with XML tools. When you enter &#160; in an XPath expression that lives inside an XML document (for example an XSLT select attribute), it should be converted to the corresponding character before the XPath expression is applied.
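A small Python check of this behavior, using xml.etree.ElementTree as the standards-compliant processor (an assumption; other XML tools should behave the same way for well-formed input). After parsing, the element text holds plain characters and no entity reference survives:

```python
import xml.etree.ElementTree as ET

# XML's five predefined entities followed by a numeric reference for U+00A0.
sample = "<t>&amp;&gt;&lt;&apos;&quot;&#160;</t>"
text = ET.fromstring(sample).text

# After parsing only characters remain; no entity reference survives.
print([hex(ord(ch)) for ch in text])
# ['0x26', '0x3e', '0x3c', '0x27', '0x22', '0xa0']
```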
Search for "&nbsp;" or only "nbsp". Did you try this?
I cannot get a match using XPather, but the following worked for me with plain XML and XSL files in Microsoft's XML Notepad:
<xsl:value-of select="count(//td[text()='&nbsp;'])" />
The value returned is 1, which is the correct value in my test case.
However, I did have to declare nbsp as an entity within my XML and XSL using the following:
<!DOCTYPE xsl:stylesheet [ <!ENTITY nbsp "&#160;"> ]>
I'm not sure if that helps you, but I was able to actually find nbsp using an XPath expression.
Edit: My code sample actually contains the characters '&nbsp;', but the JavaScript syntax highlighter converts them to a space character. Don't be misled!
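The same approach can be reproduced outside XML Notepad. This sketch assumes Python's xml.etree.ElementTree, which expands entities declared in an internal DTD subset; since ElementTree has no count() or text(), counting the results of a [.='...'] predicate stands in for count(//td[text()='&nbsp;']):

```python
import xml.etree.ElementTree as ET

# Internal DTD subset declaring nbsp, mirroring the DOCTYPE line above.
source = """<!DOCTYPE table [ <!ENTITY nbsp "&#160;"> ]>
<table>
  <tr><td>abc</td><td>&nbsp;</td></tr>
</table>"""

table = ET.fromstring(source)

# Stand-in for count(//td[text()='&nbsp;']): the declared entity has
# already been expanded to U+00A0 by the parser.
count = len(table.findall(".//td[.='\u00a0']"))
print(count)  # 1
```

Without the DOCTYPE line, the same source would be rejected as containing an undefined entity.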
Source: Stackoverflow.com