[xml] How to use XPath contains() here?

This is a new answer to an old question about a common misconception about contains() in XPath...

Summary: contains() means contains a substring, not contains a node.

Detailed Explanation

This XPath is often misinterpreted:

//ul[contains(li, 'Model')]

Wrong interpretation: Select those ul elements that contain an li element with Model in it.

This is wrong because

  1. contains(x,y) expects x to be a string, and
  2. the XPath rule for converting multiple elements to a string is this:

    A node-set is converted to a string by returning the string-value of the node in the node-set that is first in document order. If the node-set is empty, an empty string is returned.

Right interpretation: Select those ul elements whose first li child has a string-value that contains a Model substring.

Examples

XML

<r>
  <ul id="one">
    <li>Model A</li>
    <li>Foo</li>
  </ul>
  <ul id="two">
    <li>Foo</li>
    <li>Model A</li>
  </ul>
</r> 

XPaths

  • //ul[contains(li, 'Model')] selects the one ul element.

    Note: The two ul element is not selected because the string-value of the first li child of the two ul is Foo, which does not contain the Model substring.

  • //ul[li[contains(.,'Model')]] selects the one and two ul elements.

    Note: Both ul elements are selected because contains() is applied to each li individually. (Thus, the tricky multiple-element-to-string conversion rule is avoided.) Both ul elements do have an li child whose string value contains the Model substring -- position of the li element no longer matters.

See also