XPath contains(text(),'some string') doesn't work when used with node with more than one Text subnode

Question

I have a small problem with XPath contains() in dom4j.

Let's say my XML is:

    <Home>
        <Addr>
            <Street>ABC</Street>
            <Number>5</Number>
            <Comment>BLAH BLAH BLAH <br/><br/>ABC</Comment>
        </Addr>
    </Home>

Let's say I want to find all the nodes that have ABC in the text, given the root Element.

So the XPath that I would need to write would be:

    //*[contains(text(),'ABC')]

However, this is not what dom4j returns. Is this a dom4j problem, or a problem with my understanding of how XPath works? That query returns only the Street element and not the Comment element.

The DOM makes the Comment element a composite element with four child nodes, two of them text:

    [Text = 'XYZ'][BR][BR][Text = 'ABC']

I would assume that the query should still return the element, since it should find the element and run contains() on it, but it doesn't.

The following query returns the element, but it returns far more than just the element: it returns the parent elements as well, which is undesirable for this problem.

    //*[contains(text(),'ABC')]

Does anyone know the XPath query that would return just the elements <Street/> and <Comment/>?

User · Answer

    //*[text()='ABC']

returns

    <street>ABC</street>
    <comment>BLAH BLAH BLAH <br><br>ABC</comment>
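A minimal sketch of why the equality test matches both elements: in XPath 1.0, comparing a node-set with a string is true if any node in the set has a matching string-value, so the second text node of Comment is enough. The sketch uses the JDK's javax.xml.xpath API rather than dom4j, on the assumption that both follow XPath 1.0 here; the class and method names are hypothetical, and the XML is the question's document with indentation removed.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical demo class; uses the JDK XPath 1.0 engine, not dom4j.
class TextEqualsDemo {
    // The question's document, written without indentation.
    static final String XML =
        "<Home><Addr><Street>ABC</Street><Number>5</Number>"
        + "<Comment>BLAH BLAH BLAH <br/><br/>ABC</Comment></Addr></Home>";

    // Returns the names of the elements selected by expr, in document order.
    static List<String> elementNames(String expr) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList nodes = (NodeList) xpath.evaluate(
                expr, new InputSource(new StringReader(XML)), XPathConstants.NODESET);
            List<String> names = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                names.add(nodes.item(i).getNodeName());
            }
            return names;
        } catch (XPathExpressionException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Any child text node equal to 'ABC' is enough: [Street, Comment]
        System.out.println(elementNames("//*[text()='ABC']"));
        // No text node is exactly 'BLAH', so nothing matches: []
        System.out.println(elementNames("//*[text()='BLAH']"));
    }
}
```

Note that this is an exact match on a whole text node, so it only helps when 'ABC' is the complete content of one of the text children.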

User · Answer

The <Comment> tag contains two text nodes and two <br> nodes as children.

Your XPath expression was

    //*[contains(text(),'ABC')]

To break this down:

1. * is a selector that matches any element (i.e. tag); it returns a node-set.
2. The [] are a conditional that operates on each individual node in that node-set. It matches if any of the individual nodes it operates on match the conditions inside the brackets.
3. text() is a selector that matches all of the text nodes that are children of the context node; it returns a node-set.
4. contains is a function that operates on a string. If it is passed a node-set, the node-set is converted into a string by returning the string-value of the node in the node-set that is first in document order. Hence, it can match only the first text node in your <Comment> element, namely BLAH BLAH BLAH. Since that doesn't match, you don't get a <Comment> in your results.

You need to change this to

    //*[text()[contains(.,'ABC')]]

1. * is a selector that matches any element (i.e. tag); it returns a node-set.
2. The outer [] are a conditional that operates on each individual node in that node-set; here it operates on each element in the document.
3. text() is a selector that matches all of the text nodes that are children of the context node; it returns a node-set.
4. The inner [] are a conditional that operates on each node in that node-set, here each individual text node. Each individual text node is the starting point for any path in the brackets, and can also be referred to explicitly as . within the brackets. It matches if any of the individual nodes it operates on match the conditions inside the brackets.
5. contains is a function that operates on a string. Here it is passed an individual text node (.). Since it is passed the second text node in the <Comment> tag individually, it will see the 'ABC' string and be able to match it.
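The conversion step described above can be made visible with a small sketch: it lists the text children of Comment individually and then asks XPath for the string form of the same node-set. This uses the JDK's javax.xml.xpath API rather than dom4j (an assumption that both behave the same on this point); the class and method names are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical demo class; uses the JDK XPath 1.0 engine, not dom4j.
class CommentTextNodes {
    // The question's document, written without indentation.
    static final String XML =
        "<Home><Addr><Street>ABC</Street><Number>5</Number>"
        + "<Comment>BLAH BLAH BLAH <br/><br/>ABC</Comment></Addr></Home>";

    // Returns the value of every text node selected by expr, in document order.
    static List<String> textValues(String expr) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList nodes = (NodeList) xpath.evaluate(
                expr, new InputSource(new StringReader(XML)), XPathConstants.NODESET);
            List<String> values = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                values.add(nodes.item(i).getNodeValue());
            }
            return values;
        } catch (XPathExpressionException e) {
            throw new IllegalStateException(e);
        }
    }

    // Evaluates expr and returns the result converted to a string by XPath.
    static String asString(String expr) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            return xpath.evaluate(expr, new InputSource(new StringReader(XML)));
        } catch (XPathExpressionException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Two separate text children: "BLAH BLAH BLAH " and "ABC"
        System.out.println(textValues("//Comment/text()"));
        // Converting the node-set to a string keeps only the first: "BLAH BLAH BLAH "
        System.out.println("[" + asString("string(//Comment/text())") + "]");
    }
}
```

The second line of output is the string that contains() actually sees in the original query, which is why 'ABC' is never found in Comment.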

User · Answer

The accepted answer will return all the parent nodes too. To get only the actual nodes with ABC, even if the string is after <br>:

    //*[text()[contains(.,'ABC')]]/text()[contains(.,'ABC')]
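A short sketch of what this expression selects: because the path ends in text(), the result is a set of Text nodes rather than elements. The code uses the JDK's javax.xml.xpath API instead of dom4j (an assumption that the two agree here), and the class and method names are hypothetical. Each hit is printed as parent:value to show where the text node lives.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical demo class; uses the JDK XPath 1.0 engine, not dom4j.
class MatchingTextNodes {
    // The question's document, written without indentation.
    static final String XML =
        "<Home><Addr><Street>ABC</Street><Number>5</Number>"
        + "<Comment>BLAH BLAH BLAH <br/><br/>ABC</Comment></Addr></Home>";

    // Returns "parentName:textValue" for every text node selected by expr.
    static List<String> describeTextNodes(String expr) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList nodes = (NodeList) xpath.evaluate(
                expr, new InputSource(new StringReader(XML)), XPathConstants.NODESET);
            List<String> hits = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                Node text = nodes.item(i);
                hits.add(text.getParentNode().getNodeName() + ":" + text.getNodeValue());
            }
            return hits;
        } catch (XPathExpressionException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // The expression from the answer: [Street:ABC, Comment:ABC]
        System.out.println(describeTextNodes(
            "//*[text()[contains(.,'ABC')]]/text()[contains(.,'ABC')]"));
        // A shorter path that selects the same text nodes in this document.
        System.out.println(describeTextNodes("//text()[contains(.,'ABC')]"));
    }
}
```

If the elements themselves are needed, each Text node's parent can be taken in code, as the sketch does with getParentNode().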

User · Answer

The XML document:

    <Home>
        <Addr>
            <Street>ABC</Street>
            <Number>5</Number>
            <Comment>BLAH BLAH BLAH <br/><br/>ABC</Comment>
        </Addr>
    </Home>

The XPath expression:

    //*[contains(text(), 'ABC')]

//* matches any descendant element of the root node. That is, any element but the root node.

[...] is a predicate; it filters the node-set. It returns nodes for which ... is true:

    "A predicate filters a node-set [...] to produce a new node-set. For each node in the node-set to be filtered, the PredicateExpr is evaluated [...]; if PredicateExpr evaluates to true for that node, the node is included in the new node-set; otherwise, it is not included."

contains('haystack', 'needle') returns true if haystack contains needle:

    "Function: boolean contains(string, string)
    The contains function returns true if the first argument string contains the second argument string, and otherwise returns false."

But contains() takes a string as its first parameter, and it is passed nodes. To deal with that, every node or node-set passed as the first parameter is converted to a string by the string() function:

    "An argument is converted to type string as if by calling the string function."

The string() function returns the string-value of the first node:

    "A node-set is converted to a string by returning the string-value of the node in the node-set that is first in document order. If the node-set is empty, an empty string is returned."

String-value of an element node:

    "The string-value of an element node is the concatenation of the string-values of all text node descendants of the element node in document order."

String-value of a text node:

    "The string-value of a text node is the character data."

So, basically, the string-value is all the text that is contained in a node (the concatenation of all descendant text nodes).

text() is a node test that matches any text node:

    "The node test text() is true for any text node. For example, child::text() will select the text node children of the context node."

Having said that, //*[contains(text(), 'ABC')] matches any element (but the root node) whose first text node contains ABC, since text() returns a node-set that contains all child text nodes of the context node (relative to which an expression is evaluated), but contains() takes only the first one. So for the document above the path matches the Street element.

The following expression, //*[text()[contains(., 'ABC')]], matches any element (but the root node) that has at least one child text node that contains ABC. The . represents the context node. In this case, it is a child text node of any element but the root node. So for the document above the path matches the Street and the Comment elements.

Now then, //*[contains(., 'ABC')] matches any element (but the root node) that contains ABC (in the concatenation of the descendant text nodes). For the document above it matches the Home, the Addr, the Street, and the Comment elements. As such, //*[contains(., 'BLAH ABC')] matches the Home, the Addr, and the Comment elements.
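The four expressions discussed above can be checked side by side with a small sketch. It runs each one against the same document and prints the names of the matched elements. It relies on the JDK's javax.xml.xpath API, not dom4j, on the assumption that both implement these XPath 1.0 rules identically; the class and method names are hypothetical, and the XML is written without indentation.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical demo class; uses the JDK XPath 1.0 engine, not dom4j.
class ContainsVariants {
    // The answer's document, written without indentation.
    static final String XML =
        "<Home><Addr><Street>ABC</Street><Number>5</Number>"
        + "<Comment>BLAH BLAH BLAH <br/><br/>ABC</Comment></Addr></Home>";

    // Returns the names of the elements selected by expr, in document order.
    static List<String> matches(String expr) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList nodes = (NodeList) xpath.evaluate(
                expr, new InputSource(new StringReader(XML)), XPathConstants.NODESET);
            List<String> names = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                names.add(nodes.item(i).getNodeName());
            }
            return names;
        } catch (XPathExpressionException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // First child text node only: [Street]
        System.out.println(matches("//*[contains(text(), 'ABC')]"));
        // Any child text node: [Street, Comment]
        System.out.println(matches("//*[text()[contains(., 'ABC')]]"));
        // Whole string-value, so ancestors match too: [Home, Addr, Street, Comment]
        System.out.println(matches("//*[contains(., 'ABC')]"));
        // Spans the two text nodes of Comment: [Home, Addr, Comment]
        System.out.println(matches("//*[contains(., 'BLAH ABC')]"));
    }
}
```

The last case is the one to keep in mind when choosing between the forms: only the string-value variant can match text that is split across several text nodes, at the cost of also matching every ancestor.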

User · Answer

It took me a little while, but I finally figured it out. The custom XPath below, which matches on some contained text, worked perfectly for me:

    //a[contains(text(),'JB-')]

User · Answer

contains(text(),'') only returns true or false. It won't return any element results.
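A small sketch of the distinction this answer points at: evaluated on its own, contains() yields a boolean, while the same call placed inside a predicate is what filters elements. The example uses the JDK's javax.xml.xpath API rather than dom4j, and the class and method names are hypothetical.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical demo class; uses the JDK XPath 1.0 engine, not dom4j.
class BooleanVersusNodeSet {
    // The question's document, written without indentation.
    static final String XML =
        "<Home><Addr><Street>ABC</Street><Number>5</Number>"
        + "<Comment>BLAH BLAH BLAH <br/><br/>ABC</Comment></Addr></Home>";

    // Evaluates expr and asks for an XPath boolean result.
    static boolean asBoolean(String expr) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            return (Boolean) xpath.evaluate(
                expr, new InputSource(new StringReader(XML)), XPathConstants.BOOLEAN);
        } catch (XPathExpressionException e) {
            throw new IllegalStateException(e);
        }
    }

    // Evaluates expr as a node-set and returns how many nodes it selected.
    static int nodeCount(String expr) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList nodes = (NodeList) xpath.evaluate(
                expr, new InputSource(new StringReader(XML)), XPathConstants.NODESET);
            return nodes.getLength();
        } catch (XPathExpressionException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // A bare contains() call is just a yes/no answer: true
        System.out.println(asBoolean("contains(//Street/text(), 'ABC')"));
        // Inside a predicate it selects elements instead: 2
        System.out.println(nodeCount("//*[text()[contains(., 'ABC')]]"));
    }
}
```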
