XPath - Difference between node() and text()

Question

I'm having trouble understanding the difference between text() and node(). From what I understand, text() would be whatever is in between the tags, e.g. <item>apple</item>, which is "apple" in this case. node() would be whatever that node actually is, which would be item.

But then I've been assigned some work where it asks me to "Select the text of all items under produce", and a separate question asks "Select all the manager nodes in all departments".

How is the output supposed to look for text() as opposed to node()?

Snippet of XML:

    <produce>
      <item>apple</item>
      <item>banana</item>
      <item>pepper</item>
    </produce>

    <department>
      <phone>123-456-7891</phone>
      <manager>John</manager>
    </department>

Of course, there are more departments and more managers, but this was just a snippet of code.

Any help would be much appreciated.

User · Answer

Select the text of all items under produce:

    //produce/item/text()

Select all the manager nodes in all departments:

    //department/manager
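As a rough check of the expressions //produce/item/text() and //department/manager, here is a minimal Python sketch using the standard library's xml.etree.ElementTree. It makes two assumptions: the question's fragment is wrapped in a hypothetical <store> root so that it parses as one document, and, because ElementTree implements only a subset of XPath with no text() node test, the text is read from each element's .text attribute instead.

```python
import xml.etree.ElementTree as ET

# Hypothetical wrapper element <store>; the question only shows a fragment.
XML = """<store>
  <produce>
    <item>apple</item>
    <item>banana</item>
    <item>pepper</item>
  </produce>
  <department>
    <phone>123-456-7891</phone>
    <manager>John</manager>
  </department>
</store>"""

root = ET.fromstring(XML)

# Rough equivalent of //produce/item/text(): ElementTree has no text() test,
# so select the <item> elements and read their .text attribute.
item_texts = [item.text for item in root.findall(".//produce/item")]

# Rough equivalent of //department/manager: the result is element nodes.
managers = root.findall(".//department/manager")

print(item_texts)                 # ['apple', 'banana', 'pepper']
print([m.tag for m in managers])  # ['manager']
```

The first result is a list of strings (the text), while the second is a list of element objects (the nodes), which is the distinction the assignment is asking about.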

User · Answer

text() and node() are node tests, in XPath terminology.

Node tests operate on a set (on an axis, to be exact) of nodes and return the ones that are of a certain type. When no axis is mentioned, the child axis is assumed by default.

There are all kinds of node tests:

- node() matches any node (the least specific node test of them all)
- text() matches text nodes only
- comment() matches comment nodes
- * matches any element node
- foo matches any element node named "foo"
- processing-instruction() matches PI nodes (they look like <?name value?>)

Side note: The * also matches attribute nodes, but only along the attribute axis. @* is a shorthand for attribute::*. Attributes are not part of the child axis, that's why a normal * does not select them.

This XML document:

    <produce>
        <item>apple</item>
        <item>banana</item>
        <item>pepper</item>
    </produce>

represents the following DOM (simplified):

    root node
       element node (name="produce")
          text node (value="\n    ")
          element node (name="item")
             text node (value="apple")
          text node (value="\n    ")
          element node (name="item")
             text node (value="banana")
          text node (value="\n    ")
          element node (name="item")
             text node (value="pepper")
          text node (value="\n")

So with XPath:

- / selects the root node
- /produce selects a child element of the root node if it has the name "produce" (This is called the document element; it represents the document itself. Document element and root node are often confused, but they are not the same thing.)
- /produce/node() selects any type of child node beneath /produce/ (i.e. all 7 children)
- /produce/text() selects the 4 (!) whitespace-only text nodes
- /produce/item[1] selects the first child element named "item"
- /produce/item[1]/text() selects all child text nodes (there's only one, "apple", in this case)

And so on.

So, your questions:

- Select the text of all items under produce: //produce/item/text() (3 nodes selected)
- Select all the manager nodes in all departments: //department/manager (1 node selected)

Notes:

- The default axis in XPath is the child axis. You can change the axis by prefixing a different axis name. For example: //item/ancestor::produce
- Element nodes have text values. When you evaluate an element node, its textual contents will be returned. In the case of this example, /produce/item[1]/text() and string(/produce/item[1]) will be the same.
- Also see this answer, where I outline the individual parts of an XPath expression graphically.
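The node counts quoted for /produce/node() and /produce/text() can be reproduced with a small Python sketch. It is only an illustration: it uses the standard library's xml.dom.minidom and filters child nodes by type by hand, which imitates the node tests rather than running a real XPath engine.

```python
from xml.dom import minidom

XML = """<produce>
    <item>apple</item>
    <item>banana</item>
    <item>pepper</item>
</produce>"""

doc = minidom.parseString(XML)   # the Document object plays the role of the root node
produce = doc.documentElement    # the document element, a child of the root node

# Like /produce/node(): every child node, whatever its type.
all_children = list(produce.childNodes)

# Like /produce/text(): only the text-node children (all whitespace here).
text_children = [n for n in all_children if n.nodeType == n.TEXT_NODE]

# Like /produce/item/text(): the text-node children of each <item> child.
item_texts = [
    t.data
    for item in all_children
    if item.nodeType == item.ELEMENT_NODE and item.tagName == "item"
    for t in item.childNodes
    if t.nodeType == t.TEXT_NODE
]

print(len(all_children))   # 7
print(len(text_children))  # 4
print(item_texts)          # ['apple', 'banana', 'pepper']
```

Note that doc and doc.documentElement are different objects, which mirrors the distinction between the root node and the document element.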

User · Answer

For me it made a big difference when I faced this scenario. Here is my story:

    <?xml version="1.0" encoding="UTF-8"?>
    <sentence id="S1.6">When U937 cells were infected with HIV-1,
        <xcope id="X1.6.3">
            <cue ref="X1.6.3" type="negation">no</cue>
            induction of NF-KB factor was detected
        </xcope>
        , whereas high level of progeny virions was produced,
        <xcope id="X1.6.2">
            <cue ref="X1.6.2" type="speculation">suggesting</cue> that this factor was
            <xcope id="X1.6.1">
                <cue ref="X1.6.1" type="negation">not</cue> required for viral replication
            </xcope>
        </xcope>.
    </sentence>

I needed to extract the text between the tags and aggregate it (by concat), including the text inside inner tags.

/node() did the job, while /text() did half the job.

/text() only returned the text not included in inner tags, because inner tags are not "text nodes". You may think, "just extract the text included in the inner tags with an additional XPath"; however, it becomes challenging to put the text back in its original order, because you don't know where to place the aggregated text from the inner nodes.

    When U937 cells were infected with HIV-1, no induction of NF-KB factor was detected, whereas high level of progeny virions was produced, suggesting that this factor was not required for viral replication.

Finally, /node() did exactly what I wanted: it returns the inner element nodes along with the text nodes, in document order, so taking the string value of each node picks up the text from the inner tags too.
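A minimal sketch of that difference, again imitating the node tests with Python's xml.dom.minidom rather than a real XPath engine. The sentence is written on one logical line with normalized whitespace (an assumption made for readability), and string_value is a hypothetical helper that mimics the XPath string value of a node.

```python
from xml.dom import minidom

# The sentence from the answer, with whitespace normalized for readability.
XML = (
    '<sentence id="S1.6">When U937 cells were infected with HIV-1, '
    '<xcope id="X1.6.3"><cue ref="X1.6.3" type="negation">no</cue>'
    ' induction of NF-KB factor was detected</xcope>, '
    'whereas high level of progeny virions was produced, '
    '<xcope id="X1.6.2"><cue ref="X1.6.2" type="speculation">suggesting</cue>'
    ' that this factor was '
    '<xcope id="X1.6.1"><cue ref="X1.6.1" type="negation">not</cue>'
    ' required for viral replication</xcope></xcope>.</sentence>'
)


def string_value(node):
    """Concatenate all descendant text in document order (hypothetical helper)."""
    if node.nodeType == node.TEXT_NODE:
        return node.data
    return "".join(string_value(child) for child in node.childNodes)


sentence = minidom.parseString(XML).documentElement

# Like sentence/text(): only the text nodes that are direct children.
direct_text = "".join(
    n.data for n in sentence.childNodes if n.nodeType == n.TEXT_NODE
)

# Like sentence/node(): every child in document order, each reduced to its
# string value, so text inside <xcope> and <cue> lands in the right place.
full_text = "".join(string_value(n) for n in sentence.childNodes)

print(direct_text)
print(full_text)
```

The first print shows the sentence with the tagged spans missing; the second shows the whole sentence in its original order.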
