Recipe 1.1. Effectively Using Axes

Problem

You need to select nodes in an XML tree in ways that consider complex relationships within the hierarchical structure.

Solution

Each of the following solutions is organized around related sets of axes. For each group, a sample XML document is presented with the context node in bold. An explanation of the effect of evaluating the path is provided, along with an indication of the nodes that will be selected with respect to the highlighted context. In some cases, the solution will consider other nodes as the context to illustrate subtleties of the particular path expression.

Child and descendant axes

The child axis is the default axis in XPath. This means you do not need to write the child:: axis specifier, but you can if you are feeling pedantic. You can reach deeper into the XML tree using the descendant:: and descendant-or-self:: axes. The former excludes the context node and the latter includes it. In the examples that follow, the context node is the parent element of the sample document.

<Test id="descendants">
   <parent>
      <X id="1"/>
      <X id="2"/>
      <Y id="3">
         <X id="3-1"/>
         <Y id="3-2"/>
         <X id="3-3"/>
      </Y>
      <X id="4"/>
      <Y id="5"/>
      <Z id="6"/>
      <X id="7"/>
      <X id="8"/>
      <Y id="9"/>
   </parent>
</Test>

(: Select all child elements named X :)
X   (: same as child::X :)

Result: <X id="1"/> <X id="2"/> <X id="4"/> <X id="7"/><X id="8"/>

(:Select the first X child element:)

X[1]    

Result: <X id="1"/>

(:Select the last X child element:)

X[last( )]    

Result: <X id="8"/>


(:Select the first child element, provided it is an X. Otherwise empty:)

*[1][self::X]    

Result: <X id="1"/>


(:Select the last child, provided it is an X. Otherwise empty:)

*[last( )][self::X]    

Result: ( )

*[last( )][self::Y]    

Result: <Y id="9"/>

(: Select all descendants named X :)
descendant::X

Result: <X id="1"/> <X id="2"/> <X id="3-1"/> <X id="3-3"/> <X id="4"/> <X id="7"/> <X id="8"/>

(: Select the context node, if it is an X, and all descendants named X :)

descendant-or-self::X

Result: <X id="1"/> <X id="2"/> <X id="3-1"/> <X id="3-3"/> <X id="4"/> <X id="7"/> <X id="8"/>

(: Select the context node and all descendant elements :)

descendant-or-self::*

Result: <parent> <X id="1"/> <X id="2"/> <Y id="3"> <X id="3-1"/> <Y id="3-2"/> <X id="3-3"/> </Y>
<X id="4"/> <Y id="5"/> <Z id="6"/> <X id="7"/> <X id="8"/> <Y id="9"/> </parent>
<X id="1"/> <X id="2"/> <Y id="3"> <X id="3-1"/> <Y id="3-2"/> <X id="3-3"/> </Y>
<X id="3-1"/> <Y id="3-2"/> <X id="3-3"/> <X id="4"/> <Y id="5"/> <Z id="6"/>
<X id="7"/> <X id="8"/> <Y id="9"/>
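If you want to check these results without an XSLT processor at hand, the child and descendant selections can be approximated in Python. This is only a sketch: it assumes the standard library's xml.etree.ElementTree, which implements a small subset of XPath and has no axis syntax, so the axes are emulated with findall, iter, and list slicing. The names context, ids, and the result variables are mine, not part of the recipe.

```python
import xml.etree.ElementTree as ET

# Compact copy of the "descendants" sample; <parent> is the context node.
DOC = """
<Test id="descendants">
  <parent>
    <X id="1"/><X id="2"/>
    <Y id="3"><X id="3-1"/><Y id="3-2"/><X id="3-3"/></Y>
    <X id="4"/><Y id="5"/><Z id="6"/><X id="7"/><X id="8"/><Y id="9"/>
  </parent>
</Test>
"""

context = ET.fromstring(DOC).find("parent")


def ids(elements):
    """Return the id attributes of the given elements, in order."""
    return [e.get("id") for e in elements]


# child::X : a bare name in findall looks at direct children only.
child_x = ids(context.findall("X"))

# descendant::X : './/X' searches every level below the context.
descendant_x = ids(context.findall(".//X"))

# *[last()][self::X] : take the last child first, then test its name.
children = list(context)
last_if_x = ids([c for c in children[-1:] if c.tag == "X"])

# descendant-or-self::* : iter() yields the context itself, then every
# descendant element in document order.
desc_or_self = [e.tag for e in context.iter()]

print(child_x)       # ['1', '2', '4', '7', '8']
print(descendant_x)  # ['1', '2', '3-1', '3-3', '4', '7', '8']
print(last_if_x)     # []
```

The positional cases are written with slices rather than ElementTree's own [1] and [last()] predicates, because the library's handling of positions combined with * differs from XPath.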

Sibling axes

The sibling axes include preceding-sibling:: and following-sibling::. As the names suggest, the preceding-sibling axis consists of siblings that precede the context node and the following-sibling axis consists of siblings that follow it. Siblings are, of course, child nodes that share the same parent. Most of the examples below use preceding-sibling::, but you should be able to work out the results for following-sibling:: without too much trouble.

Keep in mind that when using a positional path expression of the form preceding-sibling::*[1], you are referring to the immediately preceding sibling looking back from the context node and not the first sibling in document order. Some people get confused because the resulting sequence is in document order regardless of whether you use preceding-sibling:: or following-sibling::. Although not an axis expression per se, ../X is a way of saying "select both preceding and following siblings named X, as well as the context node, should it be an X." More formally, it is an abbreviation for parent::node( )/X. Note that (preceding-sibling::*)[1] and (following-sibling::*)[1] select the first preceding or following sibling in document order, because the parentheses cause the position to be applied to the resulting sequence rather than to the axis.

<!-- Sample document; the context node is <A id="7"/> -->
<Test id="preceding-siblings">
    <A id="1"/>
    <A id="2"/>
    <B id="3"/>
    <A id="4"/>
    <B id="5"/>
    <C id="6"/>
    <A id="7"/>
    <A id="8"/>
    <B id="9"/>
</Test>

(:Select all A sibling elements that precede the context node. :)
preceding-sibling::A

Result: <A id="1"/> <A id="2"/> <A id="4"/>

(:Select all A sibling elements that follow the context node. :)
following-sibling::A

Result: <A id="8"/> 

(:Select all sibling elements that precede the context node. :)
preceding-sibling::*    

Result: <A id="1"/> <A id="2"/> <B id="3"/> <A id="4"/> <B id="5"/> <C id="6"/>


(: Select the first preceding sibling element named A in reverse document order. :)
preceding-sibling::A[1]    

Result: <A id="4"/>

(: The immediately preceding sibling element, provided it is an A. :)
preceding-sibling::*[1][self::A]    

Result: ( ) 
(: If the context was <A id="8"/>, the result would be <A id="7"/> :)

(:All preceding sibling elements that are not A elements:)
preceding-sibling::*[not(self::A)]

Result: <B id="3"/> <B id="5"/> <C id="6"/>
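The position rule for reverse axes is easy to get wrong, so it can help to model it outside XPath. The following Python sketch assumes the standard library's xml.etree.ElementTree, which has no sibling axes; the list named axis is a hypothetical stand-in for preceding-sibling:: that holds the siblings nearest first. It also assumes the context node is <A id="7"/>.

```python
import xml.etree.ElementTree as ET

DOC = """
<Test id="preceding-siblings">
  <A id="1"/><A id="2"/><B id="3"/><A id="4"/><B id="5"/>
  <C id="6"/><A id="7"/><A id="8"/><B id="9"/>
</Test>
"""

siblings = list(ET.fromstring(DOC))
# Assumption: the context node is <A id="7"/>.
ctx = next(i for i, e in enumerate(siblings) if e.get("id") == "7")


def ids(elements):
    """Return the id attributes of the given elements, in order."""
    return [e.get("id") for e in elements]


# Hypothetical stand-in for the preceding-sibling axis: nearest sibling first.
axis = siblings[:ctx][::-1]

# preceding-sibling::A[1] : filter by name, then take position 1 on the axis.
nearest_a = ids([e for e in axis if e.tag == "A"][:1])

# preceding-sibling::*[1][self::A] : take position 1, then test the name.
immediate_if_a = ids([e for e in axis[:1] if e.tag == "A"])

# (preceding-sibling::*)[1] : the parenthesized sequence is in document
# order, so position 1 is the earliest sibling.
first_in_doc_order = ids(list(reversed(axis))[:1])

print(nearest_a)           # ['4']
print(immediate_if_a)      # []
print(first_in_doc_order)  # ['1']
```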

(: For the following recipes use this document. :)

<Test id="preceding-siblings">
    <A id="1">
        <A/>
    </A>
    <A id="2"/>
    <B id="3">
        <A/>
    </B>
    <A id="4"/>
    <B id="5"/>
    <C id="6"/>
    <A id="7"/>
    <A id="8"/>
    <B id="9"/>
</Test>

(: The element directly preceding the context provided it has a child element A :)
preceding-sibling::*[1][A]

Result: ( )

(: The first element preceding the context that has a child A :)
preceding-sibling::*[A][1]

Result:        <B id="3"> ...
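The difference between [1][A] and [A][1] comes down to predicates being applied left to right, each one seeing only the nodes that survived the previous one. Here is a rough Python model of that rule, again assuming the standard library's xml.etree.ElementTree and assuming <A id="7"/> as the context node. The helpers apply_predicates, position, and has_child are hypothetical names introduced for this illustration.

```python
import xml.etree.ElementTree as ET

DOC = """
<Test id="preceding-siblings">
  <A id="1"><A/></A>
  <A id="2"/>
  <B id="3"><A/></B>
  <A id="4"/><B id="5"/><C id="6"/><A id="7"/><A id="8"/><B id="9"/>
</Test>
"""

siblings = list(ET.fromstring(DOC))
# Assumption: the context node is <A id="7"/>.
ctx = next(i for i, e in enumerate(siblings) if e.get("id") == "7")
axis = siblings[:ctx][::-1]  # preceding siblings, nearest first


def apply_predicates(nodes, *predicates):
    """Apply predicates left to right; each sees only the survivors."""
    current = list(nodes)
    for predicate in predicates:
        current = predicate(current)
    return current


def position(n):
    """Model of [n]: keep the node at 1-based position n, if any."""
    return lambda nodes: nodes[n - 1:n]


def has_child(name):
    """Model of [name]: keep nodes that have a child element called name."""
    return lambda nodes: [e for e in nodes if e.find(name) is not None]


# preceding-sibling::*[1][A]
first_then_test = apply_predicates(axis, position(1), has_child("A"))

# preceding-sibling::*[A][1]
test_then_first = apply_predicates(axis, has_child("A"), position(1))

print([e.get("id") for e in first_then_test])  # []
print([e.get("id") for e in test_then_first])  # ['3']
```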

(: XPath 2.0 allows more flexibility to select elements with respect to namespaces. 
The following examples use this XML document, with <NS:B id="3"/> as the context node. :)

<Test xmlns:NS="http://www.ora.com/xstlcbk/1" xmlns:NS2="http://www.ora.com/xstlcbk/2">
  <NS:A id="1"/>
  <NS2:A id="2"/>
  <NS:B id="3"/>
  <NS2:B id="3"/>
</Test>

(: Select the preceding sibling elements of the context whose namespace 
is the namespace associated with prefix NS :)
preceding-sibling::NS:*

Result:        <NS:A id="1"/>

(: Select the preceding sibling elements of the context whose local name is A :)
preceding-sibling::*:A

Result:        <NS:A id="1"/>, <NS2:A id="2"/>
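The two wildcard forms test different halves of an element's expanded name: NS:* looks only at the namespace URI, and *:A looks only at the local name. A small Python sketch makes the split visible. It assumes the standard library's xml.etree.ElementTree, which stores each tag as {uri}local, and it assumes <NS:B id="3"/> is the context node. The helper split_tag is a hypothetical name.

```python
import xml.etree.ElementTree as ET

NS1 = "http://www.ora.com/xstlcbk/1"
NS2 = "http://www.ora.com/xstlcbk/2"
DOC = f"""
<Test xmlns:NS="{NS1}" xmlns:NS2="{NS2}">
  <NS:A id="1"/>
  <NS2:A id="2"/>
  <NS:B id="3"/>
  <NS2:B id="3"/>
</Test>
"""


def split_tag(elem):
    """Split ElementTree's '{uri}local' tag into (uri, local)."""
    if elem.tag.startswith("{"):
        uri, local = elem.tag[1:].split("}", 1)
        return uri, local
    return "", elem.tag


siblings = list(ET.fromstring(DOC))
# Assumption: the context node is <NS:B id="3"/>.
ctx = next(i for i, e in enumerate(siblings) if e.tag == f"{{{NS1}}}B")
preceding = siblings[:ctx]  # preceding siblings in document order

# preceding-sibling::NS:* : match on the namespace URI, any local name.
in_ns1 = [e.get("id") for e in preceding if split_tag(e)[0] == NS1]

# preceding-sibling::*:A : match on the local name, any namespace.
local_a = [e.get("id") for e in preceding if split_tag(e)[1] == "A"]

print(in_ns1)   # ['1']
print(local_a)  # ['1', '2']
```

Note that the comparison is against the URI, never the prefix; a document that bound a different prefix to the same URI would give the same results.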

Parent and ancestor axes

The parent axis (parent::) refers to the parent of the context node. The expression parent::X should not be confused with ../X. The former yields a sequence of exactly one element if the parent of the context node is an X element, and an empty sequence otherwise. The latter is shorthand for parent::node( )/X, which selects all siblings of the context node named X, including the context node itself, should it be an X.

One can navigate to higher levels of the XML tree (parents, grandparents, great-grandparents, and so on) using either ancestor:: or ancestor-or-self::. The former excludes the context and the latter includes it.

(: Select the parent of the context node, provided it is an X element. Empty otherwise. :)
parent::X

(: Select the parent element of the context node. Empty only if the context is the 
top-level element, whose parent is the document node rather than an element, or a node with no parent. :)
parent::*

(: Select the parent if it is in the namespace associated with the prefix NS. 
The prefix must be defined; otherwise, it is an error. :)
parent::NS:*

(: Select the parent, regardless of its namespace, provided the local name is X. :)
parent::*:X

(: Select all ancestor elements (including the parent) named X. :)
ancestor::X    

(: Select the context, provided it is an X, and all ancestor elements named X. :)
ancestor-or-self::X
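No sample document accompanies the parent and ancestor examples, so here is a hedged Python sketch that reuses a trimmed copy of the "descendants" document with <X id="3-1"/> chosen, arbitrarily, as the context node. It assumes the standard library's xml.etree.ElementTree, whose elements carry no parent pointer; the parent_of map and the ancestors helper are hypothetical names that emulate the axes.

```python
import xml.etree.ElementTree as ET

DOC = """
<Test id="descendants">
  <parent>
    <X id="1"/>
    <Y id="3"><X id="3-1"/><Y id="3-2"/><X id="3-3"/></Y>
  </parent>
</Test>
"""

root = ET.fromstring(DOC)
# ElementTree elements have no parent pointer, so build a child -> parent map.
parent_of = {child: parent for parent in root.iter() for child in parent}


def ancestors(elem):
    """Model of ancestor::* : nearest ancestor first."""
    chain = []
    while elem in parent_of:
        elem = parent_of[elem]
        chain.append(elem)
    return chain


# Assumption: the context node is <X id="3-1"/>.
context = root.find(".//X[@id='3-1']")
parent = parent_of[context]

# parent::X : at most one node, and only if the parent itself is an X.
parent_x = [p.tag for p in [parent] if p.tag == "X"]

# ../X : the children of the parent named X, context included.
dotdot_x = [e.get("id") for e in parent.findall("X")]

# ancestor::* and ancestor-or-self::X
ancestor_tags = [a.tag for a in ancestors(context)]
anc_or_self_x = [e.get("id") for e in [context] + ancestors(context) if e.tag == "X"]

print(parent_x)       # []
print(dotdot_x)       # ['3-1', '3-3']
print(ancestor_tags)  # ['Y', 'parent', 'Test']
print(anc_or_self_x)  # ['3-1']
```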

Preceding and following axes

The preceding and following axes have the potential to select a large number of nodes, because they consider all nodes that come before (or after) the context node in document order. The preceding axis excludes ancestors, and the following axis excludes descendants. Also, don't forget that both axes exclude attribute and namespace nodes.

(: All preceding element nodes named X. :)
preceding::X

(: The closest preceding element node named X. :)
preceding::X[1]

(: The furthest following element node named X. :)
following::X[last( )]
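The exclusions are easier to see when the axes are computed by hand. The Python sketch below assumes the standard library's xml.etree.ElementTree and a trimmed copy of the "descendants" document, with <Y id="3-2"/> picked as an illustrative context node. The functions preceding and following are hypothetical helpers that model the axes for element nodes only. A handy sanity check is that the ancestors, descendants, preceding nodes, following nodes, and the context node itself together account for every element exactly once.

```python
import xml.etree.ElementTree as ET

DOC = """
<Test>
  <parent>
    <X id="1"/>
    <Y id="3"><X id="3-1"/><Y id="3-2"/><X id="3-3"/></Y>
    <X id="4"/>
  </parent>
</Test>
"""

root = ET.fromstring(DOC)
order = list(root.iter())  # every element, in document order
parent_of = {child: parent for parent in root.iter() for child in parent}


def ancestors(elem):
    """Model of ancestor::* : nearest ancestor first."""
    chain = []
    while elem in parent_of:
        elem = parent_of[elem]
        chain.append(elem)
    return chain


def preceding(elem):
    """Model of preceding::* : earlier elements minus ancestors, nearest first."""
    skip = set(ancestors(elem))
    before = order[:order.index(elem)]
    return [e for e in reversed(before) if e not in skip]


def following(elem):
    """Model of following::* : later elements minus descendants."""
    skip = set(elem.iter())  # the element itself plus its descendants
    after = order[order.index(elem) + 1:]
    return [e for e in after if e not in skip]


# Assumption: the context node is <Y id="3-2"/>.
context = root.find(".//Y[@id='3-2']")

# preceding::X[1] : the closest preceding X.
closest_x = [e.get("id") for e in preceding(context) if e.tag == "X"][:1]

# following::X[last()] : the farthest following X.
farthest_x = [e.get("id") for e in following(context) if e.tag == "X"][-1:]

print(closest_x)   # ['3-1']
print(farthest_x)  # ['4']
```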

Discussion

XPath uses the notion of an axis to partition the document tree into subsets relative to some node called the context node. In general, these subsets overlap, but the ancestor, descendant, following, preceding, and self axes partition a document (ignoring attribute and namespace nodes): they do not overlap, and together they contain all the nodes in the document. The context node is established by the XPath hosting language. In XSLT, the context is set via:

a template match (<xsl:template match="x"> ... </xsl:template>)
xsl:for-each
xsl:apply-templates

Effectively wielding the kinds of path expression shown in the solution is key to performing both simple and complex transformations. Experience with traditional programming languages sometimes leads to confusion and mistakes when using XPath. For example, I often used to catch myself writing something like <xsl:if test="preceding-sibling::X[1]"> </xsl:if> when I really intended <xsl:if test="preceding-sibling::*[1][self::X]"> </xsl:if>. The first test succeeds whenever any preceding sibling named X exists, however far back it is, because X[1] selects the nearest X rather than the nearest sibling. The mistake is probably so easy to make because the second form is a less than intuitive way of saying "test if the immediately preceding sibling is an X."

It is, of course, impossible to show every useful permutation of path expressions using axes. But if you understand the building blocks presented previously you are well on your way to decoding the meaning of constructs such as preceding-sibling::X[1]/descendant::Z[A/B] or worse.
