Recipe 1.1. Effectively Using Axes
Problem
You need to select nodes in an XML
tree in ways that consider complex
relationships within the hierarchical structure.
Solution
Each of the following solutions is organized around related sets of
axes. For each group, a sample XML document is presented with the
context node in bold. An explanation of the effect of evaluating the
path is provided, along with an indication of the nodes that will be
selected with respect to the highlighted context. In some cases, the
solution will consider other nodes as the context to illustrate
subtleties of the particular path expression.
Child and descendant axes
The child axis is the default axis in XPath. This means you do not
need to write the child:: axis specifier, although you can if you are
feeling pedantic. You can reach deeper into the XML tree using the
descendant:: and descendant-or-self:: axes. The former excludes the
context node and the latter includes it.
<!-- Sample document; the context node is the parent element -->
<Test id="descendants">
<parent>
<X id="1"/>
<X id="2"/>
<Y id="3">
<X id="3-1"/>
<Y id="3-2"/>
<X id="3-3"/>
</Y>
<X id="4"/>
<Y id="5"/>
<Z id="6"/>
<X id="7"/>
<X id="8"/>
<Y id="9"/>
</parent>
</Test>
(: Select all child elements named X :)
X (: same as child::X :)
Result: <X id="1"/> <X id="2"/> <X id="4"/> <X id="7"/> <X id="8"/>
(:Select the first X child element:)
X[1]
Result: <X id="1"/>
(:Select the last X child element:)
X[last( )]
Result: <X id="8"/>
(:Select the first child element, provided it is an X. Otherwise empty:)
*[1][self::X]
Result: <X id="1"/>
(:Select the last child, provided it is an X. Otherwise empty:)
*[last( )][self::X]
Result: ( )
*[last( )][self::Y]
Result: <Y id="9"/>
(: Select all descendants named X :)
descendant::X
Result: <X id="1"/> <X id="2"/> <X id="3-1"/> <X id="3-3"/> <X id="4"/> <X id="7"/> <X id="8"/>
(: Select the context node, if it is an X, and all descendants named X :)
descendant-or-self::X
Result: <X id="1"/> <X id="2"/> <X id="3-1"/> <X id="3-3"/> <X id="4"/> <X id="7"/> <X id="8"/>
(: Select the context node and all descendant elements :)
descendant-or-self::*
Result: <parent> <X id="1"/> <X id="2"/> <Y id="3"> <X id="3-1"/> <Y id="3-2"/>
<X id="3-3"/> </Y> <X id="4"/> <Y id="5"/> <Z id="6"/> <X id="7"/> <X id="8"/>
<Y id="9"/> </parent> <X id="1"/> <X id="2"/> <Y id="3"> <X id="3-1"/> <Y id="3-2"/>
<X id="3-3"/> </Y> <X id="3-1"/> <Y id="3-2"/> <X id="3-3"/> <X id="4"/> <Y id="5"/>
<Z id="6"/> <X id="7"/> <X id="8"/> <Y id="9"/>
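To try these paths against the sample document, a minimal XSLT 1.0
sketch along the following lines may help. It assumes that the parent
element is the context node (which is what the listed results imply)
and that the sample is supplied as the source document. Printing the
id attributes as text is only an illustrative choice.

```xslt
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>
  <!-- Drop whitespace-only text nodes so only our own text is emitted. -->
  <xsl:strip-space elements="*"/>

  <!-- Assumption: the matched parent element is the context node. -->
  <xsl:template match="parent">
    <!-- Child axis (the default): X children only. -->
    <xsl:text>X: </xsl:text>
    <xsl:for-each select="X">
      <xsl:value-of select="concat(@id, ' ')"/>
    </xsl:for-each>
    <!-- Descendant axis: X elements at any depth below the context. -->
    <xsl:text>&#10;descendant::X: </xsl:text>
    <xsl:for-each select="descendant::X">
      <xsl:value-of select="concat(@id, ' ')"/>
    </xsl:for-each>
    <!-- Empty unless the last child element is a Y. -->
    <xsl:text>&#10;*[last()][self::Y]: </xsl:text>
    <xsl:value-of select="*[last()][self::Y]/@id"/>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>
</xsl:stylesheet>
```

Run against the sample, this should print the ids 1 2 4 7 8 for the
child path, 1 2 3-1 3-3 4 7 8 for the descendant path, and 9 for the
last-child test, in line with the results listed above.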
Sibling axes
The sibling axes include preceding-sibling:: and
following-sibling::. As the names suggest, the
preceding-sibling axis consists of siblings that precede the context
node and the following-sibling axis consists of siblings that follow
it. Siblings are, of course, child nodes that
share the same parent. Most of the examples below use
preceding-sibling::, but you should be able to
work out the results for following-sibling::
without too much trouble.
Keep in mind that when using a positional path expression of the form
preceding-sibling::*[1], you are referring to the
immediately preceding sibling looking back from the context node and
not the first sibling in document order. Some people get confused
because the resulting sequence is in document order regardless of
whether you use preceding-sibling:: or
following-sibling::. Although not an axis
expression per se, ../X is a way of saying,
"select both preceding and following siblings named X as well as the
context node, should it be an X." More formally, it is an
abbreviation for parent::node( )/X. Note that
(preceding-sibling::*)[1] and
(following-sibling::*)[1]
select the first preceding or following
sibling in document order.
<!-- Sample document; the context node is <A id="7"/> -->
<Test id="preceding-siblings">
<A id="1"/>
<A id="2"/>
<B id="3"/>
<A id="4"/>
<B id="5"/>
<C id="6"/>
<A id="7"/>
<A id="8"/>
<B id="9"/>
</Test>
(:Select all A sibling elements that precede the context node. :)
preceding-sibling::A
Result: <A id="1"/> <A id="2"/> <A id="4"/>
(:Select all A sibling elements that follow the context node. :)
following-sibling::A
Result: <A id="8"/>
(:Select all sibling elements that precede the context node. :)
preceding-sibling::*
Result: <A id="1"/> <A id="2"/> <B id="3"/> <A id="4"/> <B id="5"/> <C id="6"/>
(: Select the first preceding sibling element named A in reverse document order. :)
preceding-sibling::A[1]
Result: <A id="4"/>
(: The first preceding element in reverse document order, provided it is an A. :)
preceding-sibling::*[1][self::A]
Result: ( )
(: If the context was <A id="8"/>, the result would be <A id="7"/> :)
(:All preceding sibling elements that are not A elements:)
preceding-sibling::*[not(self::A)]
Result: <B id="3"/> <B id="5"/> <C id="6"/>
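The difference between a positional predicate applied along a reverse
axis and one applied to a parenthesized expression can be checked with
a small XSLT 1.0 sketch such as the one below. It assumes the sample
document above is the source and that <A id="7"/> is the context node
(the listed results imply this). The output format is illustrative.

```xslt
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <!-- Assumption: <A id="7"/> is the context node. -->
  <xsl:template match="/">
    <xsl:apply-templates select="Test/A[@id = '7']"/>
  </xsl:template>

  <xsl:template match="A">
    <!-- Position counts backward from the context: nearest sibling. -->
    <xsl:text>preceding-sibling::*[1]: </xsl:text>
    <xsl:value-of select="preceding-sibling::*[1]/@id"/>
    <!-- Parentheses yield document order first: earliest sibling. -->
    <xsl:text>&#10;(preceding-sibling::*)[1]: </xsl:text>
    <xsl:value-of select="(preceding-sibling::*)[1]/@id"/>
    <!-- Nearest preceding sibling that is named A. -->
    <xsl:text>&#10;preceding-sibling::A[1]: </xsl:text>
    <xsl:value-of select="preceding-sibling::A[1]/@id"/>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>
</xsl:stylesheet>
```

With that context, the three lines should report ids 6, 1, and 4
respectively.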
(: The following recipes use this document; the context node is still <A id="7"/>. :)
<Test id="preceding-siblings">
<A id="1">
<A/>
</A>
<A id="2"/>
<B id="3">
<A/>
</B>
<A id="4"/>
<B id="5"/>
<C id="6"/>
<A id="7"/>
<A id="8"/>
<B id="9"/>
</Test>
(: The element directly preceding the context provided it has a child element A :)
preceding-sibling::*[1][A]
Result: ( )
(: The first element preceding the context that has a child A :)
preceding-sibling::*[A][1]
Result: <B id="3"> ...
(: XPath 2.0 allows more flexibility to select elements with respect to namespaces.
For these recipes the following XML document applies, with <NS:B id="3"/> as the context node. :)
<Test xmlns:NS="http://www.ora.com/xstlcbk/1" xmlns:NS2="http://www.ora.com/xstlcbk/2">
<NS:A id="1"/>
<NS2:A id="2"/>
<NS:B id="3"/>
<NS2:B id="3"/>
</Test>
(: Select the preceding sibling elements of the context whose namespace
is the namespace associated with prefix NS :)
preceding-sibling::NS:*
Result: <NS:A id="1"/>
(: Select the preceding sibling elements of the context whose local name is A :)
preceding-sibling::*:A
Result: <NS:A id="1"/> <NS2:A id="2"/>
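The *:A form is new in XPath 2.0, whereas the NS:* form is also valid
in XPath 1.0. With an XSLT 1.0 processor, a predicate on local-name( )
is a common substitute for *:A. The sketch below assumes the namespace
sample above is the source and that <NS:B id="3"/> is the context
node. The stylesheet must declare a prefix for the same namespace URI,
although the prefix itself need not match the one used in the source.

```xslt
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:NS="http://www.ora.com/xstlcbk/1">
  <xsl:output method="text"/>

  <!-- Assumption: <NS:B id="3"/> is the context node. -->
  <xsl:template match="/">
    <xsl:apply-templates select="Test/NS:B"/>
  </xsl:template>

  <xsl:template match="NS:B">
    <!-- XPath 1.0 stand-in for preceding-sibling::*:A -->
    <xsl:for-each select="preceding-sibling::*[local-name() = 'A']">
      <!-- name() reports the qualified name as written in the source. -->
      <xsl:value-of select="concat(name(), ' ', @id, '&#10;')"/>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```

This should list NS:A 1 followed by NS2:A 2, the same two elements
that the XPath 2.0 expression selects.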
Parent and ancestor axes
The parent axis (parent::) refers to the parent of the context node.
The expression parent::X should not be confused with ../X. The former
yields a sequence of exactly one element if the parent of the context
node is an X, and the empty sequence otherwise. The latter is
shorthand for parent::node( )/X, which selects all children of the
parent that are named X: the siblings of the context node named X,
plus the context node itself, should it be an X.
One can navigate to higher levels of
the XML tree (parents, grandparents,
great-grandparents, and so on) using either
ancestor:: or
ancestor-or-self::. The former excludes the
context and the latter includes it.
(: Select the parent of the context node, provided it is an X element. Empty otherwise. :)
parent::X
(: Select the parent element of the context node. Can only be empty if the context
is the top-level element. :)
parent::*
(: Select the parent if it is in the namespace associated with the prefix NS.
The prefix must be defined; otherwise, it is an error. :)
parent::NS:*
(: Select the parent, regardless of its namespace, provided the local name is X. :)
parent::*:X
(: Select all ancestor elements (including the parent) named X. :)
ancestor::X
(: Select the context, provided it is an X, and all ancestor elements named X. :)
ancestor-or-self::X
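To make the contrast between parent::X and ../X concrete, here is a
small XSLT 1.0 sketch. The input document shown in the comment is
hypothetical (it is not one of the recipe's samples), and the Y
element is assumed to be the context node.

```xslt
<?xml version="1.0"?>
<!-- Hypothetical input, invented for illustration:
     <X id="p">
       <X id="a"/>
       <Y id="b"/>
       <X id="c"/>
     </X>
-->
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <!-- Assumption: <Y id="b"/> is the context node. -->
  <xsl:template match="/">
    <xsl:apply-templates select="X/Y"/>
  </xsl:template>

  <xsl:template match="Y">
    <!-- At most one node: the parent, if it is named X. -->
    <xsl:text>parent::X: </xsl:text>
    <xsl:value-of select="parent::X/@id"/>
    <!-- All X children of the parent, whatever the parent is named. -->
    <xsl:text>&#10;../X: </xsl:text>
    <xsl:for-each select="../X">
      <xsl:value-of select="concat(@id, ' ')"/>
    </xsl:for-each>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>
</xsl:stylesheet>
```

For this input, parent::X should report p, while ../X should report
a and c.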
Preceding and following axes
The preceding and following axes have the potential to select
a large number of nodes, because they consider all nodes that come
before (or after) the context node in document order. The preceding
axis excludes ancestors, and the following axis excludes descendants.
Also, don't forget that both axes exclude namespace nodes and
attributes.
(: All preceding element nodes named X. :)
preceding::X
(: The closest preceding element node named X. :)
preceding::X[1]
(: The furthest following element node named X. :)
following::X[last( )]
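The exclusion of ancestors and descendants is easier to see on a
concrete tree. The following XSLT 1.0 sketch uses a hypothetical
input document (shown in the comment, not taken from the recipe) and
assumes <X id="5"> is the context node.

```xslt
<?xml version="1.0"?>
<!-- Hypothetical input, invented for illustration:
     <R>
       <X id="1"/>
       <X id="2">
         <X id="3"/>
       </X>
       <X id="4">
         <X id="5">
           <X id="6"/>
         </X>
       </X>
       <X id="7"/>
     </R>
-->
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <!-- Assumption: <X id="5"> is the context node. -->
  <xsl:template match="/">
    <xsl:apply-templates select="//X[@id = '5']"/>
  </xsl:template>

  <xsl:template match="X">
    <!-- Ancestor 4 is skipped; 1, 2 and 3 precede the context. -->
    <xsl:text>preceding::X: </xsl:text>
    <xsl:for-each select="preceding::X">
      <xsl:value-of select="concat(@id, ' ')"/>
    </xsl:for-each>
    <!-- Reverse axis: [1] is the closest preceding X. -->
    <xsl:text>&#10;preceding::X[1]: </xsl:text>
    <xsl:value-of select="preceding::X[1]/@id"/>
    <!-- Descendant 6 is skipped; only 7 follows the context. -->
    <xsl:text>&#10;following::X: </xsl:text>
    <xsl:for-each select="following::X">
      <xsl:value-of select="concat(@id, ' ')"/>
    </xsl:for-each>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>
</xsl:stylesheet>
```

For this input, preceding::X should yield 1 2 3, preceding::X[1]
should yield 3, and following::X should yield 7.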
Discussion
XPath uses the notion of an axis to partition the document tree into
subsets relative to some node called the context node. In general,
these subsets overlap, but the ancestor, descendant, following,
preceding, and self axes partition a document (ignoring attribute and
namespace nodes): they do not overlap, and together they contain all
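the nodes in the document. The context node is established by the
XPath hosting language. In XSLT, the context is most commonly set by
template matching (a node selected by xsl:apply-templates becomes the
context of the xsl:template that matches it) and by xsl:for-each.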
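As an illustration of how the hosting language establishes the
context, the XSLT 1.0 sketch below evaluates the same relative path,
X, under two different contexts. It assumes the "descendants" sample
document from the Solution is the source. Template matching and
xsl:for-each are the two context-setting mechanisms shown; the output
format is illustrative.

```xslt
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>
  <xsl:strip-space elements="*"/>

  <xsl:template match="parent">
    <!-- Here the context node is the matched parent element. -->
    <xsl:value-of select="concat(name(), ': ', count(X), '&#10;')"/>
    <xsl:for-each select="Y">
      <!-- Inside xsl:for-each, each selected Y is the context in turn. -->
      <xsl:value-of
          select="concat(name(), ' ', @id, ': ', count(X), '&#10;')"/>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```

The count of X children should be 5 for parent, 2 for the Y with id
3, and 0 for the Y elements with ids 5 and 9, because the path X is
always resolved relative to whichever node is the current context.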
Effectively wielding the kinds of path expression shown in the
solution is key to performing both simple and complex
transformations. Experience with traditional programming languages
sometimes leads to confusion and mistakes when using XPath. For
example, I often used to catch myself writing something like
<xsl:if test="preceding-sibling::X[1]"> </xsl:if>
when I really intended
<xsl:if test="preceding-sibling::*[1][self::X]"> </xsl:if>.
The former is true whenever any preceding sibling named X exists,
however far back it is. The mistake is probably easy to make because
the latter is a less-than-intuitive way of saying "test if the
immediately preceding sibling is an X."
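The two tests can be compared side by side with a sketch like the
following. It assumes the first sibling sample document from the
Solution is the source and that <A id="7"/> is the context node, so
the element name tested is A rather than X.

```xslt
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <!-- Assumption: <A id="7"/> is the context node. -->
  <xsl:template match="/">
    <xsl:apply-templates select="Test/A[@id = '7']"/>
  </xsl:template>

  <xsl:template match="A">
    <!-- True if any preceding sibling is an A. -->
    <xsl:if test="preceding-sibling::A[1]">
      <xsl:text>some earlier sibling is an A&#10;</xsl:text>
    </xsl:if>
    <!-- True only if the nearest preceding sibling is an A. -->
    <xsl:if test="preceding-sibling::*[1][self::A]">
      <xsl:text>the immediately preceding sibling is an A&#10;</xsl:text>
    </xsl:if>
  </xsl:template>
</xsl:stylesheet>
```

Only the first message should appear, because the sibling immediately
before <A id="7"/> is <C id="6"/>.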
It is, of course, impossible to show every useful permutation
of path expressions using axes. But if
you understand the building blocks presented previously you are well
on your way to decoding the meaning of constructs such as
preceding-sibling::X[1]/descendant::Z[A/B] or
worse.