Recipe 1.2. Filtering Nodes

Problem

You need to select nodes based on the data they contain instead of, or in addition to, their names or positions.

Solution

Many of the mini-recipes in Recipe 1.1 used predicates to filter nodes, but those predicates were based strictly on the position or name of the node. Here we consider a variety of predicates that filter based on data content. In these examples, we use a simple child element path X before each predicate, but you could equally substitute any path expression for X, including those in Recipe 1.1.

In the following examples, we use the XPath 2.0 value comparison operators (eq, ne, lt, le, gt, and ge) instead of the general comparison operators (=, !=, <, <=, >, and >=), because the new operators are preferred when comparing atomic values. XPath 1.0 has only the latter operators, so make the appropriate substitution there. The new operators were introduced in XPath 2.0 because they have simpler semantics and will probably be more efficient as a result. The complexity of the old operators arises when a sequence appears on either side of the comparison. Recipe 1.8 covers this topic further.
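The difference between the two operator families can be modeled outside XPath. The following Python sketch is only an approximation: it ignores atomization and type casting, and the function names general_eq, general_ne, and value_eq are invented for this illustration. It shows that the older operators are existential over sequences, whereas the newer ones accept at most one item per operand.

```python
def general_eq(left, right):
    """Model of '=': true if ANY pair of items compares equal."""
    return any(l == r for l in left for r in right)


def general_ne(left, right):
    """Model of '!=': true if ANY pair of items differs."""
    return any(l != r for l in left for r in right)


def value_eq(left, right):
    """Model of 'eq': each operand may hold at most one item."""
    if len(left) > 1 or len(right) > 1:
        raise TypeError("value comparison needs at most one item per operand")
    if not left or not right:
        return None  # stands in for the empty sequence
    return left[0] == right[0]


attrs = ["10", "20"]  # e.g. the values of two @a attributes

# With a two-item sequence, '=' and '!=' are both satisfied at once.
both_true = (general_eq(attrs, ["10"]), general_ne(attrs, ["10"]))

# 'eq' is straightforward on single items ...
single = value_eq(["10"], ["10"])

# ... and rejects longer sequences instead of searching them.
try:
    value_eq(attrs, ["10"])
    raised = False
except TypeError:
    raised = True
```

Because a general comparison must consider every pairing of items, a = b and a != b can both hold for the same operands, which is the kind of complexity the value comparison operators avoid.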

Another point must be made for those working in XPath 2.0, because that version incorporates type information when a schema is available. This can cause some of the expressions below to raise type errors. For example, X[@a = 10] is not the same as X[@a = '10'] when the attribute a has an integer type. Here we assume there is no schema, and therefore all values atomized from nodes have the type untypedAtomic. You can find more on this topic in Recipe 1.9 and Recipe 1.10.

(: Select X child elements that have an attribute named a. :)
X[@a]

(: Select X children that have at least one attribute. :)
X[@*]

(: Select X children that have at least three attributes. :)
X[count(@*) > 2]

(: Select X children whose attributes sum to a value less than 7. :)
X[sum(for $a in @* return number($a)) < 7] (: In XSLT 1.0 use sum(@*) &lt; 7 :)

(: Select X children that have no attributes named a. :)
X[not(@a)] 

(: Select X children that have no attributes. :)
X[not(@*)] 

(: Select X children that have an attribute named a with value '10'. :)
X[@a eq '10'] 

(: Select X children that have a child named Z with value '10'. :)
X[Z eq '10'] 

(: Select X children that have a child named Z with value not equal to '10'. :)
X[Z ne '10'] 

(: Select X children if they have at least one child text node. :)
X[text( )] 

(: Select X children if they have a text node with at least one non-whitespace 
character. :)
X[text( )[normalize-space(.)]] 

(: Select X children if they have any child node. :)
X[node( )] 

(: Select X children if they contain a comment node. :)
X[comment( )] 

(: Select X children if they have an @a whose numerical value is less than 10. 
This expression will work equally well in XPath 1.0 and 2.0 regardless of whether 
@a is a string or a numeric type. :)
X[number(@a) < 10] 

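(: Select X if it has at least one preceding sibling named Z with an attribute y 
that is not equal to '10'. The general comparison != is used here because the 
path can yield more than one @y, in which case ne would raise a type error. :)
X[preceding-sibling::Z/@y != '10'] 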

(: Select X children whose string-value consists of a single space character. :)
X[. = ' ']

(: An odd way of getting an empty sequence! :)
X[false( )]

(: Same as X. :)
X[true( )]

(: X elements with exactly 5 child elements. :)
X[count(*) eq 5]

(: X elements with exactly 5 child nodes (including element, text, comment, 
and PI nodes but not attribute nodes). :)
X[count(node( )) eq 5]

(: X elements with exactly 5 nodes of any kind. :)
X[count(@* | node( )) eq 5]

(: The first X child, provided it has the value 'some text'; empty otherwise. :)
X[1][. eq 'some text']

(: Select all X children with the value 'some text' and return the first or empty 
if there is no such child. In simpler words, the first X child element that has the 
string-value 'some text'. :)
X[. eq 'some text'][1]
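To experiment with a few of these predicates outside an XSLT processor, a small script can help. The sketch below uses Python's xml.etree.ElementTree, which supports only a small subset of XPath 1.0, so list comprehensions stand in for the predicates it cannot express. The sample document and the positions helper are invented for this illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document, invented for this illustration.
DOC = """
<root>
  <X a="10" b="2" c="3"><Z>10</Z></X>
  <X a="4">some text</X>
  <X>some text</X>
</root>
"""

root = ET.fromstring(DOC)
xs = root.findall("X")


def positions(selected):
    """Return the 1-based positions of the selected X elements."""
    return [xs.index(e) + 1 for e in selected]


def string_value(elem):
    """Approximate the XPath string-value of an element."""
    return "".join(elem.itertext())


# Predicates that ElementTree understands directly.
has_a = positions(root.findall("X[@a]"))            # X[@a]
a_is_10 = positions(root.findall("X[@a='10']"))     # X[@a eq '10']
z_is_10 = positions(root.findall("X[Z='10']"))      # X[Z eq '10']

# Predicates emulated in Python because ElementTree lacks the functions.
three_attrs = positions([e for e in xs if len(e.attrib) > 2])   # X[count(@*) > 2]
no_attrs = positions([e for e in xs if not e.attrib])           # X[not(@*)]
a_below_10 = positions(
    [e for e in xs if "a" in e.attrib and float(e.get("a")) < 10]
)                                                               # X[number(@a) < 10]

# Predicate order matters: test the first X, or take the first X that passes.
first_then_test = positions(
    [e for e in xs[:1] if string_value(e) == "some text"]
)                                                   # X[1][. eq 'some text']
test_then_first = positions(
    [e for e in xs if string_value(e) == "some text"][:1]
)                                                   # X[. eq 'some text'][1]
```

With this sample, first_then_test is empty because the first X has the string-value '10', while test_then_first contains the second X, which is the first one whose string-value is 'some text'.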

Discussion

As with Recipe 1.1, it is impossible to cover every interesting permutation of filtering predicates. However, mastering the themes exemplified above should help you develop almost any filtering expression you desire. Also consider that you can create more complex conditions using the logical operators and and or together with the function not( ).

X[number(@a) > 5 and number(@a) < 10]
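The logical operators map directly onto boolean logic in a host language. As a rough sketch, the Python below emulates a range test on @a (greater than 5 and less than 10) and its negation with not( ). The sample document and the xpath_number helper are assumptions made for this illustration; the helper only approximates number( ) by returning NaN for missing or non-numeric values.

```python
import math
import xml.etree.ElementTree as ET

# Hypothetical sample document, invented for this illustration.
DOC = '<root><X a="3"/><X a="7"/><X a="12"/><X/><X a="n/a"/></root>'
root = ET.fromstring(DOC)


def xpath_number(value):
    """Rough stand-in for number(): NaN when the value is absent or not numeric."""
    try:
        return float(value)
    except (TypeError, ValueError):
        return math.nan


def in_range(elem):
    """Emulates the predicate number(@a) > 5 and number(@a) < 10."""
    n = xpath_number(elem.get("a"))
    return n > 5 and n < 10


# X[number(@a) > 5 and number(@a) < 10]
selected = [x.get("a") for x in root.findall("X") if in_range(x)]

# X[not(number(@a) > 5 and number(@a) < 10)]
rejected = [x.get("a") for x in root.findall("X") if not in_range(x)]
```

Every comparison with NaN is false, so X elements with a missing or non-numeric @a fail the range test and therefore pass its negation.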

When using predicates with complex path expressions, you need to understand the effect of parentheses.

(: Select the first Y child of every X child of the context node. This expression 
can result in a sequence of more than one Y. :)
X/Y[1]

(: Select the sequence of nodes X/Y and then take the first. This expression can 
at most select one Y. :)
(X/Y)[1]

A computer scientist would say that the predicate operator [ ] binds more tightly than the path operator /.
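The difference between X/Y[1] and (X/Y)[1] is easy to observe on a small document. The following Python sketch uses xml.etree.ElementTree, which evaluates a positional predicate relative to each parent, as XPath does, but does not accept parenthesized paths; slicing the full result list stands in for (X/Y)[1]. The sample document is invented for this illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document, invented for this illustration.
DOC = """
<root>
  <X><Y>x1-y1</Y><Y>x1-y2</Y></X>
  <X><Y>x2-y1</Y></X>
</root>
"""

root = ET.fromstring(DOC)

# X/Y[1]: the predicate applies to the Y step, once per X parent,
# so each X contributes its own first Y.
first_y_of_each_x = [y.text for y in root.findall("X/Y[1]")]

# (X/Y)[1]: build the whole X/Y sequence first, then take its first item.
# ElementTree has no parenthesized paths, so a slice emulates the outer [1].
first_y_overall = [y.text for y in root.findall("X/Y")[:1]]
```

Here first_y_of_each_x holds one Y per X (x1-y1 and x2-y1), while first_y_overall holds only x1-y1.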
