Recipe 6.2. Prefer for-each-group over Muenchian Method of Grouping
Problem
XSLT 1.0 did not have explicit support for grouping, so indirect and
potentially confusing techniques had to be invented.
Solution
Take advantage of the powerful xsl:for-each-group
instruction for all your grouping
needs. This instruction has a mandatory select
attribute where you provide an expression that defines the population
of nodes you wish to group. You then use one of four grouping
attributes to define the criteria for dividing the population into
groups. These are explained next. As each group is processed, you can
use the function current-group()
to access all nodes in the current group.
You use the function current-grouping-key()
to access the value of the key that defines
the group being processed when grouping by value or adjacent nodes.
The current-grouping-key( ) function returns an empty sequence
when grouping by starting or ending node.
You can also sort groups by inserting one or more
xsl:sort instructions to define the sorting
criteria, just as you do when using xsl:for-each.
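As a minimal sketch of sorting groups, the following assumes a hypothetical sales element containing sale children with region and amount attributes (none of these names come from the recipe). The xsl:sort key is evaluated once per group, so current-group( ) can be used to order regions by their total:

```xml
<!-- Hypothetical input: <sales><sale region="East" amount="100"/>...</sales> -->
<xsl:template match="sales">
  <regions>
    <xsl:for-each-group select="sale" group-by="@region">
      <!-- xsl:sort must be the first child; it orders the groups, not their members -->
      <xsl:sort select="sum(current-group( )/@amount)" order="descending"/>
      <region name="{current-grouping-key( )}"
              total="{sum(current-group( )/@amount)}"/>
    </xsl:for-each-group>
  </regions>
</xsl:template>
```

To order the members within a group instead, apply xsl:sort inside an xsl:for-each or xsl:apply-templates over current-group( ).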
Group by values (group-by="expression")
A classic grouping problem arises quite often when processing data
into reports. Consider sales data. Product managers will often want
data grouped by sales region, product type, or salesperson, depending
on what problem they are trying to solve. You use the group-by
attribute to define an expression that determines the value or values
that cause nodes in the population to group together. For example,
group-by="@dept" would cause
nodes that have the same dept value to group together:
<xsl:template match="Employees">
  <EmployeesByDept>
    <xsl:for-each-group select="employee" group-by="@dept">
      <dept name="{current-grouping-key( )}">
        <xsl:copy-of select="current-group( )"/>
      </dept>
    </xsl:for-each-group>
  </EmployeesByDept>
</xsl:template>
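Because the group-by expression may yield more than one value, a single node can land in several groups. A minimal sketch, assuming each employee carries a hypothetical skills attribute holding a space-separated list (this attribute is an assumption, not part of the example above):

```xml
<!-- Hypothetical input: <employee name="Ann" skills="xslt java"/> -->
<xsl:template match="Employees">
  <EmployeesBySkill>
    <!-- tokenize( ) returns one key per skill, so an employee joins one group per skill -->
    <xsl:for-each-group select="employee"
                        group-by="tokenize(normalize-space(@skills), ' ')">
      <skill name="{current-grouping-key( )}">
        <xsl:copy-of select="current-group( )"/>
      </skill>
    </xsl:for-each-group>
  </EmployeesBySkill>
</xsl:template>
```

An employee whose skills attribute is missing or empty yields no keys and therefore appears in no group.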
Group by adjacent nodes (group-adjacent="expression")
In some contexts, such as document processing, you want to
group nodes that share a common value only if they are also
adjacent to each other. As with group-by,
group-adjacent defines an expression used to
determine the value used to perform the grouping, but two nodes that
have the same value will only be in the same group if they are adjacent
in the population. The value of group-adjacent must be exactly one
item; an empty sequence or a multi-valued sequence causes an error.
Consider a document consisting of para elements
interspersed with heading elements. You would like to extract
only the para elements without losing track of the
fact that some sequences of para elements belong
together as part of the same topic:
<xsl:template match="doc">
  <xsl:copy>
    <xsl:for-each-group select="*" group-adjacent="name( )">
      <xsl:if test="self::para">
        <topic>
          <xsl:copy-of select="current-group( )"/>
        </topic>
      </xsl:if>
    </xsl:for-each-group>
  </xsl:copy>
</xsl:template>
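To make the effect concrete, here is a small hypothetical input and the result the template above should produce for it, ignoring whitespace. The heading element name and the text content are invented for illustration:

```xml
<!-- Hypothetical input -->
<doc>
  <heading>Intro</heading>
  <para>A</para>
  <para>B</para>
  <heading>Details</heading>
  <para>C</para>
</doc>

<!-- Result: each run of adjacent para elements becomes one topic -->
<doc>
  <topic>
    <para>A</para>
    <para>B</para>
  </topic>
  <topic>
    <para>C</para>
  </topic>
</doc>
```

The xsl:if test works because the context node inside xsl:for-each-group is the first node of the current group.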
Group by starting node (group-starting-with="pattern")
Frequently, especially in document processing, a group of related
nodes is demarcated by a particular node such as a title or subtitle,
or other type of heading node. Grouping by starting node makes it
easy to process these loosely structured documents. The
group-starting-with
attribute defines a pattern that matches nodes in the population that
are the starting nodes of the group. This is similar to the patterns
you use with the match attribute in
xsl:template instructions. When the pattern
matches a node in the population, all subsequent nodes are part of
the group until another match is made. The first node in the
population starts a group whether it matches or not. This implies
that a nonempty population will have at least one group, the entire
population, even if the pattern is never matched.
A classic example involves reconstituting structure from an
unstructured document. XHTML is a good example of a loosely
structured markup language, especially in regard to the use of
heading elements (h1, h2,
etc.). The following transformation will add some structure by
nesting each group, designated by a starting h1
element, in a div element:
<xsl:template match="body">
  <xsl:copy>
    <xsl:for-each-group select="*" group-starting-with="h1">
      <div>
        <xsl:apply-templates select="current-group( )"/>
      </div>
    </xsl:for-each-group>
  </xsl:copy>
</xsl:template>
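Because the first node starts a group even when it does not match the pattern, any content that precedes the first h1 also ends up wrapped in a div. One possible variation, offered as a sketch rather than as part of the recipe, tests the first node of each group and passes such leading content through unwrapped:

```xml
<xsl:template match="body">
  <xsl:copy>
    <xsl:for-each-group select="*" group-starting-with="h1">
      <xsl:choose>
        <!-- The context node is the first node of the current group -->
        <xsl:when test="self::h1">
          <div>
            <xsl:apply-templates select="current-group( )"/>
          </div>
        </xsl:when>
        <xsl:otherwise>
          <!-- Content before the first h1: no heading, so no div -->
          <xsl:apply-templates select="current-group( )"/>
        </xsl:otherwise>
      </xsl:choose>
    </xsl:for-each-group>
  </xsl:copy>
</xsl:template>
```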
Group by ending node (group-ending-with="pattern")
This form of grouping is similar to
group-starting-with but uses the
group-ending-with pattern to define the last node
that will be in the current group. The first node in the population
starts a new group, so there is always at least one group even if the
pattern does not match any nodes.
Of all the grouping methods, grouping by ending node will typically
find the least application. This is because documents designed for human
consumption use leading elements, such as headings, to signal new
groups, rather than trailing ones. In XSLT 2.0
Programmer's Reference, Michael Kay
provides an example of a series of documents having been broken into
chunks for the purpose of transmission. In this example, the document
boundaries are marked by the absence of an attribute
continued='yes'. A slightly more probable example
is one where you want to add structure to a flat document by chunking
elements based on some criterion that designates the end of a chunk.
For example, you can group paragraphs into sections of five
paragraphs with the following code:
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>
  <xsl:template match="doc">
    <xsl:copy>
      <xsl:for-each-group select="para"
          group-ending-with="para[position( ) mod 5 eq 0]">
        <section>
          <xsl:for-each select="current-group( )">
            <xsl:copy>
              <xsl:apply-templates select="@*|node( )"/>
            </xsl:copy>
          </xsl:for-each>
        </section>
      </xsl:for-each-group>
    </xsl:copy>
  </xsl:template>
  <xsl:template match="@* | node( )">
    <xsl:copy>
      <xsl:apply-templates select="@* | node( )"/>
    </xsl:copy>
  </xsl:template>
</xsl:stylesheet>
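The transmission scenario described above can be sketched along the same lines. This is not the code from Kay's book; the transmission, chunk, documents, and document element names are hypothetical, and only the continued='yes' convention comes from the description:

```xml
<!-- Hypothetical input: a flat run of chunk elements; a chunk without
     continued="yes" is the last chunk of its document -->
<xsl:template match="transmission">
  <documents>
    <xsl:for-each-group select="chunk"
                        group-ending-with="chunk[not(@continued = 'yes')]">
      <document>
        <!-- Reassemble the document from the content of its chunks -->
        <xsl:copy-of select="current-group( )/node( )"/>
      </document>
    </xsl:for-each-group>
  </documents>
</xsl:template>
```

If the final chunk still carries continued="yes", the trailing chunks form a last group anyway, because a group also ends when the population runs out.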
Discussion
The Muenchian Method, named after Steve
Muench of Oracle, was an innovative way to group data in XSLT 1.0. It
took advantage of XSLT's ability to index documents
using a key. The trick involves using the index to efficiently figure
out the set of unique grouping keys and then using this set to
process all nodes in the group:
<!-- Index every product by its category -->
<xsl:key name="products-by-category" match="product" use="@category"/>
<xsl:template match="/">
  <!-- Keep only the first product of each category: one node per unique key -->
  <xsl:for-each select="//product[count(. | key('products-by-category', @category)[1]) = 1]">
    <xsl:variable name="current-grouping-key"
                  select="@category"/>
    <xsl:variable name="current-group"
                  select="key('products-by-category',
                          $current-grouping-key)"/>
    <xsl:for-each select="$current-group">
      <!-- processing for elements in group -->
      <!-- you can use xsl:sort here also, if necessary -->
    </xsl:for-each>
  </xsl:for-each>
</xsl:template>
Although the Muenchian method will continue to work in 2.0, you
should prefer for-each-group because it is likely
to be at least as efficient. Just as important, it will
make your code more comprehensible, especially to XSLT novices.
Further, you use the same basic instruction to get access to the four
distinct grouping capabilities. The Muenchian method can only be used
for value-based grouping. Backward compatibility with XSLT 1.0 is
probably the only compelling reason to continue to use Muenchian
grouping in XSLT 2.0.
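For comparison, here is a sketch of how the same value-based grouping might look with xsl:for-each-group. It reuses the product and category names from the Muenchian example; the categories and category output elements are hypothetical:

```xml
<xsl:template match="/">
  <categories>
    <!-- No key declaration and no count( ) trick: the instruction finds the unique keys -->
    <xsl:for-each-group select="//product" group-by="@category">
      <category name="{current-grouping-key( )}">
        <xsl:for-each select="current-group( )">
          <!-- processing for elements in group -->
          <xsl:copy-of select="."/>
        </xsl:for-each>
      </category>
    </xsl:for-each-group>
  </categories>
</xsl:template>
```

One difference to keep in mind: a product with no category attribute produces no grouping key and is skipped here, whereas the Muenchian predicate lets such nodes through.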