Recipe 6.2. Prefer for-each-group over Muenchian Method of Grouping

Problem

XSLT 1.0 did not have explicit support for grouping, so indirect and potentially confusing techniques had to be invented.

Solution

Take advantage of the powerful xsl:for-each-group instruction for all your grouping needs. This instruction has a mandatory select attribute in which you provide an expression that defines the population of nodes you wish to group. You then use exactly one of four grouping attributes to define the criteria for dividing the population into groups; these are explained next. As each group is processed, you can use the function current-group() to access all nodes in the current group. When grouping by value or by adjacent nodes, you use the function current-grouping-key() to access the value of the key that defines the group being processed. When grouping by starting or ending node, there is no grouping key, and current-grouping-key() returns the empty sequence.

You can also sort the groups by placing one or more xsl:sort instructions at the start of the xsl:for-each-group body to define the sorting criteria, just as you do when using xsl:for-each.
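
As a minimal sketch of sorting groups, the stylesheet below orders departments alphabetically by their grouping key. It assumes a hypothetical Employees document whose employee children carry a dept attribute; the count attribute on the output is likewise invented for illustration.

<xsl:stylesheet version="2.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:template match="Employees">
    <EmployeesByDept>
      <xsl:for-each-group select="employee" group-by="@dept">
        <!-- xsl:sort must precede all other content of xsl:for-each-group -->
        <xsl:sort select="current-grouping-key()"/>
        <!-- One summary element per department, in sorted key order -->
        <dept name="{current-grouping-key()}"
              count="{count(current-group())}"/>
      </xsl:for-each-group>
    </EmployeesByDept>
  </xsl:template>

</xsl:stylesheet>

Without the xsl:sort, the groups would be processed in order of first appearance in the population.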

Group by values (group-by="expression")

A classic grouping problem arises quite often when processing data into reports. Consider sales data. Product managers will often want data grouped by sales region, product type, or salesperson, depending on what problem they are trying to solve. You use the group-by attribute to define an expression that determines the value or values that cause nodes in the population to group together. For example, group-by="@dept" would cause nodes that have the same dept value to group together:

<xsl:template match="Employees">
  <EmployeesByDept>
    <xsl:for-each-group select="employee" group-by="@dept">
      <dept name="{current-grouping-key( )}">
        <xsl:copy-of select="current-group( )"/>
      </dept>
    </xsl:for-each-group>
  </EmployeesByDept>
</xsl:template>
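
To make the behavior concrete, here is a small input invented for illustration (the names and departments are assumptions, not data from a real system):

<Employees>
  <employee name="Ann" dept="sales"/>
  <employee name="Bob" dept="dev"/>
  <employee name="Cy" dept="sales"/>
</Employees>

Applied to this input, the template above should produce a result equivalent to the following, with groups appearing in the order in which each dept value is first encountered:

<EmployeesByDept>
  <dept name="sales">
    <employee name="Ann" dept="sales"/>
    <employee name="Cy" dept="sales"/>
  </dept>
  <dept name="dev">
    <employee name="Bob" dept="dev"/>
  </dept>
</EmployeesByDept>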

Group by adjacent nodes (group-adjacent="expression")

In some contexts, such as document processing, you want to group nodes that share a common value only when they are also adjacent to each other. As with group-by, group-adjacent defines an expression that computes the grouping value, but two nodes with the same value end up in the same group only if they are adjacent in the population. The value of group-adjacent must be a singleton; an empty sequence or a multi-valued sequence causes an error.

Consider a document consisting of para elements interspersed with heading elements. You would like to extract only the para elements without losing track of the fact that some sequences of para elements belong together as part of the same topic:

<xsl:template match="doc">
  <xsl:copy>
    <xsl:for-each-group select="*" group-adjacent="name( )">
      <xsl:if test="self::para">
        <topic>
          <xsl:copy-of select="current-group( )"/>
        </topic>
      </xsl:if>
    </xsl:for-each-group>
  </xsl:copy>
</xsl:template>
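
As an illustration, consider this made-up input, in which the heading element name and all text content are assumptions:

<doc>
  <heading>Intro</heading>
  <para>a</para>
  <para>b</para>
  <heading>Next</heading>
  <para>c</para>
</doc>

The template above forms four groups (heading, para para, heading, para). Only the para groups pass the xsl:if test, because the context node inside xsl:for-each-group is the first node of the current group. The result should be equivalent to:

<doc>
  <topic>
    <para>a</para>
    <para>b</para>
  </topic>
  <topic>
    <para>c</para>
  </topic>
</doc>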

Group by starting node (group-starting-with="pattern")

Frequently, especially in document processing, a group of related nodes is demarcated by a particular node such as a title or subtitle, or other type of heading node. Grouping by starting node makes it easy to process these loosely structured documents. The group-starting-with attribute defines a pattern that matches nodes in the population that are the starting nodes of the group. This is similar to the patterns you use with the match attribute in xsl:template instructions. When the pattern matches a node in the population, all subsequent nodes are part of the group until another match is made. The first node in the population defines a group whether it matches or not. This implies that a population will have at least one group, the entire population, even if the pattern is never matched.

A classic example involves reconstituting structure from an unstructured document. XHTML is a good example of a loosely structured markup language, especially in regard to the use of heading elements (h1, h2, etc.). The following transformation will add some structure by nesting each group, designated by a starting h1 element, in a div element:

<xsl:template match="body">
  <xsl:copy>
    <xsl:for-each-group select="*" group-starting-with="h1">
      <div>
        <xsl:apply-templates select="current-group( )"/>
      </div>
    </xsl:for-each-group>
  </xsl:copy>
</xsl:template>
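
The rule that the first node starts a group whether or not it matches can be seen in the following self-contained sketch. It uses xsl:copy-of instead of xsl:apply-templates so that it does not depend on other templates; the input shown in the comment is hypothetical.

<xsl:stylesheet version="2.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Hypothetical input:
       <body><p>preface</p><h1>One</h1><p>a</p><h1>Two</h1><p>b</p></body> -->
  <xsl:template match="body">
    <xsl:copy>
      <xsl:for-each-group select="*" group-starting-with="h1">
        <!-- The leading p is not an h1 but still opens the first group -->
        <div>
          <xsl:copy-of select="current-group()"/>
        </div>
      </xsl:for-each-group>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>

For that input, the result contains three div elements: the first holds only the preface paragraph, and the second and third each hold an h1 followed by its paragraph.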

Group by ending node (group-ending-with="pattern")

This form of grouping is similar to group-starting-with but uses the group-ending-with pattern to define the last node that will be in the current group. The first node in the population starts a new group, so there is always at least one group even if the pattern does not match any nodes.

Of all the grouping methods, grouping by ending node will typically find the least application. This is because documents designed for human consumption use leading elements, such as headings, to signal new groups, rather than trailing ones. In XSLT 2.0 Programmer's Reference, Michael Kay provides an example of a series of documents that have been broken into chunks for the purpose of transmission. In this example, the end of each document is marked by the absence of an attribute continued='yes'. A slightly more probable example is one where you want to add structure to a flat document by chunking elements based on some criteria that designate the end of a chunk. For example, you can group paragraphs into sections of five paragraphs with the following code:

<xsl:stylesheet version="2.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>
 
  <xsl:template match="doc">
    <xsl:copy>
      <xsl:for-each-group select="para" 
                          group-ending-with="para[position( ) mod 5 eq 0]">
        <section>
          <xsl:for-each select="current-group( )">
            <xsl:copy>
              <xsl:apply-templates select="@*|node( )"/>
            </xsl:copy>
          </xsl:for-each>
        </section>
      </xsl:for-each-group>
    </xsl:copy>
  </xsl:template>

  <xsl:template match="@* | node( )">
    <xsl:copy>
      <xsl:apply-templates select="@* | node( )"/>
    </xsl:copy>
  </xsl:template>
</xsl:stylesheet>
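
The transmission scenario mentioned earlier might be handled along the following lines. This is only a sketch under stated assumptions, not the code from Kay's book: the transmission, chunk, and document element names are invented, and every chunk except the last one of each document is assumed to carry continued="yes".

<xsl:stylesheet version="2.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Hypothetical input: a transmission element holding a flat run of chunk elements -->
  <xsl:template match="transmission">
    <xsl:copy>
      <!-- A chunk without continued="yes" closes the current group -->
      <xsl:for-each-group select="chunk"
                          group-ending-with="chunk[not(@continued = 'yes')]">
        <document>
          <xsl:copy-of select="current-group()"/>
        </document>
      </xsl:for-each-group>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>

If the final chunk in the population still carries continued="yes", the trailing chunks nevertheless form a last group, because the end of the population always closes the current group.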

Discussion

The Muenchian Method, named after Steve Muench of Oracle, was an innovative way to group data in XSLT 1.0. It took advantage of XSLT's ability to index documents using a key. The trick involves using the index to efficiently figure out the set of unique grouping keys and then using this set to process all nodes in each group:

<xsl:key name="products-by-category" match="product" use="@category"/>

<xsl:template match="/">
  <!-- Select the first product in each category: one node per distinct key -->
  <xsl:for-each select="//product[count(. | key('products-by-category', @category)[1]) = 1]">
    <xsl:variable name="current-grouping-key" 
                  select="@category"/>
    <!-- Look up every product that shares this category -->
    <xsl:variable name="current-group" 
                  select="key('products-by-category', 
                              $current-grouping-key)"/>
    <xsl:for-each select="$current-group">
       <!-- processing for elements in group -->
       <!-- you can use xsl:sort here also, if necessary -->
    </xsl:for-each>
  </xsl:for-each>
</xsl:template>

Although the Muenchian Method will continue to work in 2.0, you should prefer for-each-group because it is likely to be as efficient and probably more so. Just as important, it will make your code more comprehensible, especially to XSLT novices. Further, you use the same basic instruction to get access to the four distinct grouping capabilities, whereas the Muenchian Method can only be used for value-based grouping. Backward compatibility with XSLT 1.0 is probably the only compelling reason to continue to use Muenchian grouping in XSLT 2.0.
