
Recipe 8.8. Deepening an XML Hierarchy

Problem

You have a poorly designed document that could use extra structure.[5]

[5] It may be well designed for a particular set of goals, but those goals aren't yours.

Solution

This is the opposite of the problem solved in Recipe 8.7. Here you need to add structure to a document, possibly to organize its elements by some additional criterion.

Add structure based on existing data

This example of a deepening transformation undoes the flattening transformation performed in Recipe 8.7:

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
   
  <xsl:import href="copy.xslt"/>
  
  <xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>
  <xsl:strip-space elements="*"/>
   
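  <xsl:template match="people">
    <!-- Copy the people element so the result keeps a single document element -->
    <xsl:copy>
      <union>
         <xsl:apply-templates select="person[@class = 'union']" />
      </union>
      <salaried>
         <xsl:apply-templates select="person[@class = 'salaried']" />
      </salaried>
    </xsl:copy>
  </xsl:template>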
   
</xsl:stylesheet>

Add structure to correct a poorly designed document

In a misguided effort to streamline XML, some people attempt to encode information by inserting sibling elements rather than parent elements.[6]

[6] To be fair, not every occurrence of this technique is misguided. Design means navigating between competing trade-offs.

For example, suppose someone distinguished between union and salaried employees in the following way:

<people>
  <class name="union"/>
  <person>
    <firstname>Warren</firstname>
    <lastname>Rosenbaum</lastname>
    <age>37</age>
    <height>5.75</height>
  </person>
...
  <person>
    <firstname>Theresa</firstname>
    <lastname>Archul</lastname>
    <age>37</age>
    <height>5.5</height>
  </person>
  <class name="salaried"/>
  <person>
    <firstname>Sal</firstname>
    <lastname>Mangano</lastname>
    <age>37</age>
    <height>5.75</height>
  </person>
...
  <person>
    <firstname>James</firstname>
    <lastname>O'Riely</lastname>
    <age>33</age>
    <height>5.5</height>
  </person>
</people>

Notice that the class elements signifying union and salaried employees are now empty. The intent is that every following sibling of a class element belongs to that class until another class element is encountered or the siblings run out. This type of encoding is easy to grasp but harder for an XSLT program to process.

To correct this representation, you need a stylesheet that computes the set difference between all person elements following a class element and the person elements following the next class element. XSLT 1.0 has no explicit set-difference function. You can get essentially the same effect more efficiently by selecting the person elements that follow a class element and whose position falls before that of the first person following the next class element:

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
   
  <xsl:import href="copy.xslt"/>
  
  <xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>
  <xsl:strip-space elements="*"/>
   
  <xsl:template match="class">
    <!-- The last position we want to consider: the number of people that
         follow this class minus those that follow any later class.
         Counting from this class (not from the whole document) keeps the
         result correct when there are more than two class elements. -->
    <xsl:variable name="pos" 
             select="count(following-sibling::person) - 
               count(following-sibling::class/following-sibling::person)"/>
    <xsl:element name="{@name}">
      <!-- Copy people that follow this class but whose position is 
           less than or equal to $pos.-->   
      <xsl:copy-of 
              select="following-sibling::person[position( ) &lt;= $pos]"/>
    </xsl:element>
  </xsl:template>
   
  <!-- Ignore person elements. They were copied above. -->
  <xsl:template match="person"/>
   
</xsl:stylesheet>
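
Assuming copy.xslt is an identity transform, applying this stylesheet to the sample document should yield a result shaped roughly like the following. The ... lines stand for the people elided in the input, and the exact whitespace depends on the processor:

```xml
<people>
  <union>
    <person>
      <firstname>Warren</firstname>
      <lastname>Rosenbaum</lastname>
      <age>37</age>
      <height>5.75</height>
    </person>
...
    <person>
      <firstname>Theresa</firstname>
      <lastname>Archul</lastname>
      <age>37</age>
      <height>5.5</height>
    </person>
  </union>
  <salaried>
    <person>
      <firstname>Sal</firstname>
      <lastname>Mangano</lastname>
      <age>37</age>
      <height>5.75</height>
    </person>
...
    <person>
      <firstname>James</firstname>
      <lastname>O'Riely</lastname>
      <age>33</age>
      <height>5.5</height>
    </person>
  </salaried>
</people>
```

Each empty class marker has become a container element named after its name attribute, and the people that followed the marker are now its children.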

More subtly, a key can be used as follows:

<xsl:key name="people" match="person" 
         use="preceding-sibling::class[1]/@name" />
   
<xsl:template match="people">
  <people>
    <xsl:apply-templates select="class" />
  </people>
</xsl:template>
   
<xsl:template match="class">
  <xsl:element name="{@name}">
    <xsl:copy-of select="key('people', @name)" />
  </xsl:element>
</xsl:template>

A step-by-step approach is another alternative:

<xsl:template match="people">
  <people>
    <xsl:apply-templates select="class[1]" />
  </people>
</xsl:template>
   
<xsl:template match="class">
  <xsl:element name="{@name}">
    <xsl:apply-templates select="following-sibling::*[1][self::person]" />
  </xsl:element>
  <xsl:apply-templates select="following-sibling::class[1]" />
</xsl:template>
   
<xsl:template match="person">
  <xsl:copy-of select="." />
  <xsl:apply-templates select="following-sibling::*[1][self::person]" />
</xsl:template>

XSLT 2.0

Add structure based on existing data

XSLT 2.0's xsl:for-each-group gives you a more generic solution than the XSLT 1.0 stylesheet, because the category names no longer have to be hardcoded. Generic 1.0 solutions exist (see Discussion), but none is quite as simple:

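<xsl:template match="people">
  <xsl:copy>
    <!-- Group people by the class attribute they already carry -->
    <xsl:for-each-group select="person" group-by="@class">
      <!-- Name each container element after the group's key -->
      <xsl:element name="{current-grouping-key( )}">
        <xsl:apply-templates select="current-group( )" />
      </xsl:element>
    </xsl:for-each-group>
  </xsl:copy>
</xsl:template>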

Add structure to correct a poorly designed document

You can use xsl:for-each-group with the group-starting-with attribute to solve this problem:

<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

    <xsl:import href="copy.xslt"/>

    <xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>
    
    <xsl:template match="people">
       <xsl:copy>
         <xsl:for-each-group select="*" group-starting-with="class">
           <xsl:element name="{@name}">
             <xsl:apply-templates select="current-group( )[not(self::class)]"/>
           </xsl:element>
         </xsl:for-each-group>
       </xsl:copy>
    </xsl:template>
    
</xsl:stylesheet>

Discussion

Add structure based on existing data

When you added structure based on existing data, you explicitly referred to the criteria that formed the categories of interest (e.g., union and salaried). It would be better if the stylesheet figured out these categories by itself. This makes the stylesheet more generic at the cost of added complexity:

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
     
  <xsl:import href="copy.xslt"/>
  
  <xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>
  <xsl:strip-space elements="*"/>
   
  <!-- build a unique list of all classes -->
  <xsl:variable name="classes" 
            select="/*/*/@class[not(. = ../preceding-sibling::*/@class)]"/>

  <xsl:template match="/*">
    <!-- For each class create an element named after that 
         class that contains elements of that class -->
    <xsl:for-each select="$classes">
      <xsl:variable name="class-name" select="."/>
      <xsl:element name="{$class-name}">
        <xsl:for-each select="/*/*[@class=$class-name]">
          <xsl:copy>
            <xsl:apply-templates/>
          </xsl:copy>
        </xsl:for-each>
      </xsl:element>
    </xsl:for-each>
  </xsl:template>       
   
</xsl:stylesheet>

Although not 100% generic, this stylesheet avoids making assumptions about what kinds of classes exist in the document. The only application-specific information in this stylesheet is the fact that the categories are encoded in an attribute @class and that the attribute occurs in elements that are two levels down from the root.
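
The preceding-sibling test used to build the unique list of classes rescans earlier siblings for every element, which can become slow on large documents. As a hedged alternative that is not part of the original recipe, the same categories can be discovered with a key (the technique commonly called Muenchian grouping). This is a minimal sketch: the key name by-class is an illustrative choice, and copy.xslt is assumed to be an identity transform:

```xml
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:import href="copy.xslt"/>

  <xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>
  <xsl:strip-space elements="*"/>

  <!-- Index the children of the document element by their class attribute.
       The key name "by-class" is a hypothetical name chosen for this sketch. -->
  <xsl:key name="by-class" match="/*/*[@class]" use="@class"/>

  <xsl:template match="/*">
    <xsl:copy>
      <!-- Visit only the first element of each distinct class -->
      <xsl:for-each
           select="*[@class][generate-id() = generate-id(key('by-class', @class)[1])]">
        <xsl:element name="{@class}">
          <!-- Copy every element of this class, dropping its attributes -->
          <xsl:for-each select="key('by-class', @class)">
            <xsl:copy>
              <xsl:apply-templates/>
            </xsl:copy>
          </xsl:for-each>
        </xsl:element>
      </xsl:for-each>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>
```

The application-specific assumptions are the same as before: the category lives in an attribute named class on the children of the document element, and its values must be legal element names.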

Add structure to correct a poorly designed document

The solution can be implemented explicitly in terms of set difference. This solution is elegant, but impractical for large documents with many categories. The trick used here for computing set difference is explained in Recipe 9.1:

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
   
  <xsl:import href="copy.xslt"/>
  
  <xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>
  <xsl:strip-space elements="*"/>
     
  <xsl:template match="class">
    <!--All people following this class element -->
    <xsl:variable name="nodes1" select="following-sibling::person"/>
    <!--All people following the next class element -->
    <xsl:variable name="nodes2" 
          select="following-sibling::class/following-sibling::person"/>
    <xsl:element name="{@name}">
      <xsl:copy-of select="$nodes1[count(. | $nodes2) != count($nodes2)]"/>
    </xsl:element>
  </xsl:template>

  <xsl:template match="person"/>
   
</xsl:stylesheet>
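
The predicate in the xsl:copy-of works because the union of a node with a node set is larger than the set only when the node is not already a member of it. As a minimal sketch of the idiom in isolation (the variable names a and b and the choice of class[2] are illustrative assumptions, not part of the recipe), the following stylesheet counts the person elements that do not follow the second class element:

```xml
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="text"/>

  <xsl:template match="/people">
    <!-- a: every person in the document -->
    <xsl:variable name="a" select="person"/>
    <!-- b: the people that follow the second class element -->
    <xsl:variable name="b" select="class[2]/following-sibling::person"/>
    <!-- a minus b: keep a node only if adding it to b makes b grow -->
    <xsl:value-of select="count($a[count(. | $b) != count($b)])"/>
  </xsl:template>

</xsl:stylesheet>
```

Because the predicate recounts the second set for every candidate node, the cost grows with the product of the two set sizes, which is why this approach is impractical for large documents.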

