Recipe 8.7. Flattening an XML Hierarchy

Problem

You have a document with elements organized in a more deeply nested fashion than you would prefer. You want to flatten the tree.

Solution

If your goal is simply to flatten without regard to the information encoded by the deeper structure, then you need to apply an overriding copy. The overriding template must match the elements you wish to discard and apply templates without copying.

Consider the following input, which segregates people into two categories, salaried and union:

<people>
  <union>
    <person>
      <firstname>Warren</firstname>
      <lastname>Rosenbaum</lastname>
      <age>37</age>
      <height>5.75</height>
    </person>
    <person>
      <firstname>Dror</firstname>
      <lastname>Seagull</lastname>
      <age>28</age>
      <height>5.10</height>
    </person>
    <person>
      <firstname>Mike</firstname>
      <lastname>Heavyman</lastname>
      <age>45</age>
      <height>6.0</height>
    </person>
    <person>
      <firstname>Theresa</firstname>
      <lastname>Archul</lastname>
      <age>37</age>
      <height>5.5</height>
    </person>
  </union>
  <salaried>
    <person>
      <firstname>Sal</firstname>
      <lastname>Mangano</lastname>
      <age>37</age>
      <height>5.75</height>
    </person>
    <person>
      <firstname>Jane</firstname>
      <lastname>Smith</lastname>
      <age>28</age>
      <height>5.10</height>
    </person>
    <person>
      <firstname>Rick</firstname>
      <lastname>Winters</lastname>
      <age>45</age>
      <height>6.0</height>
    </person>
    <person>
      <firstname>James</firstname>
      <lastname>O'Riely</lastname>
      <age>33</age>
      <height>5.5</height>
    </person>
  </salaried>
</people>

This stylesheet simply discards the extra structure:

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
   
  <xsl:import href="copy.xslt"/>
   
  <xsl:output method="xml" version="1.0" encoding="UTF-8"/>
    
  <xsl:template match="people">
    <xsl:copy>
      <!--discard parents of person elements --> 
      <xsl:apply-templates select="*/person" />
    </xsl:copy>
  </xsl:template>
   
</xsl:stylesheet>
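
The stylesheet above imports copy.xslt, which this recipe does not reproduce. A minimal sketch of what that file might contain, assuming it is the standard identity transform (the file's actual contents are an assumption here):

```xml
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Assumed contents of copy.xslt: the identity template.
       It copies every node and attribute unchanged unless an
       importing stylesheet supplies a template that overrides it. -->
  <xsl:template match="node() | @*">
    <xsl:copy>
      <xsl:apply-templates select="@* | node()"/>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>
```

Because templates in an importing stylesheet take precedence over imported ones, the people template in the flattening stylesheet wins for that element, while the identity template still copies each person subtree.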

Discussion

Having additional structure in a document is generally good because it usually makes the document easier to process with XSLT. However, too much structure bloats the document and makes it harder for people to understand. Humans generally prefer to infer relationships by spatial text organization rather than with extra syntactic baggage.

In the sample input, however, the extra structure is not superfluous: it encodes whether each person is union or salaried. If you want to retain that information while flattening, then you should probably create an attribute or child element to capture it.

This stylesheet creates an attribute:

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
   
  <xsl:import href="copy.xslt"/>
   
  <xsl:output method="xml" version="1.0" encoding="UTF-8" 
  omit-xml-declaration="yes"/>
      
  <!--discard parents of person elements --> 
  <xsl:template match="*[person]">
    <xsl:apply-templates/>
  </xsl:template>
   
  <xsl:template match="person">
    <xsl:copy>
      <xsl:apply-templates select="@*"/>
      <xsl:attribute name="class">
        <xsl:value-of select="local-name(..)"/>
      </xsl:attribute>
      <xsl:apply-templates/>
    </xsl:copy>
  </xsl:template>
   
</xsl:stylesheet>
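
To make the effect concrete, here is a fragment of the result this stylesheet would be expected to produce for the sample input, assuming copy.xslt is an identity transform. Whitespace is normalized for readability, and the comments only mark elided records; they are not part of the output.

```xml
<people>
  <person class="union">
    <firstname>Warren</firstname>
    <lastname>Rosenbaum</lastname>
    <age>37</age>
    <height>5.75</height>
  </person>
  <!-- remaining union members elided -->
  <person class="salaried">
    <firstname>Sal</firstname>
    <lastname>Mangano</lastname>
    <age>37</age>
    <height>5.75</height>
  </person>
  <!-- remaining salaried members elided -->
</people>
```

The class value comes from local-name(..), evaluated while the person element still sits inside its original parent in the source tree.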

This variation creates an element:

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
   
  <xsl:import href="copy.xslt"/>
   
  <xsl:strip-space elements="*"/>
   
  <xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes" />
      
  <!--discard parents of person elements --> 
  <xsl:template match="*[person]">
    <xsl:apply-templates/>
  </xsl:template>
   
  <xsl:template match="person">
    <xsl:copy>
      <xsl:copy-of select="@*"/>
      <xsl:element name="class">
        <xsl:value-of select="local-name(..)"/>
      </xsl:element>
      <xsl:apply-templates/>
    </xsl:copy>
  </xsl:template>
   
</xsl:stylesheet>

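This variation uses a top-level xsl:strip-space element together with indent="yes" on the xsl:output element so that the output does not contain a whitespace gap. Without them, the whitespace that surrounded the discarded elements is copied through, as shown here: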

<people>
...
    <person>
      <class>union</class>
      <firstname>Warren</firstname>
      <lastname>Rosenbaum</lastname>
      <age>37</age>
      <height>5.75</height>
    </person>
                                      <-- Whitespace gap here!
   
    <person>
      <class>salaried</class>
      <firstname>Sal</firstname>
      <lastname>Mangano</lastname>
      <age>37</age>
      <height>5.75</height>
    </person>
...
 </people>