Recipe 8.7. Flattening an XML Hierarchy
Problem
You have a document with elements organized in a more deeply nested
fashion than you would prefer. You want to flatten the tree.
Solution
If your goal is simply to flatten without regard to the information
encoded by the deeper structure, then you need to apply an overriding
copy. The overriding template must either match the elements you wish to
discard and apply templates without copying them, or match their parent
and select past them, as the stylesheet below does.
Consider the following input, which segregates people into two
categories, salaried and union:
<people>
  <union>
    <person>
      <firstname>Warren</firstname>
      <lastname>Rosenbaum</lastname>
      <age>37</age>
      <height>5.75</height>
    </person>
    <person>
      <firstname>Dror</firstname>
      <lastname>Seagull</lastname>
      <age>28</age>
      <height>5.10</height>
    </person>
    <person>
      <firstname>Mike</firstname>
      <lastname>Heavyman</lastname>
      <age>45</age>
      <height>6.0</height>
    </person>
    <person>
      <firstname>Theresa</firstname>
      <lastname>Archul</lastname>
      <age>37</age>
      <height>5.5</height>
    </person>
  </union>
  <salaried>
    <person>
      <firstname>Sal</firstname>
      <lastname>Mangano</lastname>
      <age>37</age>
      <height>5.75</height>
    </person>
    <person>
      <firstname>Jane</firstname>
      <lastname>Smith</lastname>
      <age>28</age>
      <height>5.10</height>
    </person>
    <person>
      <firstname>Rick</firstname>
      <lastname>Winters</lastname>
      <age>45</age>
      <height>6.0</height>
    </person>
    <person>
      <firstname>James</firstname>
      <lastname>O'Riely</lastname>
      <age>33</age>
      <height>5.5</height>
    </person>
  </salaried>
</people>
This stylesheet simply discards the extra structure:
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:import href="copy.xslt"/>
  <xsl:output method="xml" version="1.0" encoding="UTF-8"/>
  <xsl:template match="people">
    <xsl:copy>
      <!-- discard parents of person elements -->
      <xsl:apply-templates select="*/person"/>
    </xsl:copy>
  </xsl:template>
</xsl:stylesheet>
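The stylesheet above imports copy.xslt, which this recipe does not show. The following is a minimal sketch of what such a file might contain, assuming it is the conventional XSLT 1.0 identity transform. That is an assumption, and the actual file may differ.

```xml
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <!-- Assumed contents of copy.xslt: the identity transform.
       Copies every node and attribute unchanged unless an importing
       stylesheet supplies a template that overrides this one. -->
  <xsl:template match="node() | @*">
    <xsl:copy>
      <xsl:apply-templates select="@* | node()"/>
    </xsl:copy>
  </xsl:template>
</xsl:stylesheet>
```

Templates in an importing stylesheet take precedence over imported ones, so the people template overrides this rule for that element. Each selected person and its descendants are still copied by the imported rule.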
Discussion
Having additional structure in a document is generally good because
it usually makes the document easier to process with XSLT. However,
too much structure bloats the document and makes it harder for people
to understand. Humans generally prefer to infer relationships by
spatial text organization rather than with extra syntactic baggage.
In the example input, however, the extra structure is not
superfluous: it records whether each person is union or salaried. If
you want to retain that information while flattening, then you should
probably create an attribute or child element to capture it.
This stylesheet creates an attribute:
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:import href="copy.xslt"/>
  <xsl:output method="xml" version="1.0" encoding="UTF-8"
              omit-xml-declaration="yes"/>
  <!-- discard parents of person elements -->
  <xsl:template match="*[person]">
    <xsl:apply-templates/>
  </xsl:template>
  <xsl:template match="person">
    <xsl:copy>
      <xsl:apply-templates select="@*"/>
      <xsl:attribute name="class">
        <xsl:value-of select="local-name(..)"/>
      </xsl:attribute>
      <xsl:apply-templates/>
    </xsl:copy>
  </xsl:template>
</xsl:stylesheet>
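Applied to the sample input, and assuming copy.xslt is an identity transform, each person element in the result should carry the name of its former parent in a class attribute. Here is a sketch of one such element (whitespace in the actual output may differ):

```xml
<person class="union">
  <firstname>Warren</firstname>
  <lastname>Rosenbaum</lastname>
  <age>37</age>
  <height>5.75</height>
</person>
```

The xsl:attribute instruction has to come before the second xsl:apply-templates, because XSLT 1.0 does not allow attributes to be added to an element once child nodes have been added to it.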
This variation creates an element:
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:import href="copy.xslt"/>
  <xsl:strip-space elements="*"/>
  <xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>
  <!-- discard parents of person elements -->
  <xsl:template match="*[person]">
    <xsl:apply-templates/>
  </xsl:template>
  <xsl:template match="person">
    <xsl:copy>
      <xsl:copy-of select="@*"/>
      <xsl:element name="class">
        <xsl:value-of select="local-name(..)"/>
      </xsl:element>
      <xsl:apply-templates/>
    </xsl:copy>
  </xsl:template>
</xsl:stylesheet>
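This variation uses xsl:strip-space and indent="yes" on the xsl:output
element. Without them, the whitespace-only text nodes that surrounded
the discarded parent elements are copied to the output and leave a
whitespace gap, as shown here: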
<people>
  ...
  <person>
    <class>union</class>
    <firstname>Warren</firstname>
    <lastname>Rosenbaum</lastname>
    <age>37</age>
    <height>5.75</height>
  </person>
                              <-- Whitespace gap here!
  <person>
    <class>salaried</class>
    <firstname>Sal</firstname>
    <lastname>Mangano</lastname>
    <age>37</age>
    <height>5.75</height>
  </person>
  ...
</people>