Recipe 9.2. Performing Set Operations on Node Sets Using Value Semantics

Problem

You need to find the union, intersection, set difference, or symmetrical set difference between elements in two node sets; however, in your problem, equality is not defined as node identity. In other words, equality is a function of a node's value.

Solution

XSLT 1.0

The need for this solution may arise when working with multiple documents. Consider two documents that share a DTD and whose combined content should not contain duplicate element values. Elements coming from distinct documents are distinct nodes even if they have the same names, namespaces, attributes, and text values, so the XPath union operator (|) does not remove such duplicates. See Example 9-1 to Example 9-4.

Example 9-1. people1.xml

    <people>
      <person name="Brad York" age="38" sex="m" smoker="yes"/>
      <person name="Charles Xavier" age="32" sex="m" smoker="no"/>
      <person name="David Williams" age="33" sex="m" smoker="no"/>
    </people>

Example 9-2. people2.xml

    <people>
      <person name="Al Zehtooney" age="33" sex="m" smoker="no"/>
      <person name="Brad York" age="38" sex="m" smoker="yes"/>
      <person name="Charles Xavier" age="32" sex="m" smoker="no"/>
    </people>

Example 9-3. Failed attempt to use XSLT union to select unique people

    <xsl:template match="/">
      <people>
        <xsl:copy-of select="//person | document('people2.xml')//person"/>
      </people>
    </xsl:template>

Example 9-4. Output when run with people1.xml as input

    <people>
      <person name="Brad York" age="38" sex="m" smoker="yes"/>
      <person name="Charles Xavier" age="32" sex="m" smoker="no"/>
      <person name="David Williams" age="33" sex="m" smoker="no"/>
      <person name="Al Zehtooney" age="33" sex="m" smoker="no"/>
      <person name="Brad York" age="38" sex="m" smoker="yes"/>
      <person name="Charles Xavier" age="32" sex="m" smoker="no"/>
    </people>

Relying on node identity can also break down in single-document cases when you want equality of nodes to be a function of their text or attribute values.
The following stylesheet provides a reusable implementation of union, intersection, and set difference based on value semantics. The idea is that a stylesheet importing this one will override the template whose mode is vset:element-equality. This allows the importing stylesheet to define whatever equality semantics make sense for the given input:

    <xsl:stylesheet version="1.0"
        xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
        xmlns:vset="http:/www.ora.com/XSLTCookbook/namespaces/vset">

      <xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>

      <!-- The default implementation of element equality. Override in the
           importing stylesheet as necessary. -->
      <xsl:template match="node() | @*" mode="vset:element-equality">
        <xsl:param name="other"/>
        <xsl:if test=". = $other">
          <xsl:value-of select="true()"/>
        </xsl:if>
      </xsl:template>

      <!-- The default set membership test uses element equality. You will
           rarely need to override this in the importing stylesheet. -->
      <xsl:template match="node() | @*" mode="vset:member-of">
        <xsl:param name="elem"/>
        <xsl:variable name="member-of">
          <xsl:for-each select=".">
            <xsl:apply-templates select="." mode="vset:element-equality">
              <xsl:with-param name="other" select="$elem"/>
            </xsl:apply-templates>
          </xsl:for-each>
        </xsl:variable>
        <xsl:value-of select="string($member-of)"/>
      </xsl:template>

      <!-- Compute the union of two sets using "by value" equality. -->
      <xsl:template name="vset:union">
        <xsl:param name="nodes1" select="/.."/>
        <xsl:param name="nodes2" select="/.."/>
        <!-- For internal use -->
        <xsl:param name="nodes" select="$nodes1 | $nodes2"/>
        <xsl:param name="union" select="/.."/>
        <xsl:choose>
          <xsl:when test="$nodes">
            <xsl:variable name="test">
              <xsl:apply-templates select="$union" mode="vset:member-of">
                <xsl:with-param name="elem" select="$nodes[1]"/>
              </xsl:apply-templates>
            </xsl:variable>
            <xsl:call-template name="vset:union">
              <xsl:with-param name="nodes" select="$nodes[position() > 1]"/>
              <xsl:with-param name="union"
                  select="$union | $nodes[1][not(string($test))]"/>
            </xsl:call-template>
          </xsl:when>
          <xsl:otherwise>
            <xsl:apply-templates select="$union" mode="vset:union"/>
          </xsl:otherwise>
        </xsl:choose>
      </xsl:template>

      <!-- Return a copy of union by default. Override in importing
           stylesheet to receive results as a "callback". -->
      <xsl:template match="/ | node() | @*" mode="vset:union">
        <xsl:copy-of select="."/>
      </xsl:template>

      <!-- Compute the intersection of two sets using "by value" equality. -->
      <xsl:template name="vset:intersection">
        <xsl:param name="nodes1" select="/.."/>
        <xsl:param name="nodes2" select="/.."/>
        <!-- For internal use -->
        <xsl:param name="intersect" select="/.."/>
        <xsl:choose>
          <xsl:when test="not($nodes1)">
            <xsl:apply-templates select="$intersect" mode="vset:intersection"/>
          </xsl:when>
          <xsl:when test="not($nodes2)">
            <xsl:apply-templates select="$intersect" mode="vset:intersection"/>
          </xsl:when>
          <xsl:otherwise>
            <xsl:variable name="test1">
              <xsl:apply-templates select="$nodes2" mode="vset:member-of">
                <xsl:with-param name="elem" select="$nodes1[1]"/>
              </xsl:apply-templates>
            </xsl:variable>
            <xsl:variable name="test2">
              <xsl:apply-templates select="$intersect" mode="vset:member-of">
                <xsl:with-param name="elem" select="$nodes1[1]"/>
              </xsl:apply-templates>
            </xsl:variable>
            <xsl:choose>
              <xsl:when test="string($test1) and not(string($test2))">
                <xsl:call-template name="vset:intersection">
                  <xsl:with-param name="nodes1" select="$nodes1[position() > 1]"/>
                  <xsl:with-param name="nodes2" select="$nodes2"/>
                  <xsl:with-param name="intersect" select="$intersect | $nodes1[1]"/>
                </xsl:call-template>
              </xsl:when>
              <xsl:otherwise>
                <xsl:call-template name="vset:intersection">
                  <xsl:with-param name="nodes1" select="$nodes1[position() > 1]"/>
                  <xsl:with-param name="nodes2" select="$nodes2"/>
                  <xsl:with-param name="intersect" select="$intersect"/>
                </xsl:call-template>
              </xsl:otherwise>
            </xsl:choose>
          </xsl:otherwise>
        </xsl:choose>
      </xsl:template>

      <!-- Return a copy of intersection by default. Override in importing
           stylesheet to receive results as a "callback". -->
      <xsl:template match="/ | node() | @*" mode="vset:intersection">
        <xsl:copy-of select="."/>
      </xsl:template>

      <!-- Compute the difference between two sets (nodes1 - nodes2) using
           "by value" equality. -->
      <xsl:template name="vset:difference">
        <xsl:param name="nodes1" select="/.."/>
        <xsl:param name="nodes2" select="/.."/>
        <!-- For internal use -->
        <xsl:param name="difference" select="/.."/>
        <xsl:choose>
          <xsl:when test="not($nodes1)">
            <xsl:apply-templates select="$difference" mode="vset:difference"/>
          </xsl:when>
          <xsl:when test="not($nodes2)">
            <xsl:apply-templates select="$nodes1" mode="vset:difference"/>
          </xsl:when>
          <xsl:otherwise>
            <xsl:variable name="test1">
              <xsl:apply-templates select="$nodes2" mode="vset:member-of">
                <xsl:with-param name="elem" select="$nodes1[1]"/>
              </xsl:apply-templates>
            </xsl:variable>
            <xsl:variable name="test2">
              <xsl:apply-templates select="$difference" mode="vset:member-of">
                <xsl:with-param name="elem" select="$nodes1[1]"/>
              </xsl:apply-templates>
            </xsl:variable>
            <xsl:choose>
              <xsl:when test="string($test1) or string($test2)">
                <xsl:call-template name="vset:difference">
                  <xsl:with-param name="nodes1" select="$nodes1[position() > 1]"/>
                  <xsl:with-param name="nodes2" select="$nodes2"/>
                  <xsl:with-param name="difference" select="$difference"/>
                </xsl:call-template>
              </xsl:when>
              <xsl:otherwise>
                <xsl:call-template name="vset:difference">
                  <xsl:with-param name="nodes1" select="$nodes1[position() > 1]"/>
                  <xsl:with-param name="nodes2" select="$nodes2"/>
                  <xsl:with-param name="difference" select="$difference | $nodes1[1]"/>
                </xsl:call-template>
              </xsl:otherwise>
            </xsl:choose>
          </xsl:otherwise>
        </xsl:choose>
      </xsl:template>

      <!-- Return a copy of difference by default. Override in importing
           stylesheet to receive results as a "callback". -->
      <xsl:template match="/ | node() | @*" mode="vset:difference">
        <xsl:copy-of select="."/>
      </xsl:template>

    </xsl:stylesheet>

These recursive templates are implemented in terms of the following definitions:
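The definitions themselves do not appear at this point in the text. As a hedged reconstruction read off the recursive templates (not the author's wording), they can be sketched in set notation. The symbols A, B, and ≈ are assumptions introduced here for illustration: A and B stand for nodes1 and nodes2, and x ≈ y means that the vset:element-equality template reports the two nodes as equal.

```latex
% Requires amsmath and amssymb.
% Assumed notation (not from the source): A = nodes1, B = nodes2,
% x \approx y means vset:element-equality reports x and y as equal.
% "Already kept" refers to nodes accumulated in the internal
% $union, $intersect, or $difference parameter on earlier recursion steps.
\begin{align*}
\mathrm{union}(A, B)
  &= \{\, x \in A \cup B \;:\; \text{no node already kept is} \approx x \,\} \\
\mathrm{intersection}(A, B)
  &= \{\, x \in A \;:\; \exists\, y \in B,\ x \approx y,
      \text{ and no node already kept is} \approx x \,\} \\
\mathrm{difference}(A, B)
  &= \{\, x \in A \;:\; \nexists\, y \in B,\ x \approx y,
      \text{ and no node already kept is} \approx x \,\}
\end{align*}
```

One edge case visible in the code does not fit this sketch: when nodes2 is empty, vset:difference passes nodes1 to the callback unchanged, so value-equal duplicates within nodes1 are not removed in that case.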
In all cases, membership defaults to equality of string values, but the importing stylesheet can override this default. Given these value-oriented set operations, you can achieve the desired effect on people1.xml and people2.xml using the following stylesheet:

    <xsl:stylesheet version="1.0"
        xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
        xmlns:vset="http:/www.ora.com/XSLTCookbook/namespaces/vset">

      <xsl:import href="set.ops.xslt"/>

      <xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>

      <xsl:template match="/">
        <people>
          <xsl:call-template name="vset:union">
            <xsl:with-param name="nodes1" select="//person"/>
            <xsl:with-param name="nodes2" select="document('people2.xml')//person"/>
          </xsl:call-template>
        </people>
      </xsl:template>

      <!-- Define person equality as having the same name -->
      <xsl:template match="person" mode="vset:element-equality">
        <xsl:param name="other"/>
        <xsl:if test="@name = $other/@name">
          <xsl:value-of select="true()"/>
        </xsl:if>
      </xsl:template>

    </xsl:stylesheet>

XSLT 2.0

The main enhancement in the XSLT 2.0 solution is to take advantage of user-defined functions (xsl:function) and sequences. This eliminates the need for recursion and the callback trick, and leads to cleaner definitions and usage. The functions vset:element-equality and vset:member-of can still be overridden in importing stylesheets to customize behavior.

    <xsl:stylesheet version="2.0"
        xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
        xmlns:xs="http://www.w3.org/2001/XMLSchema"
        xmlns:vset="http:/www.ora.com/XSLTCookbook/namespaces/vset">

      <!-- The default implementation of element equality. Override in the
           importing stylesheet as necessary. -->
      <xsl:function name="vset:element-equality" as="xs:boolean">
        <xsl:param name="item1" as="item()?"/>
        <xsl:param name="item2" as="item()?"/>
        <xsl:sequence select="$item1 = $item2"/>
      </xsl:function>

      <!-- The default set membership test uses element equality. You will
           rarely need to override this in the importing stylesheet. -->
      <xsl:function name="vset:member-of" as="xs:boolean">
        <xsl:param name="set" as="item()*"/>
        <xsl:param name="elem" as="item()"/>
        <xsl:variable name="member-of" as="xs:boolean*"
            select="for $test in $set
                    return if (vset:element-equality($test, $elem))
                           then true() else ()"/>
        <xsl:sequence select="not(empty($member-of))"/>
      </xsl:function>

      <!-- Compute the union of two sets using "by value" equality. -->
      <xsl:function name="vset:union" as="item()*">
        <xsl:param name="nodes1" as="item()*"/>
        <xsl:param name="nodes2" as="item()*"/>
        <xsl:sequence
            select="$nodes1,
                    for $test in $nodes2
                    return if (vset:member-of($nodes1, $test))
                           then () else $test"/>
      </xsl:function>

      <!-- Compute the intersection of two sets using "by value" equality. -->
      <xsl:function name="vset:intersection" as="item()*">
        <xsl:param name="nodes1" as="item()*"/>
        <xsl:param name="nodes2" as="item()*"/>
        <xsl:sequence
            select="for $test in $nodes1
                    return if (vset:member-of($nodes2, $test))
                           then $test else ()"/>
      </xsl:function>

      <!-- Compute the difference between two sets (nodes1 - nodes2) using
           "by value" equality. -->
      <xsl:function name="vset:difference" as="item()*">
        <xsl:param name="nodes1" as="item()*"/>
        <xsl:param name="nodes2" as="item()*"/>
        <xsl:sequence
            select="for $test in $nodes1
                    return if (vset:member-of($nodes2, $test))
                           then () else $test"/>
      </xsl:function>

    </xsl:stylesheet>

Discussion

You might think that equality is a cut-and-dried issue; two things are either equal or they're not. However, in programming (as in politics), equality is in the eye of the beholder. In a typical document, an element is associated with a uniquely identifiable object. For example, a paragraph element, <p>...</p>, is distinct from another paragraph element somewhere else in the document, even if they have the same content. Hence, set operations based on the unique identity of elements are the norm.
However, when considering XSLT operations crossing multiple documents or acting on elements that result from applying xsl:copy, we need to carefully consider what we want equality to be. Here are some query examples in which value set semantics are required:
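The example queries themselves are not listed here. As one illustration, the sketch below uses the XSLT 2.0 functions from the Solution to ask which people appear in both people1.xml and people2.xml when two person elements count as equal if their names match. The import file name vset.ops.2.xslt is an assumption, since the text does not name the XSLT 2.0 library. The override relies on the XSLT 2.0 rule that a function in the importing stylesheet takes precedence over an imported function of the same name and arity.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:vset="http:/www.ora.com/XSLTCookbook/namespaces/vset"
    exclude-result-prefixes="xs vset">

  <!-- Assumed file name for the XSLT 2.0 function library shown in the
       Solution; the namespace URI above must match the library's exactly. -->
  <xsl:import href="vset.ops.2.xslt"/>

  <xsl:output method="xml" indent="yes"/>

  <!-- Override the default equality: two person elements are equal
       when their name attributes have the same value. -->
  <xsl:function name="vset:element-equality" as="xs:boolean">
    <xsl:param name="item1" as="item()?"/>
    <xsl:param name="item2" as="item()?"/>
    <xsl:sequence select="$item1/@name = $item2/@name"/>
  </xsl:function>

  <!-- Run against people1.xml: copy each person from the input that
       also has a same-named person in people2.xml. -->
  <xsl:template match="/">
    <people>
      <xsl:copy-of
          select="vset:intersection(//person,
                                    document('people2.xml')//person)"/>
    </people>
  </xsl:template>

</xsl:stylesheet>
```

Run with people1.xml as input, this should copy only the person elements from people1.xml whose names also occur in people2.xml. Swapping vset:intersection for vset:difference would instead select the people who appear only in people1.xml.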