Recipe 14.12. Using Saxon's and Xalan's Native Extensions

Problem

You want to know how to exploit some of the useful extensions available in these popular XSLT implementations.

Solution

XSLT 1.0

This recipe is broken into a bunch of mini-recipes showcasing the most important Saxon and Xalan extensions. For all examples, the saxon namespace prefix is associated with http://icl.com/saxon, and the xalan namespace prefix is associated with http://xml.apache.org/xslt.

You want to output to more than one destination

This book has used Saxon's facility several times to output results to more than one file. Saxon uses the saxon:output element. It also provides the xsl:document element, but that element works only if the stylesheet's version attribute is 1.1, so it is not preferred. The href attribute specifies the output destination. This attribute can be an attribute value template:

<saxon:output href="toc.html">
  <html>
    <head><title>Table of Contents</title></head>
    <body>
      <xsl:apply-templates mode="toc" select="*"/>
    </body>
  </html>
</saxon:output>
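Because href is an attribute value template, the filename can be computed for each node. The following is a minimal sketch, assuming a hypothetical source document whose chapter elements carry an id attribute and a title child (neither name comes from this recipe):

```xml
<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:saxon="http://icl.com/saxon"
  extension-element-prefixes="saxon">

  <!-- Hypothetical input: <chapter id="intro"><title>...</title>...</chapter> -->
  <xsl:template match="chapter">
    <!-- The {@id} expression is evaluated per chapter, giving one file each -->
    <saxon:output href="{@id}.html">
      <html>
        <head><title><xsl:value-of select="title"/></title></head>
        <body>
          <xsl:apply-templates/>
        </body>
      </html>
    </saxon:output>
  </xsl:template>

</xsl:stylesheet>
```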

Xalan takes a significantly different approach to multidestination output. Rather than one instruction, Xalan gives you three: redirect:open, redirect:close, and redirect:write. The extension namespace associated with these elements is xmlns:redirect="org.apache.xalan.xslt.extensions.Redirect". For the most common cases, redirect:write alone is enough: used by itself, it opens, writes, and closes the file.

Each element includes a file attribute and/or a select attribute to designate the output file. The file attribute takes a string, so you can use it to specify the output filename directly. The select attribute takes an XPath expression, so you can use it to generate the output file name dynamically. If you include both attributes, the redirect extension first evaluates the select attribute and falls back to the file attribute if the select attribute expression does not return a valid filename:

<redirect:write file="toc.html">
  <html>
    <head><title>Table of Contents</title></head>
    <body>
      <xsl:apply-templates mode="toc" select="*"/>
    </body>
  </html>
</redirect:write>

By using Xalan's extended capabilities, you can switch from writing a primary output file to other secondary files while the primary remains open. This step undermines the no-side-effects nature of XSLT, but presumably, Xalan will ensure a predictable operation:

<xsl:template match="doc">
  <redirect:open file="regular.xml"/>
  <xsl:apply-templates select="*"/>
  <redirect:close file="regular.xml"/>
</xsl:template>

<xsl:template match="regular">
  <redirect:write file="regular.xml">
    <xsl:copy-of select="."/>
  </redirect:write>
</xsl:template>

<xsl:template match="*">
  <xsl:variable name="file" select="concat(local-name(),'.xml')"/>
  <redirect:write select="$file">
    <xsl:copy-of select="."/>
  </redirect:write>
</xsl:template>

XSLT 2.0 provides native support for multiple result destinations via a new element called xsl:result-document:

<xsl:result-document format="html" href="toc.html">
  <html>
    <head><title>Table of Contents</title></head>
    <body>
      <xsl:apply-templates mode="toc" select="*"/>
    </body>
  </html>
</xsl:result-document>

You want to split a complex transformation into a series of transformations in a pipeline

Developers who have worked a lot with Unix are intimately familiar with the notion of a processing pipeline in which the output of a command is fed into the input of another. This facility is also available in other operating systems, such as Windows. The genius of the pipelining approach to software development is that it enables the assembly of complex tasks from more basic commands.

Since an XSLT transformation is ultimately a tree-to-tree transformation, applying the pipelining approach is natural. Here the result tree of one transform becomes the input tree of the next. You have seen numerous examples in which the node-set extension function can create intermediate results that can be processed by subsequent stages. Alternatively, Saxon provides this functionality via the saxon:next-in-chain extension attribute of xsl:output. The saxon:next-in-chain attribute directs the output to another stylesheet. The value is the URL of a stylesheet that should be used to process the output stream. The output stream must always be pure XML, and attributes that control the output's format (e.g., method, cdata-section-elements, etc.) have no effect. The second stylesheet's output is directed to the destination that would have been used for the first stylesheet if no saxon:next-in-chain attribute were present.
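As a rough sketch of the Saxon approach, the first stage of a pipeline might look like the following. The second-stage filename (contents.xslt) and the draft element that this stage removes are illustrative assumptions, not part of any stylesheet shown in this book:

```xml
<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:saxon="http://icl.com/saxon">

  <!-- Hand this stylesheet's result tree to a second stylesheet.
       The file name is a hypothetical example. -->
  <xsl:output saxon:next-in-chain="contents.xslt"/>

  <!-- Identity transform: copy everything through unchanged... -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- ...except hypothetical draft elements, which are dropped -->
  <xsl:template match="draft"/>

</xsl:stylesheet>
```

Note that the pipeline is hardcoded into the first stylesheet here, so the first stage cannot easily be reused on its own.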

Xalan has a different approach to this functionality; it uses a pipeDocument extension element. The nice thing about pipeDocument is that you can use it in an otherwise empty stylesheet to create a pipeline between independent stylesheets that do not know they are used in this way. The Xalan implementation is therefore much more like the Unix pipe because the pipeline is not hardcoded into the participating stylesheets. Imagine that a stylesheet called strip.xslt stripped out specific elements from an XML document representing a book, and a stylesheet called contents.xslt created a table of contents based on the hierarchical structure of the document's markup. You could create a pipeline between the stylesheets as follows:

<xsl:stylesheet version="1.0"
        xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
        xmlns:pipe="xalan://PipeDocument"
        extension-element-prefixes="pipe">

  <xsl:param name="source"/>
  <xsl:param name="target"/>
  <!-- A list of elements to preserve. All others are stripped. -->
  <xsl:param name="preserve-elems"/>

  <!-- Extension elements are instructions, so the pipeline lives inside a template -->
  <xsl:template match="/">
    <pipe:pipeDocument source="{$source}" target="{$target}">

      <stylesheet href="strip.xslt">
        <param name="preserve-elems" value="{$preserve-elems}"/>
      </stylesheet>

      <stylesheet href="contents.xslt"/>

    </pipe:pipeDocument>
  </xsl:template>

</xsl:stylesheet>

This code creates a table of contents based on the specified elements without preventing the independent use of strip.xslt or contents.xslt.

You want to work with dates and times

Chapter 4 provided a host of recipes dealing with dates and times but no pure XSLT facility that could determine the current date and time. Both Saxon and Xalan implement core functions from the EXSLT dates and times module. This section includes EXSLT's date-and-time documentation for easy reference. The functions are shown in Table 14-1 with their return type, followed by the function and arguments. A question mark (?) indicates optional arguments.

Table 14-1. EXSLT's date-and-time functions

Function

Behavior

string date:date-time()

Returns the current date and time as a date/time string. The returned string must be in the format that XML Schema defines as the lexical representation of xs:dateTime.

string date:date(string?)

Returns the date specified in the date/time string given as the argument. With no argument, the current local date/time, as returned by date:date-time, is used.

string date:time(string?)

Returns the time specified in the date/time string given as the argument. With no argument, the current local date/time, as returned by date:date-time, is used.

number date:year(string?)

Returns the year of a date as a number. With no argument, the current local date/time, as returned by date:date-time, is used.

boolean date:leap-year(string?)

Returns true if the year given in a date is a leap year. With no argument, the current local date/time, as returned by date:date-time, is used.

number date:month-in-year(string?)

Returns the month of a date as a number. With no argument, the current local date/time, as returned by date:date-time, is used.

string date:month-name(string?)

Returns the full name of the month of a date. With no argument, the current local date/time, as returned by date:date-time, is used.

string date:month-abbreviation(string?)

Returns the abbreviation of the month of a date. With no argument, the current local date/time, as returned by date:date-time, is used.

number date:week-in-year(string?)

Returns the week of the year as a number. With no argument, the current local date/time, as returned by date:date-time, is used. Counting follows ISO 8601: week 1 in a year is the week containing the first Thursday of the year, with new weeks beginning on Mondays.

number date:day-in-year(string?)

Returns the day of a date in a year as a number. With no argument, the current local date/time, as returned by date:date-time, is used.

number date:day-in-month(string?)

Returns the day of a date in a month as a number. With no argument, the current local date/time, as returned by date:date-time, is used.

number date:day-of-week-in-month(string?)

Returns the day of the week in a month as a number (e.g., 3 for the third Tuesday in May). With no argument, the current local date/time, as returned by date:date-time, is used.

number date:day-in-week(string?)

Returns the day of the week given in a date as a number. With no argument, the current local date/time, as returned by date:date-time, is used.

string date:day-name(string?)

Returns the full name of the day of the week of a date. With no argument, the current local date/time, as returned by date:date-time, is used.

string date:day-abbreviation(string?)

Returns the abbreviation of the day of the week of a date. With no argument, the current local date/time, as returned by date:date-time, is used.

number date:hour-in-day(string?)

Returns the hour of the day as a number. With no argument, the current local date/time, as returned by date:date-time, is used.

number date:minute-in-hour(string?)

Returns the minute of the hour as a number. With no argument, the current local date/time, as returned by date:date-time, is used.

number date:second-in-minute(string?)

Returns the second of the minute as a number. With no argument, the current local date/time, as returned by date:date-time, is used.

<xsl:stylesheet version="1.0" 
xmlns:xsl="http://www.w3.org/1999/XSL/Transform" 
xmlns:date="http://exslt.org/dates-and-times">
   
<xsl:output method="html" version="1.0" encoding="UTF-8" indent="yes"/>
   
<xsl:template match="/">
  <html>
    <head><title>My Dull Home Page</title></head>
    <body>
      <h1>My Dull Homepage</h1>
      <div>It's <xsl:value-of select="date:time( )"/> on <xsl:value-of 
      select="date:date(  )"/> and this page is as dull as it was yesterday.</div>
    </body>
  </html>
   
</xsl:template>
     
</xsl:stylesheet>

XSLT 2.0 has direct support for dates and times, as discussed in Chapter 4, so these extensions are not necessary.
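For comparison, here is a sketch of the same dull home page written against the standard XSLT 2.0 functions current-time(), current-date(), format-time(), and format-date(). The picture strings are one illustrative choice of layout, not something prescribed by this recipe:

```xml
<xsl:stylesheet version="2.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="html" encoding="UTF-8" indent="yes"/>

  <xsl:template match="/">
    <html>
      <head><title>My Dull Home Page</title></head>
      <body>
        <h1>My Dull Home Page</h1>
        <!-- [H01]:[m01] is a 24-hour time; [MNn] [D], [Y] is month name, day, year -->
        <div>It's <xsl:value-of
          select="format-time(current-time(), '[H01]:[m01]')"/> on <xsl:value-of
          select="format-date(current-date(), '[MNn] [D], [Y]')"/> and this page
          is as dull as it was yesterday.</div>
      </body>
    </html>
  </xsl:template>

</xsl:stylesheet>
```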

You need a more efficient implementation of set operations

Chapter 9 investigated various means of implementing set operations other than set union, which XPath supplies natively via the union operator (|). These solutions were not necessarily the most efficient or obvious.

Both Saxon and Xalan remedy this problem by implementing the set operations defined by EXSLT's set module (see Table 14-2).

Table 14-2. EXSLT's set module's set operations

Function

Behavior

node-set set:difference(node-set, node-set)

Returns the difference between two node sets: the nodes that are in the node set passed as the first argument but not in the node set passed as the second argument.

node-set set:intersection(node-set, node-set)

Returns a node set comprising the nodes that are in both of the node sets passed as arguments.

node-set set:distinct(node-set)

Returns a subset of the nodes contained in the node set NS passed as the argument. Specifically, it selects a node N if no node in NS that precedes N in document order has the same string value as N.

boolean set:has-same-node(node-set, node-set)

Returns true if the node set passed as the first argument shares at least one node with the node set passed as the second argument. If no node is in both node sets, it returns false.

node-set set:leading(node-set, node-set)

Returns the nodes in the node set passed as the first argument that precede, in document order, the first node in the node set passed as the second argument. If the first node in the second node set is not contained in the first node set, then an empty node set is returned. If the second node set is empty, then the first node set is returned.

node-set set:trailing(node-set, node-set)

Returns the nodes in the node set passed as the first argument that follow, in document order, the first node in the node set passed as the second argument. If the first node in the second node set is not contained in the first node set, then an empty node set is returned. If the second node set is empty, then the first node set is returned.

set:distinct is a convenient way to remove duplicates, as long as equality is defined as string-value equality:

<xsl:variable name="firstNames" select="set:distinct(person/firstname)"/>

set:leading and set:trailing can extract nodes bracketed by other nodes. For example, Recipe 12.9 used a complex expression to locate the xslx:elsif and xslx:else nodes that went with your enhanced xslx:if. Extensions can simplify this process. The following xslx:if siblings are included in the first node set because set:leading returns an empty node set when the first node of its second argument is not contained in its first argument:

<xsl:apply-templates 
        select="set:leading(following-sibling::xslx:else | 
                            following-sibling::xslx:elsif |
                            following-sibling::xslx:if,
                            following-sibling::xslx:if)"/>

This code selects all xslx:else and xslx:elsif siblings that come after the current node but before the next xslx:if.
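The other set operations are used in much the same way. The sketch below assumes a hypothetical people document in which each person element has role and remote attributes; those names are invented for illustration:

```xml
<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:set="http://exslt.org/sets"
  exclude-result-prefixes="set">

  <!-- Hypothetical input: <people><person role="manager" remote="yes">...</person>...</people> -->
  <xsl:template match="people">
    <xsl:variable name="managers" select="person[@role='manager']"/>
    <xsl:variable name="remote" select="person[@remote='yes']"/>
    <report>
      <!-- Nodes that are in both node sets -->
      <remote-managers>
        <xsl:copy-of select="set:intersection($managers, $remote)"/>
      </remote-managers>
      <!-- Nodes in the first node set that are not in the second -->
      <onsite-managers>
        <xsl:copy-of select="set:difference($managers, $remote)"/>
      </onsite-managers>
    </report>
  </xsl:template>

</xsl:stylesheet>
```

Both results here could also be written with plain predicates. The set functions earn their keep when the two node sets are computed independently and cannot easily be folded into one expression.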

You want extended information about a node in the source tree

Xalan provides functions that give you information about the location of nodes in the source tree. Saxon 6.5.2 provides only saxon:systemId and saxon:lineNumber. Debugging is one application of these functions. To use them, set the TransformerFactory source_location attribute to true, either with the command-line utility's -L flag or with the TransformerFactory.setAttribute( ) method.

systemId( )

systemId(node-set): Returns the system ID for the current node and the first node in the node set, respectively.

lineNumber( )

lineNumber(node-set): Returns the line number in the source document for the current node and the first node in the node set, respectively. This function returns -1 if the line number is unknown (for example, when the source is a DOM Document).

columnNumber( )

columnNumber(node-set): Returns the column number in the source document for the current node and the first node in the node set, respectively. This function returns -1 if the column number is unknown (for example, when the source is a DOM Document):

<xsl:stylesheet version="1.0" 
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform" 
 xmlns:xalan="http://xml.apache.org/xslt"
 xmlns:info="xalan://org.apache.xalan.lib.NodeInfo">
   
  <xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>
     
  <xsl:template match="foo">
    <xsl:comment>Matched a foo on line <xsl:value-of 
    select="info:lineNumber( )"/> and column <xsl:value-of 
    select="info:columnNumber( )"/>.</xsl:comment>
    <!-- ... -->
  </xsl:template>     
     
</xsl:stylesheet>

You want to interact with a relational database

Interfacing XSLT to a relational database opens up a whole new world of possibilities. Both Saxon and Xalan have extensions to support SQL. If you write stylesheets that modify databases, you violate the XSLT no-side-effects rule.

Michael Kay has this to say about Saxon's SQL extensions: "These are not intended as being necessarily a production-quality piece of software (there are many limitations in the design), but more as an illustration of how extension elements can be used to enhance the capability of the processor."

Saxon provides database interaction via five extension elements: sql:connect, sql:query, sql:insert, sql:column, and sql:close. Anyone who has interacted with a relational database through ODBC or JDBC should feel comfortable using these elements.

<sql:connect driver="jdbc-driver" database="db name" user="user name" password="user password"/>: Creates a database connection. Each attribute can be an attribute value template. The driver attribute names the JDBC driver class, and the database must be a name that JDBC can associate with an actual database.
<sql:query table="the table" column="column names" where="where clause" row-tag="row element name" column-tag="column element name" disable-output-escaping="yes or no"/>: Performs a query and writes the results to the output tree using elements to represent the rows and columns. The names of these elements are specified by row-tag and column-tag, respectively. The column attribute can contain a list of columns or use * for all.
<sql:insert table="table name">: Performs an SQL INSERT. The child elements (sql:column) specify the data to be added to the table.
<sql:column name="col name" select="xpath expr"/>: Used as a child of sql:insert. The value can be specified by the select attribute or by the evaluation of the sql:column's child elements. However, in both cases only the string value can be used. Hence, there is no way to deal with other standard SQL data types.
<sql:close/>: Closes the database connection.
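Put together, these elements might be used as in the following sketch, which loads person records into a table. Several things here are assumptions rather than facts from this recipe: the namespace URI bound to the sql prefix (as far as I recall, Saxon 6 expects a URI ending in /com.icl.saxon.sql.SQLElementFactory, so check the documentation for your release), the JDBC driver and database names, and the table and column names:

```xml
<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:sql="http://icl.com/saxon/extensions/com.icl.saxon.sql.SQLElementFactory"
  extension-element-prefixes="sql">

  <!-- Connection details are illustrative defaults; override them as parameters -->
  <xsl:param name="driver" select="'sun.jdbc.odbc.JdbcOdbcDriver'"/>
  <xsl:param name="database" select="'jdbc:odbc:people'"/>
  <xsl:param name="user"/>
  <xsl:param name="password"/>

  <xsl:template match="/">
    <sql:connect driver="{$driver}" database="{$database}"
                 user="{$user}" password="{$password}"/>
    <xsl:apply-templates select="people/person"/>
    <sql:close/>
  </xsl:template>

  <!-- One INSERT per person; only string values can be stored -->
  <xsl:template match="person">
    <sql:insert table="person">
      <sql:column name="first" select="first"/>
      <sql:column name="last" select="last"/>
    </sql:insert>
  </xsl:template>

</xsl:stylesheet>
```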

Xalan's SQL support is richer than Saxon's. This chapter covers only the basics. The "See Also" section provides pointers to more details. Unlike Saxon, Xalan uses extension functions that provide relational database access.

sql:new(driver, db, user, password)

Establishes a connection.

sql:new(nodelist)

Sets up a connection using information embedded as XML in the input document or stylesheet. For example:

<DBINFO>
  <dbdriver>org.enhydra.instantdb.jdbc.idbDriver</dbdriver>
  <dburl>jdbc:idb:../../instantdb/sample.prp</dburl>
  <user>jbloe</user>
  <password>geron07moe</password>
</DBINFO>
   
<xsl:param name="cinfo" select="//DBINFO"/>
<xsl:variable name="db" select="sql:new($cinfo)"/>

sql:query(xconObj, sql-query): Queries the database. The xconObj is returned by new( ). The function returns a streamable result set in the form of a row-set node. You can work your way through the row set one row at a time. The same row element is used repeatedly, so you can begin transforming the row set before the entire result set is returned.
sql:pquery(xconObj, sql-query-with-params)
sql:addParameter(xconObj, paramValue)
sql:addParameterFromElement(xconObj, element)
sql:addParameterFromElement(xconObj, node-list)
sql:clearParameters(xconObj): Used together to implement parameterized queries. Parameters take the form of ? characters embedded in the query. The various addParameter( ) functions set these parameters with actual values before the query is executed. Use clearParameters( ) to make the connection object forget about prior values.
sql:close(xconObj): Closes the connection to the database.

The sql:query( ) and sql:pquery() extension functions return a Document node that contains (as needed) an array of column-header elements, a single row element that is used repeatedly, and an array of col elements. Each column-header element (one per column in the row set) contains an attribute (ColumnAttribute) for each column descriptor in the ResultSetMetaData object. Each col element contains a text node with a textual representation of the value for that column in the current row.
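The following sketch shows how these functions might fit together to render a query as an HTML table. It relies only on the functions and the row and col elements described above. The namespace URI, the query text, and the table name are assumptions, and the rows are located with the descendant axis because the exact nesting of the returned document varies between Xalan releases:

```xml
<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:sql="org.apache.xalan.lib.sql.XConnection"
  extension-element-prefixes="sql">

  <xsl:output method="html" indent="yes"/>

  <xsl:param name="driver" select="'org.enhydra.instantdb.jdbc.idbDriver'"/>
  <xsl:param name="dburl" select="'jdbc:idb:../../instantdb/sample.prp'"/>
  <xsl:param name="user"/>
  <xsl:param name="password"/>
  <!-- Hypothetical query; substitute a table that exists in your database -->
  <xsl:param name="query" select="'SELECT * FROM person'"/>

  <xsl:template match="/">
    <xsl:variable name="db" select="sql:new($driver, $dburl, $user, $password)"/>
    <xsl:variable name="table" select="sql:query($db, $query)"/>
    <table>
      <!-- One tr per row, one td per col element in that row -->
      <xsl:for-each select="$table//row">
        <tr>
          <xsl:for-each select="col">
            <td><xsl:value-of select="."/></td>
          </xsl:for-each>
        </tr>
      </xsl:for-each>
    </table>
    <!-- Release the connection once the rows have been processed -->
    <xsl:value-of select="sql:close($db)"/>
  </xsl:template>

</xsl:stylesheet>
```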

You can find more information on using XSLT to access relational data in Doug Tidwell's XSLT (O'Reilly, 2001).

You want to dynamically evaluate an XPath expression created at runtime

Saxon and Xalan have a very powerful extension function called evaluate that takes a string and evaluates it as an XPath expression. EXSLT.org also defines dyn:evaluate( ), which gives you greater portability. Such a feature was under consideration for XSLT 2.0, but the XSLT 2.0 working group has decided not to pursue it at this time. Their justification is that dynamic evaluation " . . . has significant implications on the runtime architecture of the processor, as well as the ability to do static optimization."

Dynamic capabilities can come in handy when creating a table-driven stylesheet. The following stylesheet can format information on people into a table, but you can customize it to handle an almost infinite variety of XML formats simply by altering entries in a table:

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
 xmlns:saxon="http://icl.com/saxon" 
 xmlns:paths="http://www.ora.com/XSLTCookbook/NS/paths" 
 exclude-result-prefixes="paths">
   
<xsl:output method="html"/>
   
<!-- This parameter is used to specify a document containing a table that -->
<!-- specifies how to locate info on people -->
<xsl:param name="pathsDoc"/>
     
<xsl:template match="/">
<html>
  <head>
    <title>People</title>
  </head>
  <body>
  <!-- We load an XPath expression out of a table -->
  <!-- in an external document. -->
  <xsl:variable name="peoplePath" 
       select="document($pathsDoc)/*/paths:path[@type='people']/@xpath"/>
    <table>
    <tbody>
      <tr>
        <th>First</th>
        <th>Last</th>
      </tr>
      <!-- Dynamically evaluate the xpath that locates information on --> 
      <!-- each person -->
      <xsl:for-each select="saxon:evaluate($peoplePath)">
        <xsl:call-template name="process-person"/>
      </xsl:for-each>
    </tbody>
  </table>
  </body>
</html>
</xsl:template>
   
<xsl:template name="process-person">
  <xsl:variable name="firstnamePath" 
      select="document($pathsDoc)/*/paths:path[@type='first']/@xpath"/> 
  <xsl:variable name="lastnamePath" 
      select="document($pathsDoc)/*/paths:path[@type='last']/@xpath"/> 
  <tr>
    <!-- Dynamically evaluate the xpath that locates the person -->
    <!-- specific info we want to process -->
    <td><xsl:value-of select="saxon:evaluate($firstnamePath)"/></td>
    <td><xsl:value-of select="saxon:evaluate($lastnamePath)"/></td>
  </tr>
</xsl:template>
   
</xsl:stylesheet>

You can use this table to process person data encoded as elements:

<paths:paths 
  xmlns:paths="http://www.ora.com/XSLTCookbook/NS/paths">
  <paths:path type="people" xpath="people/person"/>
  <paths:path type="first" xpath="first"/>
  <paths:path type="last" xpath="last"/>
</paths:paths>

Use this table instead to process person data encoded as attributes:

<paths:paths xmlns:paths="http://www.ora.com/XSLTCookbook/NS/paths">
  <paths:path type="people" xpath="people/person"/>
  <paths:path type="first" xpath="@first"/>
  <paths:path type="last" xpath="@last"/>
</paths:paths>
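If portability matters more than processor-specific features, the same idea can be sketched with EXSLT's dyn:evaluate( ). This assumes a processor that implements the EXSLT dynamic module; the parameter names and their default values are illustrative only:

```xml
<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:dyn="http://exslt.org/dynamic"
  exclude-result-prefixes="dyn">

  <xsl:output method="text"/>

  <!-- XPath expressions supplied at runtime; the defaults are illustrative -->
  <xsl:param name="peoplePath" select="'people/person'"/>
  <xsl:param name="namePath" select="'first'"/>

  <xsl:template match="/">
    <!-- The string is evaluated as if it had been written here literally -->
    <xsl:for-each select="dyn:evaluate($peoplePath)">
      <!-- Evaluated relative to the current person -->
      <xsl:value-of select="dyn:evaluate($namePath)"/>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>

</xsl:stylesheet>
```

Passing '@first' as namePath would switch the stylesheet to attribute-encoded data without touching the stylesheet itself.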

You want to change the value of a variable

Almost any book you read on XSLT will describe the inability to change the value of variables and parameters once they are bound as a feature of XSLT rather than a defect. This is true because it prevents a certain class of bugs, makes stylesheets easier to understand, and enables certain performance optimizations. However, sometimes being unable to change the values is simply inconvenient. Saxon provides a way around this obstacle with its saxon:assign extension element. You can use saxon:assign only on variables designated as assignable with the extension attribute saxon:assignable="yes":

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform" 
xmlns:saxon="http://icl.com/saxon"
extension-element-prefixes="saxon">
   
<xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>
   
<xsl:variable name="countFoo" select="0" saxon:assignable="yes"/>
   
<xsl:template name="foo">
    <saxon:assign name="countFoo" select="$countFoo + 1"/>
    <xsl:comment>This is invocation number <xsl:value-of select="$countFoo"/> of 
template foo.</xsl:comment>       
</xsl:template>
   
<!-- ... -->
   
</xsl:stylesheet>

You want to write first-class extension functions in XSLT 1.0

Many examples in this book are implemented as named templates accessed via xsl:call-template. Often, this implementation is inconvenient and awkward because what you really want is to access this code as first-class functions that can be invoked as easily as native XPath functions. This is supported in XSLT 2.0, but in 1.0, you might consider using an EXSLT extension called func:function that is implemented by Saxon and the latest version of Xalan (Version 2.3.2 at this writing). The following code is a template from Chapter 2 reimplemented as a function:

<xsl:stylesheet version="1.0" 
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:func="http://exslt.org/functions" 
  xmlns:str="http://www.ora.com/XSLTCookbook/namespaces/strings"
  extension-element-prefixes="func">
   
  <xsl:template match="/">
    <xsl:value-of 
      select="str:substring-before-last('123456789a123456789a123',
  </xsl:template>
     
  <func:function name="str:substring-before-last"> 
  <xsl:param name="input"/>
  <xsl:param name="substr"/>
  
  <func:result>
    <xsl:if test="$substr and contains($input, $substr)">
      <xsl:variable name="temp" 
                    select="substring-after($input, $substr)" />
      <xsl:value-of select="substring-before($input, $substr)" />
      <xsl:if test="contains($temp, $substr)">
        <xsl:value-of 
             select="concat($substr,
                            str:substring-before-last($temp, $substr))"/>
      </xsl:if>
    </xsl:if>
  </func:result>
</func:function>
     
</xsl:stylesheet>
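For comparison, a sketch of how the same function might look with XSLT 2.0's native xsl:function follows. It keeps the recursive logic of the EXSLT version but expresses it as a single XPath 2.0 conditional. The type declarations and the 'a' delimiter in the sample call are choices made here for illustration:

```xml
<xsl:stylesheet version="2.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:xs="http://www.w3.org/2001/XMLSchema"
  xmlns:str="http://www.ora.com/XSLTCookbook/namespaces/strings"
  exclude-result-prefixes="xs str">

  <xsl:output method="text"/>

  <xsl:template match="/">
    <!-- Sample call; 'a' is an illustrative delimiter -->
    <xsl:value-of
      select="str:substring-before-last('123456789a123456789a123', 'a')"/>
  </xsl:template>

  <!-- Returns the part of $input before the last occurrence of $substr,
       or the empty string if $substr is empty or does not occur -->
  <xsl:function name="str:substring-before-last" as="xs:string">
    <xsl:param name="input" as="xs:string"/>
    <xsl:param name="substr" as="xs:string"/>
    <xsl:sequence select="
      if ($substr ne '' and contains($input, $substr))
      then concat(substring-before($input, $substr),
                  if (contains(substring-after($input, $substr), $substr))
                  then concat($substr,
                              str:substring-before-last(
                                substring-after($input, $substr), $substr))
                  else '')
      else ''"/>
  </xsl:function>

</xsl:stylesheet>
```

With the sample call shown, the function yields 123456789a123456789.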

XSLT 2.0

Most of the extensions available in Saxon 6 are also available in Saxon 8. However, some, such as saxon:function( ), are no longer needed in XSLT 2.0. Some additional functions exist mainly to give XQuery capabilities that XSLT already has. For example, saxon:index( ) and saxon:find( ) achieve results similar to XSLT keys (xsl:key and the key( ) function). However, Saxon 8 includes a few additional goodies that are not available in the older product.

You want to get an XPath expression to the current node

The saxon:path( ) function takes no arguments and returns a string whose value is an XPath expression to the context node. This is similar to the XSLT solution (introduced in Recipe 15.2 for debugging purposes).
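A minimal sketch of using saxon:path( ) as a tracing aid follows. Note that Saxon 8 binds its extensions to the http://saxon.sf.net/ namespace rather than the Saxon 6 namespace used in the XSLT 1.0 examples; the identity-transform scaffolding is only there to give the function something to report on:

```xml
<xsl:stylesheet version="2.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:saxon="http://saxon.sf.net/"
  exclude-result-prefixes="saxon">

  <!-- Identity transform for attributes, text, comments, and so on -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Elements are copied too, but each visit is reported with its path -->
  <xsl:template match="*" priority="1">
    <xsl:message>Visiting <xsl:value-of select="saxon:path()"/></xsl:message>
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>
```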

You want to handle and recover from dynamic errors

Many modern languages, such as Java and C++, have a try-throw-catch mechanism for handling dynamic (runtime) errors. Saxon adds a saxon:try pseudo function in its commercial version (Saxon-SA) that provides similar, if more limited, capabilities. saxon:try takes an expression as its first argument. The expression is evaluated, and if a dynamic error occurs (e.g., division by zero, type errors, etc.), the value of the second argument is returned:

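<xsl:template match="/">
  <test>
    <!-- item is a hypothetical element name; idiv raises a dynamic error when the count is zero -->
    <xsl:value-of select="saxon:try(100 idiv count(*/item), 'No items found')"/>
  </test>
</xsl:template>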

The value of the second argument could be an error string, as we show in this example, or a default value. Michael Kay calls saxon:try a pseudo function because it does not follow the rules of a normal XPath function: it evaluates the second argument only if the first fails.

Discussion

Using vendor-specific extensions is a double-edged sword. On the one hand, they can provide you with the ability to deliver an XSLT solution faster or more simply than you could if you constrained yourself to standard XSLT. In a few cases, they allow you to do things that are impossible with standard XSLT. On the other hand, they can lock you into an implementation whose future is uncertain.

EXSLT.org encourages implementers to adopt uniform conventions for the most popular extensions, so you should certainly prefer an EXSLT solution to a vendor-specific one if you have a choice.
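One way to soften the lock-in is to test for an extension before calling it and fall back to standard XSLT otherwise. The sketch below uses the standard function-available( ) function to guard set:distinct. The people/person/firstname structure is a hypothetical input, and the fallback is equivalent only under the assumption that each person has at most one firstname child:

```xml
<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:set="http://exslt.org/sets"
  exclude-result-prefixes="set">

  <xsl:template match="people">
    <first-names>
      <xsl:choose>
        <!-- Prefer the EXSLT function when the processor supplies it -->
        <xsl:when test="function-available('set:distinct')">
          <xsl:copy-of select="set:distinct(person/firstname)"/>
        </xsl:when>
        <!-- Otherwise keep a name only if no earlier person has the same one -->
        <xsl:otherwise>
          <xsl:copy-of
            select="person/firstname[not(. = ../preceding-sibling::person/firstname)]"/>
        </xsl:otherwise>
      </xsl:choose>
    </first-names>
  </xsl:template>

</xsl:stylesheet>
```

A processor reports a missing extension function only when the call is actually evaluated, so the guarded branch is safe even where set:distinct is unavailable.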

Another tactic is to avoid vendor-specific implementations altogether in favor of your own custom implementation. In this way, you control the source and can port the extension to more than one processor, if necessary. Recipe 14.2, Recipe 14.3, and Recipe 14.4 address custom extensions.