Recipe 2.4. Finding Substrings from the End of a String
Problem
XSLT does not have any functions for searching strings in reverse.
Solution
XSLT 1.0
Using recursion, you can emulate a reverse search by finding the last occurrence of substr. With this technique, you can create a substring-before-last and a substring-after-last template:
<xsl:template name="substring-before-last">
<xsl:param name="input" />
<xsl:param name="substr" />
<xsl:if test="$substr and contains($input, $substr)">
<xsl:variable name="temp" select="substring-after($input, $substr)" />
<xsl:value-of select="substring-before($input, $substr)" />
<xsl:if test="contains($temp, $substr)">
<xsl:value-of select="$substr" />
<xsl:call-template name="substring-before-last">
<xsl:with-param name="input" select="$temp" />
<xsl:with-param name="substr" select="$substr" />
</xsl:call-template>
</xsl:if>
</xsl:if>
</xsl:template>
<xsl:template name="substring-after-last">
<xsl:param name="input"/>
<xsl:param name="substr"/>
<!-- Extract the string which comes after the first occurrence -->
<xsl:variable name="temp" select="substring-after($input,$substr)"/>
<xsl:choose>
<!-- If it still contains the search string, then process it recursively -->
<xsl:when test="$substr and contains($temp,$substr)">
<xsl:call-template name="substring-after-last">
<xsl:with-param name="input" select="$temp"/>
<xsl:with-param name="substr" select="$substr"/>
</xsl:call-template>
</xsl:when>
<xsl:otherwise>
<xsl:value-of select="$temp"/>
</xsl:otherwise>
</xsl:choose>
</xsl:template>
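As a quick illustration of how these templates might be called, the following sketch splits a file path at its last slash. The included file name substring-last.xslt and the sample path are assumptions made for this example; the file is assumed to be a stylesheet containing the two templates above.

```xml
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <!-- Hypothetical file holding the two named templates shown above -->
  <xsl:include href="substring-last.xslt"/>
  <xsl:output method="text"/>

  <xsl:template match="/">
    <xsl:variable name="path" select="'/usr/local/lib/libxslt.so'"/>
    <!-- Everything before the last '/': /usr/local/lib -->
    <xsl:call-template name="substring-before-last">
      <xsl:with-param name="input" select="$path"/>
      <xsl:with-param name="substr" select="'/'"/>
    </xsl:call-template>
    <xsl:text>&#10;</xsl:text>
    <!-- Everything after the last '/': libxslt.so -->
    <xsl:call-template name="substring-after-last">
      <xsl:with-param name="input" select="$path"/>
      <xsl:with-param name="substr" select="'/'"/>
    </xsl:call-template>
  </xsl:template>
</xsl:stylesheet>
```

Because the search string is passed as a parameter, the same calls work for multi-character separators such as '::' or ', '.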
XSLT 2.0
XSLT 2.0 does not add reverse versions of substring-before/after, but you can get the desired effect with the versatile tokenize() function, which splits a string on a regular expression:
<xsl:function name="ckbk:substring-before-last">
<xsl:param name="input" as="xs:string"/>
<xsl:param name="substr" as="xs:string"/>
<xsl:sequence
select="if ($substr)
then
if (contains($input, $substr)) then
string-join(tokenize($input, $substr)
[position( ) ne last( )],$substr)
else ''
else $input"/>
</xsl:function>
<xsl:function name="ckbk:substring-after-last">
<xsl:param name="input" as="xs:string"/>
<xsl:param name="substr" as="xs:string"/>
<xsl:sequence
select="if ($substr)
then
if (contains($input, $substr))
then tokenize($input, $substr)[last( )]
else ''
else $input"/>
</xsl:function>
In both functions, we have to test whether $substr is empty because tokenize does not allow an empty search pattern. Unfortunately, these implementations won't work exactly like their native counterparts, because tokenize treats its second argument as a regular expression, not a literal string. This could lead to some surprises. You can fix this by having the function escape the special characters used in regular expressions, and you can switch this behavior on and off via a third Boolean argument. The original two-argument version and this new three-argument version can coexist because XSLT allows functions to be overloaded (a function is identified by its name and its arity, or number of arguments):
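<xsl:function name="ckbk:substring-before-last">
<xsl:param name="input" as="xs:string"/>
<xsl:param name="substr" as="xs:string"/>
<xsl:param name="mask-regex" as="xs:boolean"/>
<!-- Escape regex metacharacters; '\\$1' emits a backslash, then the matched character -->
<xsl:variable name="matchstr"
select="if ($mask-regex)
then replace($substr,'([.\\?*+{}()|^$\[\]])','\\$1')
else $substr"/>
<!-- Match with the (possibly escaped) pattern, but rejoin with the original $substr -->
<xsl:sequence
select="if ($substr)
then
if (matches($input, $matchstr)) then
string-join(tokenize($input, $matchstr)
[position( ) ne last( )],$substr)
else ''
else $input"/>
</xsl:function>
<xsl:function name="ckbk:substring-after-last">
<xsl:param name="input" as="xs:string"/>
<xsl:param name="substr" as="xs:string"/>
<xsl:param name="mask-regex" as="xs:boolean"/>
<xsl:variable name="matchstr"
select="if ($mask-regex)
then replace($substr,'([.\\?*+{}()|^$\[\]])','\\$1')
else $substr"/>
<xsl:sequence
select="if ($substr)
then
if (matches($input, $matchstr))
then tokenize($input, $matchstr)[last( )]
else ''
else $input"/>
</xsl:function>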
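The kind of surprise that regular-expression semantics can cause is easy to reproduce with the two-argument function. The sketch below repeats the two-argument ckbk:substring-after-last so that it stands alone; the namespace URI bound to ckbk and the sample strings are assumptions made for this example.

```xml
<xsl:stylesheet version="2.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:xs="http://www.w3.org/2001/XMLSchema"
                xmlns:ckbk="http://example.com/ckbk">
  <!-- The ckbk namespace URI above is a placeholder chosen for this sketch -->
  <xsl:output method="text"/>

  <!-- Copy of the two-argument function from the Solution -->
  <xsl:function name="ckbk:substring-after-last">
    <xsl:param name="input" as="xs:string"/>
    <xsl:param name="substr" as="xs:string"/>
    <xsl:sequence
      select="if ($substr)
              then if (contains($input, $substr))
                   then tokenize($input, $substr)[last()]
                   else ''
              else $input"/>
  </xsl:function>

  <xsl:template match="/">
    <!-- '/' has no special meaning in a regular expression: prints [c] -->
    <xsl:text>[</xsl:text>
    <xsl:value-of select="ckbk:substring-after-last('a/b/c', '/')"/>
    <xsl:text>]&#10;</xsl:text>
    <!-- '.' matches any character, so every character is treated as a
         separator and the last token is empty: prints [] -->
    <xsl:text>[</xsl:text>
    <xsl:value-of select="ckbk:substring-after-last('archive.tar.gz', '.')"/>
    <xsl:text>]</xsl:text>
  </xsl:template>
</xsl:stylesheet>
```

The XSLT 1.0 template would return gz for the second call, which is the literal behavior that escaping the search string is meant to restore.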
Discussion
Both XSLT string-searching functions (substring-before and substring-after) begin searching at the start of the string. Sometimes you need to search a string from the end. The simplest way to do this in XSLT is to apply the built-in search functions recursively until the last instance of the substring is found.
There was a nasty "gotcha" in my first attempt at these templates, which you should keep in mind when working with recursive templates that search strings. Recall that contains($anything,'') always returns true! For this reason, I make sure that I also test for a non-empty $substr value in the recursive invocations of substring-before-last and substring-after-last. Without these checks, the code goes into an infinite loop when the search string is empty, or overflows the stack on implementations that do not optimize tail recursion.
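The guard can be seen in isolation in the following fragment, which is a small sketch meant to sit inside any template rather than part of the recipe itself. The variable name and the literal 'abc' are chosen only for illustration. In XPath 1.0 an empty string converts to false, so testing $substr first makes the whole condition false.

```xml
<xsl:variable name="substr" select="''"/>
<!-- Always true: every string contains the empty string -->
<xsl:if test="contains('abc', $substr)">unguarded test succeeds</xsl:if>
<!-- False: an empty $substr converts to boolean false -->
<xsl:if test="$substr and contains('abc', $substr)">guarded test succeeds</xsl:if>
```

Only the first message is emitted, which is why the recursive templates test $substr before calling contains.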
Another algorithm is divide and conquer. The basic idea is to split the string in half. If the search string is in the second half, then you can discard the first half, thus turning the problem into a problem half as large. This process repeats recursively. The tricky part is when the search string is not in the second half because you may have split the search string between the two halves. Here is a solution for substring-before-last:
<xsl:template name="str:substring-before-last">
<xsl:param name="input"/>
<xsl:param name="substr"/>
<xsl:variable name="mid" select="ceiling(string-length($input) div 2)"/>
<xsl:variable name="temp1" select="substring($input,1, $mid)"/>
<xsl:variable name="temp2" select="substring($input,$mid +1)"/>
<xsl:choose>
<xsl:when test="$temp2 and contains($temp2,$substr)">
<!-- search string is in second half so just append first half -->
<!-- and recurse on second -->
<xsl:value-of select="$temp1"/>
<xsl:call-template name="str:substring-before-last">
<xsl:with-param name="input" select="$temp2"/>
<xsl:with-param name="substr" select="$substr"/>
</xsl:call-template>
</xsl:when>
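<!-- search string may straddle the boundary; the second half does not -->
<!-- contain it, so any match in the tail starts at or before $mid. -->
<!-- Emit the text before the tail, then the text before the match in the tail -->
<xsl:when test="contains(substring($input,
$mid - string-length($substr) +1),
$substr)">
<xsl:value-of select="substring($input, 1, $mid - string-length($substr))"/>
<xsl:value-of select="substring-before(substring($input,
$mid - string-length($substr) +1),
$substr)"/>
</xsl:when>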
<!--search string is in first half so throw away second half-->
<xsl:when test="contains($temp1,$substr)">
<xsl:call-template name="str:substring-before-last">
<xsl:with-param name="input" select="$temp1"/>
<xsl:with-param name="substr" select="$substr"/>
</xsl:call-template>
</xsl:when>
<!-- No occurrences of search string so we are done -->
<xsl:otherwise/>
</xsl:choose>
</xsl:template>
As it turns out, divide and conquer offers little or no advantage unless you search large texts (roughly 4,000 characters or more). You might write a wrapper template that chooses the appropriate algorithm based on the input length, or one that switches from divide and conquer to the simpler algorithm when the subpart becomes small enough.
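One way such a wrapper might look is sketched below. The template name substring-before-last-auto and the threshold parameter are hypothetical names introduced for this example, and the default of 4000 simply reuses the rough figure quoted above. The sketch assumes that both substring-before-last and str:substring-before-last are available in the same stylesheet and that the str prefix is bound to a namespace.

```xml
<xsl:template name="substring-before-last-auto">
  <xsl:param name="input"/>
  <xsl:param name="substr"/>
  <!-- Hypothetical tuning knob; measure on your own processor -->
  <xsl:param name="threshold" select="4000"/>
  <xsl:choose>
    <!-- Large input: use the divide-and-conquer version -->
    <xsl:when test="string-length($input) &gt;= $threshold">
      <xsl:call-template name="str:substring-before-last">
        <xsl:with-param name="input" select="$input"/>
        <xsl:with-param name="substr" select="$substr"/>
      </xsl:call-template>
    </xsl:when>
    <!-- Small input: the simple recursive version is sufficient -->
    <xsl:otherwise>
      <xsl:call-template name="substring-before-last">
        <xsl:with-param name="input" select="$input"/>
        <xsl:with-param name="substr" select="$substr"/>
      </xsl:call-template>
    </xsl:otherwise>
  </xsl:choose>
</xsl:template>
```

This version picks an algorithm once, based on the initial length. Switching algorithms in the middle of the recursion would instead require the length test inside str:substring-before-last itself.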