
Recipe 2.4. Finding Substrings from the End of a String

Problem

XSLT does not have any functions for searching strings in reverse.

Solution

XSLT 1.0

Using recursion, you can emulate a reverse search by locating the last occurrence of substr. With this technique, you can create substring-before-last and substring-after-last templates:

<xsl:template name="substring-before-last">
  <xsl:param name="input" />
  <xsl:param name="substr" />
  <xsl:if test="$substr and contains($input, $substr)">
    <xsl:variable name="temp" select="substring-after($input, $substr)" />
    <xsl:value-of select="substring-before($input, $substr)" />
    <xsl:if test="contains($temp, $substr)">
      <xsl:value-of select="$substr" />
      <xsl:call-template name="substring-before-last">
        <xsl:with-param name="input" select="$temp" />
        <xsl:with-param name="substr" select="$substr" />
      </xsl:call-template>
    </xsl:if>
  </xsl:if>
</xsl:template>
   
<xsl:template name="substring-after-last">
<xsl:param name="input"/>
<xsl:param name="substr"/>
   
<!-- Extract the string which comes after the first occurrence -->
<xsl:variable name="temp" select="substring-after($input,$substr)"/>
   
<xsl:choose>
     <!-- If it still contains the search string, then process recursively -->
     <xsl:when test="$substr and contains($temp,$substr)">
          <xsl:call-template name="substring-after-last">
               <xsl:with-param name="input" select="$temp"/>
               <xsl:with-param name="substr" select="$substr"/>
          </xsl:call-template>
     </xsl:when>
     <xsl:otherwise>
          <xsl:value-of select="$temp"/>
     </xsl:otherwise>
</xsl:choose>
</xsl:template>
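
As a usage illustration, the two templates can be called as in the following sketch, which splits a file path at its last slash. The path value, the variable name, and the included file name substring-last.xsl (assumed here to hold the two templates above) are hypothetical.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Hypothetical file assumed to contain the two named templates -->
  <xsl:include href="substring-last.xsl"/>
  <xsl:output method="text"/>

  <!-- Example value only; any string works -->
  <xsl:variable name="path" select="'/usr/local/bin/xsltproc'"/>

  <xsl:template match="/">
    <!-- Everything before the last slash: /usr/local/bin -->
    <xsl:call-template name="substring-before-last">
      <xsl:with-param name="input" select="$path"/>
      <xsl:with-param name="substr" select="'/'"/>
    </xsl:call-template>
    <xsl:text>&#10;</xsl:text>
    <!-- Everything after the last slash: xsltproc -->
    <xsl:call-template name="substring-after-last">
      <xsl:with-param name="input" select="$path"/>
      <xsl:with-param name="substr" select="'/'"/>
    </xsl:call-template>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>
</xsl:stylesheet>
```

Because the templates write their result to the output, wrap the call in an xsl:variable when you need the value for further processing.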

XSLT 2.0

XSLT 2.0 does not add reverse versions of substring-before/after, but one can get the desired effect using the versatile tokenize( ) function that uses regular expressions:

<xsl:function name="ckbk:substring-before-last">
    <xsl:param name="input" as="xs:string"/>
    <xsl:param name="substr" as="xs:string"/>
    <xsl:sequence 
       select="if ($substr) 
               then 
                  if (contains($input, $substr)) then 
                  string-join(tokenize($input, $substr)
                    [position( ) ne last( )],$substr) 
                  else ''
               else $input"/>
</xsl:function>

<xsl:function name="ckbk:substring-after-last">
    <xsl:param name="input" as="xs:string"/>
    <xsl:param name="substr" as="xs:string"/>
    <xsl:sequence 
    select="if ($substr) 
            then
               if (contains($input, $substr))
               then tokenize($input, $substr)[last( )] 
               else '' 
            else $input"/>
</xsl:function>

In both functions, we have to test whether $substr is empty because tokenize does not allow a pattern that matches the empty string. Unfortunately, these implementations won't work exactly like their native counterparts, because tokenize treats its second argument as a regular expression, not a literal string. This can lead to surprises whenever $substr contains regular-expression metacharacters such as a period. You can fix this by having the function escape the special characters used in regular expressions, and you can switch this behavior on and off via a third Boolean argument. The original two-argument version and this new three-argument version can coexist because XSLT allows functions to be overloaded (a function is identified by its name and its arity, or number of arguments):

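<xsl:function name="ckbk:substring-before-last">
    <xsl:param name="input" as="xs:string"/>
    <xsl:param name="substr" as="xs:string"/>
    <xsl:param name="mask-regex" as="xs:boolean"/>
    <!-- Escape regex metacharacters; \\ in the replacement is a literal backslash -->
    <xsl:variable name="matchstr" 
               select="if ($mask-regex) 
                          then replace($substr,
                             '(\.|\[|\]|\\|\||\-|\^|\$|\?|\*|\+|\{|\}|\(|\))','\\$1')
                          else $substr"/>

    <!-- Test and rejoin with the literal $substr; split on the escaped pattern -->
    <xsl:sequence 
       select="if (not($mask-regex) or not($substr)) 
               then ckbk:substring-before-last($input,$substr)
               else if (contains($input, $substr)) 
               then string-join(tokenize($input, $matchstr)
                      [position( ) ne last( )],$substr)
               else ''"/>
</xsl:function>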

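<xsl:function name="ckbk:substring-after-last">
    <xsl:param name="input" as="xs:string"/>
    <xsl:param name="substr" as="xs:string"/>
    <xsl:param name="mask-regex" as="xs:boolean"/>
    <!-- Escape regex metacharacters; \\ in the replacement is a literal backslash -->
    <xsl:variable name="matchstr" 
               select="if ($mask-regex) 
                          then replace($substr,
                             '(\.|\[|\]|\\|\||\-|\^|\$|\?|\*|\+|\{|\}|\(|\))','\\$1')
                          else $substr"/>

    <!-- Test with the literal $substr; split on the escaped pattern -->
    <xsl:sequence 
       select="if (not($mask-regex) or not($substr)) 
               then ckbk:substring-after-last($input,$substr)
               else if (contains($input, $substr)) 
               then tokenize($input, $matchstr)[last( )]
               else ''"/>
</xsl:function>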
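
To make the regular-expression surprise concrete, here is a minimal standalone sketch that uses only the built-in tokenize and replace functions. The sample strings and the template name main are illustrative assumptions, and the pattern is shortened to escape the period only.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <!-- The name "main" is hypothetical; it lets the template act as an initial template -->
  <xsl:template match="/" name="main">
    <!-- Unescaped: "." matches every character, so every token is empty: [] -->
    <xsl:value-of select="concat('[', tokenize('a.b.c', '.')[last()], ']&#10;')"/>
    <!-- Escaped: "\." matches only a literal period, so the last token is c: [c] -->
    <xsl:value-of select="concat('[', tokenize('a.b.c', '\.')[last()], ']&#10;')"/>
    <!-- In the replacement string, \\ yields one literal backslash: a\.b -->
    <xsl:value-of select="replace('a.b', '(\.)', '\\$1')"/>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>
</xsl:stylesheet>
```

The first line of output shows why an unescaped period makes substring-after-last return an empty string for dotted input, even though contains reports a match.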

Discussion

Both XSLT string-searching functions (substring-before and substring-after) begin searching at the start of the string. Sometimes you need to search a string from the end. The simplest way to do this in XSLT is to apply the built-in search functions recursively until the last instance of the substring is found.

There was a nasty "gotcha" in my first attempt at these templates, which you should keep in mind when working with recursive templates that search strings. Recall that contains($anything,'') always returns true! For this reason, I make sure that I also test for a nonempty $substr value in the recursive invocations of substring-before-last and substring-after-last. Without these checks, the code goes into an infinite loop when the search string is empty, or overflows the stack on implementations that do not optimize tail recursion.
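
The behavior is easy to confirm with a throwaway stylesheet such as the following sketch (the sample string is arbitrary):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <xsl:template match="/">
    <!-- Every string contains the empty string, so this prints true -->
    <xsl:value-of select="contains('abc', '')"/>
    <xsl:text>&#10;</xsl:text>
    <!-- The guard used in the templates: an empty string converts to false -->
    <xsl:value-of select="'' and contains('abc', '')"/>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>
</xsl:stylesheet>
```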


Another algorithm is divide and conquer. The basic idea is to split the string in half. If the search string is in the second half, then you can discard the first half, thus turning the problem into a problem half as large. This process repeats recursively. The tricky part is when the search string is not in the second half because you may have split the search string between the two halves. Here is a solution for substring-before-last:

<xsl:template name="str:substring-before-last"> 
   
  <xsl:param name="input"/>
  <xsl:param name="substr"/>
  
  <xsl:variable name="mid" select="ceiling(string-length($input) div 2)"/>
  <xsl:variable name="temp1" select="substring($input,1, $mid)"/>
  <xsl:variable name="temp2" select="substring($input,$mid +1)"/>
  <xsl:choose>
    <xsl:when test="$temp2 and contains($temp2,$substr)">
      <!-- search string is in second half so just append first half -->
      <!-- and recurse on second -->
      <xsl:value-of select="$temp1"/>
      <xsl:call-template name="str:substring-before-last">
        <xsl:with-param name="input" select="$temp2"/>
        <xsl:with-param name="substr" select="$substr"/>
      </xsl:call-template>
    </xsl:when>
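    <!--search string straddles or ends at the boundary, so keep the text -->
    <!-- before the boundary region and search only that region's tail -->
    <xsl:when test="contains(substring($input,
                                       $mid - string-length($substr) +1),
                                       $substr)">
      <xsl:value-of select="substring($input, 1,
                                       $mid - string-length($substr))"/>
      <xsl:value-of select="substring-before(substring($input,
                                       $mid - string-length($substr) +1),
                                       $substr)"/>
    </xsl:when>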
    <!--search string is in first half so throw away second half-->
    <xsl:when test="contains($temp1,$substr)">
      <xsl:call-template name="str:substring-before-last">
        <xsl:with-param name="input" select="$temp1"/>
        <xsl:with-param name="substr" select="$substr"/>
      </xsl:call-template>
    </xsl:when>
    <!-- No occurrences of search string so we are done -->
    <xsl:otherwise/>
  </xsl:choose>
  
</xsl:template>

As it turns out, divide and conquer is of little or no advantage unless you search large texts (roughly 4,000 characters or more). You might have a wrapper template that chooses the appropriate algorithm based on the length or switches from divide and conquer to the simpler algorithm when the subpart becomes small enough.
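
A wrapper of the first kind might look like the following sketch. The template name substring-before-last-auto, the threshold parameter, the two included file names, and the str namespace URI are all hypothetical; the 4,000-character default simply echoes the rough figure above and should be tuned for your processor.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:str="http://example.com/str">

  <!-- Hypothetical files assumed to hold the two implementations shown earlier;
       the str namespace URI must match the one used in the second file -->
  <xsl:include href="substring-last.xsl"/>
  <xsl:include href="str-substring-last.xsl"/>

  <!-- Hypothetical wrapper: picks an algorithm from the input length -->
  <xsl:template name="substring-before-last-auto">
    <xsl:param name="input"/>
    <xsl:param name="substr"/>
    <!-- Rough cutoff taken from the text; tune it for your processor -->
    <xsl:param name="threshold" select="4000"/>
    <xsl:choose>
      <!-- Long input: use divide and conquer -->
      <xsl:when test="string-length($input) &gt;= $threshold">
        <xsl:call-template name="str:substring-before-last">
          <xsl:with-param name="input" select="$input"/>
          <xsl:with-param name="substr" select="$substr"/>
        </xsl:call-template>
      </xsl:when>
      <!-- Short input: the simple recursive search is sufficient -->
      <xsl:otherwise>
        <xsl:call-template name="substring-before-last">
          <xsl:with-param name="input" select="$input"/>
          <xsl:with-param name="substr" select="$substr"/>
        </xsl:call-template>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>
</xsl:stylesheet>
```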

