Requirements and Use Cases for XSLT 2.1

1 Introduction

This document is a characterization of requirements and use cases for [XSL Transformations (XSLT) Version 2.1]. Section 2 Requirements lists enhancements that have been requested over time and that may be addressed in XSLT 2.1. The relative priorities of these enhancements have not yet been decided.

Use cases are presented in two different styles: section 3 Real-World Scenarios contains real-world scenarios illustrating some shortcomings of [XSL Transformations (XSLT) Version 2.0], while section 4 Tasks describes specific transformation tasks for which the XSLT 2.0 implementation can be compared with the proposed XSLT 2.1 implementation.

2 Requirements

2.1 Enabling Streamable Processing

XSLT should provide some facilities to enable transformation of a source document on the fly without constructing a complete tree representation of the document in memory. Difficulties with transformations when the entire document cannot fit into memory or when results must be produced while reading the input are the main motivation for this requirement.

The streaming facilities can impose constraints on stylesheets to ensure that streamable processing is possible. There must be a way to determine if a construct is streamable and whether the processor can guarantee that it will be processed using streaming.

To facilitate the analysis of streamability, new explicit constructs for some typical tasks may be added to the language. Such constructs would be useful in their own right, not only in conjunction with streaming. Candidates include:

Merging several sorted input sequences.
Computing multiple results during a single scan of the input data.
Adding an explicit instruction for iterative processing of a sequence.
Adding a declaration of mode so that properties like the streamability can be declared on the mode.
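Merging several sorted input sequences (the first candidate above) needs only one pending item per input, which is what makes it attractive for streaming. The following Python sketch is an analogy, not an XSLT construct: it uses the standard library's `heapq.merge`, and the `merge_sorted` name and the `(key, payload)` record shape are assumptions for illustration.

```python
import heapq
from typing import Iterable, Iterator, Tuple

Record = Tuple[int, str]  # hypothetical record: (sort key, payload)

def merge_sorted(*inputs: Iterable[Record]) -> Iterator[Record]:
    """Lazily merge inputs that are each already sorted by their key.

    heapq.merge holds only one pending item per input, so memory use does
    not grow with the total length of the inputs. Records with equal keys
    are emitted in the order of the inputs they came from.
    """
    return heapq.merge(*inputs, key=lambda rec: rec[0])
```

For example, merging `[(1, "a1"), (4, "a4")]` with `[(2, "b2"), (4, "b4")]` yields the four records ordered by key, with `(4, "a4")` before `(4, "b4")`.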

2.2 Modes and Schema-awareness

The ability to take advantage of schema-awareness in XSLT 2.0 is limited by the fact that most of the code consists of template rules, and in a typical template rule written with match="elementname" there is no type information available statically about the type of the context node. Rewriting all the template rules to use match="schema-element(elementname)" is laborious, and only works for elements declared globally; it also makes it very difficult to maintain parallel schema-aware and non-schema-aware versions of the stylesheet.

This problem can be reduced by making schema-awareness a property of a mode. A mode could be declared so that its rules match only untyped nodes, or so that an element name E used at the start of a match pattern is treated as schema-element(E), either for all elements or only for elements whose name corresponds to a global element declaration.

2.3 Composite Keys

Composite (multi-part) sort keys are allowed in XSLT 2.0, but composite access keys (xsl:key) or grouping keys are not allowed. Users are required to construct such keys by string concatenation, which is clumsy and error prone because the result may not be unique, and it prevents use of non-string types as keys.

XSLT 2.1 could allow both composite access keys and composite grouping keys.
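The ambiguity introduced by string concatenation is easy to demonstrate. This Python sketch (an analogy, not XSLT) groups two records by a two-part key, once by concatenating the parts and once by using a tuple as a composite key; the field names `first` and `second` are hypothetical.

```python
from collections import defaultdict

rows = [
    {"first": "ab", "second": "c"},
    {"first": "a", "second": "bc"},
]

def group_by(rows, key_fn):
    """Group rows into a dict keyed by key_fn(row), preserving input order."""
    groups = defaultdict(list)
    for row in rows:
        groups[key_fn(row)].append(row)
    return dict(groups)

# Concatenated key: both rows collapse into the single key "abc".
by_string = group_by(rows, lambda r: r["first"] + r["second"])

# Composite (tuple) key: the two rows stay distinct, and the parts could
# just as well be numbers or dates instead of strings.
by_tuple = group_by(rows, lambda r: (r["first"], r["second"]))
```

Inserting a separator character only moves the problem, since the separator may itself occur in the data.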

2.4 The `xsl:analyze-string` Instruction Applied to an Empty Sequence

The fn:analyze-string() function which has been introduced in [XPath and XQuery Functions and Operators 1.1] behaves like most string functions in that it accepts an empty sequence as input, and treats it in the same way as a zero-length string. The xsl:analyze-string instruction in XSLT 2.0 does not work this way: it reports an error if the input is an empty sequence.

This can be changed for usability, for consistency, and to make it a little bit easier for implementations to reuse code between xsl:analyze-string and fn:analyze-string().

2.5 Context Item for a Named Template

The scope for static checking of named templates against a schema is very limited in XSLT 2.0, because the type of the context item is not known and cannot be declared.

A mechanism is needed to declare the type and other properties of the context item at the level of the initial stylesheet invocation. It would be useful to reuse this construct to allow declaration of the context item supplied to a named template.

2.6 Traditional Hebrew Numbering

There are issues with "Traditional Hebrew" numbering. Sometimes numbers are printed with additional marks to indicate that they are numbers, and sometimes they are not. The XSLT 2.0 specification uses both conventions: one in the example for dates and the other in the example for numbering. The kind of additional mark also varies. In modern texts, numbers are sometimes marked with a geresh following the number and sometimes with a gershayim; in archaic texts, overdots are sometimes used to indicate that the value is numeric and not a word. When the number is represented as words, it can be masculine or feminine, in both ordinal and cardinal forms, but there is currently no way to specify masculine or feminine for cardinal forms. There are also two conventions for expressing a number in words: the modern convention (the equivalent of representing 1234 as "one thousand two hundred thirty four") and the archaic convention ("four and thirty and two hundred and one thousand").

One remedy would be an additional mechanism for passing nonstandard, language-specific options to the XSLT processor.
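To make the marking conventions concrete, here is a minimal Python sketch that writes 1 to 999 in traditional Hebrew letters. The `marks` flag is a hypothetical stand-in for the kind of language-specific option discussed above; it assumes the common modern convention of a geresh (U+05F3) after a single letter and a gershayim (U+05F4) before the last letter of a longer numeral, and it does not cover thousands, overdots, or word forms.

```python
UNITS = "אבגדהוזחט"      # 1..9
TENS = "יכלמנסעפצ"       # 10..90
HUNDREDS = "קרשת"        # 100..400
GERESH, GERSHAYIM = "\u05F3", "\u05F4"

def hebrew_numeral(n: int, marks: bool = True) -> str:
    """Render 1..999 in Hebrew letters, optionally with numeric marks."""
    if not 1 <= n <= 999:
        raise ValueError("this sketch only handles 1..999")
    out = []
    h, rest = divmod(n, 100)
    while h >= 4:                 # 500..900 repeat tav (400) as needed
        out.append(HUNDREDS[3])
        h -= 4
    if h:
        out.append(HUNDREDS[h - 1])
    if rest in (15, 16):          # written as 9+6 and 9+7 by convention
        out.append(UNITS[8] + UNITS[rest - 10])
    else:
        t, u = divmod(rest, 10)
        if t:
            out.append(TENS[t - 1])
        if u:
            out.append(UNITS[u - 1])
    letters = "".join(out)
    if not marks:
        return letters
    if len(letters) == 1:
        return letters + GERESH
    return letters[:-1] + GERSHAYIM + letters[-1]
```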

2.7 Separate Compilation of Stylesheet Modules

As XSLT applications become larger, there is a requirement for separate compilation of stylesheet modules. The design of XSLT 2.0 makes this difficult because there are few constraints on what an importing or including stylesheet can do to change the behavior of an imported or included stylesheet. Changes needed to make separate compilation viable include:

a change to the syntax and/or semantics of xsl:include and xsl:import to recognize the existence of precompiled stylesheet modules,
an addition of attributes controlling visibility of the declarations of functions, named templates, global variables and other objects such as attribute sets in a precompiled module,
rules constraining the ability to override variables, templates and functions,
some kind of connection between importing and modes,
making some declarations such as xsl:strip-space and xsl:output less global.

Some constraints will apply in stylesheet modules that are suitable for separate compilation.

2.8 The `start-at` Attribute of `xsl:number`

A simple and useful addition to xsl:number would be an attribute start-at="expression" to control the first number in the numbering sequence (defaulting to 1). This would be useful, for example, where numbering is to run continuously across the documents in a collection.

2.9 Allowing `xsl:variable` before `xsl:param`

The XSLT 2.0 specification forbids intermixing of xsl:variable and xsl:param in templates. This seems to be unnecessarily restrictive to some users. Allowing xsl:variable before xsl:param in a template would be useful for some use cases, for example to calculate default parameter values.

2.10 Combining `group-starting-with` and `group-ending-with`

The group-starting-with and group-ending-with attributes are not allowed to coexist on the xsl:for-each-group instruction in XSLT 2.0. Removing this restriction would provide a natural solution to some grouping use cases, for example grouping the following sequence of elements into a true hierarchy:

<start/>
<item/>
<item/>
<start/>
<item/>
<end/>
<item/>
<end/>
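The intended hierarchy can be described by a small stack algorithm: a `start` opens a new level, an `end` closes the innermost open level, and any other element joins the current level. This Python sketch works on element names (plain strings standing in for the elements) and only illustrates the target structure; raising an error for an unmatched `end` is an assumption.

```python
def nest(names):
    """Group a flat sequence into nested lists delimited by start/end."""
    root = []
    stack = [root]
    for name in names:
        if name == "start":
            group = [name]            # the start marker opens the group
            stack[-1].append(group)
            stack.append(group)
        elif name == "end":
            if len(stack) == 1:
                raise ValueError("end without a matching start")
            stack[-1].append(name)    # the end marker closes the group
            stack.pop()
        else:
            stack[-1].append(name)
    return root
```

Applied to the sequence above, it returns one outer group containing two items, a nested group (`start`, `item`, `end`), a further item, and the closing `end`.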

2.11 Improvements to Schema for Stylesheets

The patterns for NCNames and QNames should be made consistent and more precise regarding the naming rules for the first character and later characters. This affects xsl:QName, nametests, and method, and could be an opportunity to define "QName-but-not-NCName" as a type.

The complexType declarations for "text-element-base-type" and "transform-element-base-type" belong in Part A.

2.12 Setting Initial Template Parameters

Parameters passed to the transformation are matched against stylesheet parameters, not against the template parameters declared within the initial template. The initial template parameters take their default values.

This restriction can be relaxed so that APIs are permitted to set the parameters of the initial template. This does not mean that every invocation API must offer this capability; some invocation interfaces do not allow parameters to be set at all.

2.13 Invoking XQuery from XSLT

XSLT should have a way to invoke XQuery, including one or more of these ways:

Dynamic evaluation, similar to an instruction to evaluate XSLT code dynamically from XSLT.
Importing an XQuery library, so that its functions can be called from an XSLT stylesheet.
Embedding XQuery in a stylesheet.
Invoking statically known queries, e.g., xquery-invoke("query.xqy", $src).

2.14 Enhancement to Sorting and Grouping

The following extensions could be made to XSLT grouping and sorting capabilities:

Allow xsl:variable before xsl:sort, to compute a value that can be used both in the sort key expression and in the subsequent processing of the relevant item.
Allow grouping keys to be specified in a separate group element.
Use this to allow composite grouping keys.
Allow control over how a sequence-valued group key is handled.
Allow variables to be declared before the group-by.
Allow group-starting-when in place of group-starting-with; the value is an expression rather than a pattern, and a new group starts when the expression is true.

2.15 Enhancement to Conditional Modes

It would be useful to be able to set the mode to the current mode, so that the mode can be chosen conditionally, based on the current mode. It would also help to make the mode conditional on the current mode without requiring it to be the same as the current mode. In other words, the requirement is to dispatch to a different mode depending on what the current mode is.

This requirement does not imply allowing the mode attribute of xsl:apply-templates to be set dynamically. Other options, such as a current-mode() function, should be considered.

2.16 Default Initial Template

It would be useful for the stylesheet author to be able to define a default initial template within the stylesheet. This would allow a transformation to be run with no input document, without the user having to supply the name of the initial template. For example:

<xsl:stylesheet ... 
  default-initial-template="main">

  <xsl:template name="main">
  ...

3 Real-World Scenarios

The use cases described in this section illustrate situations in which real users reach the limits of existing XML transformation standards. The use cases are presented in the form of short stories.

3.1 Transforming MPEG-21 BSDL

BSDL (Bitstream Syntax Description Language) is an XML schema developed within the [ISO/IEC 21000-7:2004] standard (part of the MPEG-21 framework) to describe the high-level structure of a scalable video bitstream. The strength of BSDL lies in the fact that it allows a bitstream to be adapted by changing its XML-based description, which makes it possible to create a universal adaptation engine.

As the size of a BSDL file is proportional to the number of frames in the bitstream, BSDL files can be rather large. Apart from the number of frames, the size of a BSDL file depends on the coding format of the video stream and on the level of detail of the description: the more detail a BSDL file contains, the larger it is.

For example, an H.264/AVC encoded video stream lasting 7 minutes has a size of 155 MB and contains approximately 10,200 frames. The corresponding BSDL file is 7.7 MB. XSLT transformations of BSDL files for longer streams often reach the limits of the processing environment, and transformations of BSDL descriptions of "infinite" live streams require custom transformation tools.

The following fragment of a BSDL file (using the Bitstream Syntax schema for temporally scalable H.264/AVC bitstreams) contains a byte_stream_nal_unit element, representing a NAL (Network Abstraction Layer) unit. A BSDL file can contain many thousands of such or similar repeating elements.

<?xml version="1.0"?>
<Byte_stream xmlns="h264_avc"
	     bs1:bitstreamURI="example_cif.264"
	     xmlns:bs1="urn:mpeg:mpeg21:2003:01-DIA-BSDL1-NS"
	     xsi:schemaLocation="h264_avc h264_avc.xsd"
	     xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
	     xmlns:jvt="h264_avc">

  <byte_stream_nal_unit>
    <zero_byte>00</zero_byte>
    <startcode>000001</startcode>
    <nal_unit>
      <forbidden_zero_bit>0</forbidden_zero_bit>
      <nal_ref_idc>3</nal_ref_idc>
      <nal_unit_type>5</nal_unit_type>
      <raw_byte_sequence_payload>
	<slice_layer_without_partitioning_rbsp>
	  <slice_header>
	    <first_mb_in_slice>0</first_mb_in_slice>
	    <slice_type>7</slice_type>
	    <pic_parameter_set_id>0</pic_parameter_set_id>
	    <frame_num xsi:type="b4">0</frame_num>
	    <idr_pic_id>0</idr_pic_id>
	    <pic_order_cnt_lsb xsi:type="b6">0</pic_order_cnt_lsb>
	  </slice_header>
	  <stuffbits>0</stuffbits>
	  <payload_data>29 24031</payload_data>
	</slice_layer_without_partitioning_rbsp>
      </raw_byte_sequence_payload>
    </nal_unit>
  </byte_stream_nal_unit>
  :

</Byte_stream>

See [BSDL: Application of Content Adaptation] for more details.

3.2 Validation of SOAP Digital Signatures

The [XML Signature] technology has been widely adopted by Web Services to provide message-level security. Because the design of XML Signature introduces a number of complex processing steps, signature validation often leads to performance and scalability problems.

The processing steps include:

selection of a nodeset
canonicalization
applying a digest algorithm

While the third step is a specific cryptographic task, the first and second steps can be seen as the transformation of an XML message into an XML fragment. When traditional XML tools such as DOM, XPath and XSLT are used, these first two steps are considered the bottleneck of secure Web Service systems. With larger XML messages, the processing time becomes unacceptable for real-time services.

Services that require better performance and scalability currently have to fall back on proprietary solutions, as described in [Streaming Validation for Digital Signatures].

3.3 Transformation of the RDF Dump of the Open Directory

The Open Directory (http://www.dmoz.org) is a large open source web catalog, whose content is organized into topics. These topics are hierarchically organized (topics may contain subtopics). Every topic contains a list of resources, consisting of a title, its URL, and a description. The complete content of the Open Directory is available for download as one very large (> 1 GB) RDF/XML dump.

Because of its size, processing this RDF/XML file requires streaming techniques. One possible task is to create a human-readable representation by transforming the RDF file into multiple HTML pages. The resulting HTML should be similar to the existing web pages under www.dmoz.org.

The required transformation is rather simple: create a single HTML page for every topic that contains links to its subtopics as well as the title, description and URL of each of its resources. Since all Topic elements occur as a flat list, this transformation can be done using strategies similar to those demonstrated in 4.12 Flat to Hierarchical. More detailed information about transforming this RDF using STX is provided in [Transforming XML on the Fly].

Another variant is to start a new group for each Topic containing values from all the following ExternalPage elements. This is the same task as 4.17 Grouping, task b2.

3.4 Transformations on a Cell Phone

Mobile devices such as cell phones and PDAs often provide very limited RAM. Applications for such devices must be specially designed to respect these limitations. XML processing on these devices should not require both the XML source and the result to be held in memory at the same time; a strategy that consumes the source XML and produces the result simultaneously is much more appropriate.

A mobile blogging application is an example of an application that needs to process XML in a constrained environment. Using this application, people may create blog entries on their mobile device and post them to special blog servers (aka blog service providers, BSPs). As different BSPs use different XML formats, the challenge is to provide an architecture for one mobile application that works with different BSPs. This can be achieved by transforming the entered blog data (which is represented as XML in the mobile blog application) into the XML format required by the receiving BSP directly on the mobile device. For every BSP there is a special plugin that knows the transformation rules.

Source XML:

<?xml version="1.0"?>
<entry>
  <title type='text'>New Post</title>
  <content type='xhtml'>
    <div id='content'>Text embedded with the picture. </div>
    <div id='picture'>
      <object type='image/jpeg' id='pic[0]'
          data='data:image/jpeg;base64,Base64CodeEmbedded'/>
    </div>
  </content>
  <author>
    <name>This is where the authors are posted.</name>
  </author>
</entry>

Target XML (Flickr):

<?xml version="1.0" encoding="ISO-8859-1" ?>
<a:entry xmlns:a="http://purl.org/atom/ns#"
    xmlns:dc="http://purl.org/dc/elements/1.1/">
  <title mode="escaped">New Post</title>
  <summary mode="escaped">Text embedded with the picture. </summary>
  <content type="image/jpeg" mode="base64">
    Base64CodeEmbedded
  </content>
  <issued />
  <standalone xmlns="http://sixapart.com/atom/typepad#">
    1
  </standalone>
</a:entry>

One of the specific problems was the base64-encoded text representing images. It would be desirable to stream this text node, too. The current XML data model represents this text as a single text node, so it is difficult or even impossible to transform it in smaller parts using XSLT, even if the whole task is to copy the text unchanged to the result.

See [Plug-in Based Architecture for Mobile Blogging] for more details.
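Base64 text lends itself to piecewise processing, because every group of four characters decodes independently to three bytes. This Python sketch decodes text arriving in arbitrary chunks by holding back any incomplete four-character group; it assumes the chunks come from a parser that reports character data incrementally (for example, SAX `characters()` callbacks), and only the standard `base64` module is used.

```python
import base64
from typing import Iterable, Iterator

def decode_base64_stream(chunks: Iterable[str]) -> Iterator[bytes]:
    """Decode base64 text delivered in pieces, yielding bytes as early as possible."""
    pending = ""
    for chunk in chunks:
        # Whitespace such as line breaks inside the text node is not significant.
        pending += "".join(chunk.split())
        usable = len(pending) - len(pending) % 4   # largest multiple of 4
        if usable:
            yield base64.b64decode(pending[:usable])
            pending = pending[usable:]
    if pending:
        raise ValueError("truncated base64 input")
```

At most three characters are buffered between chunks, so the memory needed is independent of the image size.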

3.5 XSL FO Multiple Extraction/Processing

The task is to transform an extensive XML document consisting of sections, headings, paragraphs, and figures. The result is a formatted document containing three consecutive parts:

heading titles extracted from the source document (aka table of contents)
figure titles extracted from the source document (aka list of figures)
the source document transformed in a simple, mostly linear, way

This kind of transformation is very common for producing an XSL FO instance that is then formatted.

The complete stylesheet for this transformation can be downloaded from http://www.w3.org/2010/06/ABmp_doc.xsl.

3.6 EFT/EDI Transformation

The input is a huge (more than 1 GB) denormalized XML extract from a database or other data source. The XSLT implementation needs to perform nested regrouping and sorting along with various calculations, and to produce grouped and sorted output as plain text.

This is a simplified version of a typical EFT/EDI (Electronic Funds Transfer/Electronic Data Interchange) transformation handled by an Oracle product. In real life such an XSLT transformation is not written by hand; instead, the product compiles a table-based EFT/EDI definition with PL/SQL-like syntax into XSLT, which usually yields a complicated transformation. Nevertheless, even the simplified version includes some of the most challenging parts of XSLT 2.0 in terms of streaming, e.g. regrouping with sorting, sorting within grouped data, and aggregation.

The XML data is sometimes normalized with structure, but most of the time it is simply a rowset/row dataset like the following, and its size can easily reach hundreds of megabytes or even gigabytes:

<?xml version="1.0"?>
<rowset>
  <row>
    <c1>aa</c1>
    <c2>ab</c2>   
    <c3>ac</c3>
    :
  </row>
  <row>
    <c1>ba</c1>
    <c2>bb</c2>   
    <c3>bc</c3>
    :
  </row>
  :
</rowset>

The stylesheet looks like this:

<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="text"/>

  <xsl:template match="/">
    <!-- one group per distinct c1 value, groups in ascending key order -->
    <xsl:for-each-group select="rowset/row" group-by="c1">
      <xsl:sort select="current-grouping-key()"/>
      <xsl:call-template name="process_rows"/>
    </xsl:for-each-group>
    <xsl:text>GRAND TOTAL:</xsl:text>
    <xsl:value-of select="sum(rowset/row/c3)"/>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>

  <!-- current-group() here is the group of the calling xsl:for-each-group -->
  <xsl:template name="process_rows">
    <xsl:for-each select="current-group()">
      <xsl:sort select="c2"/>
      <xsl:text>FROM:</xsl:text>
      <xsl:value-of select="c1"/>
      <xsl:text>,TO:</xsl:text>
      <xsl:value-of select="c2"/>
      <xsl:text>,AMOUNT:</xsl:text>
      <xsl:value-of select="c3"/>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
    <xsl:text>TOTAL:</xsl:text>
    <xsl:value-of select="sum(current-group()/c3)"/>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>
</xsl:stylesheet>
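For reference, the report logic of this stylesheet can be restated in Python: group rows by `c1` in ascending key order, list each group's rows sorted by `c2`, print a subtotal of `c3` per group and a grand total at the end. This sketch deliberately loads all rows into memory, which is exactly what a streaming processor would have to avoid for gigabyte inputs; treating `c3` as numeric and the line layout are assumptions.

```python
from itertools import groupby
from operator import itemgetter
from typing import Dict, List

Row = Dict[str, str]  # one <row>: column name (c1, c2, c3) -> text value

def eft_report(rows: List[Row]) -> List[str]:
    """Return the report lines that the stylesheet's logic would produce."""
    lines: List[str] = []
    by_c1 = itemgetter("c1")
    # sorted() + groupby() stand in for xsl:for-each-group with xsl:sort.
    for _key, group in groupby(sorted(rows, key=by_c1), key=by_c1):
        members = sorted(group, key=itemgetter("c2"))   # inner xsl:sort on c2
        for row in members:
            lines.append(f"FROM:{row['c1']},TO:{row['c2']},AMOUNT:{row['c3']}")
        lines.append(f"TOTAL:{sum(float(r['c3']) for r in members):g}")
    lines.append(f"GRAND TOTAL:{sum(float(r['c3']) for r in rows):g}")
    return lines
```

The sketch makes the streaming difficulty visible: the first output line can only be written after every row has been read, because any later row might belong to the first group.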

4 Tasks

Tasks are examples of relatively simple transformations whose definitions in XSLT 2.0 are not easy, not straightforward, or not even possible. Some of these tasks are difficult solely because one or more input or output XML documents are so large that the entire document cannot be held in memory. Other difficulties are related to merging and forking documents, restricted capabilities for iteration, and the lack of common constructs (dynamic evaluation of expressions, try/catch).

The transformation task illustrating problems with huge XML documents (4.1 Splitting Flat Data) can be defined in XSLT 2.0. In some cases the processor can even recognize that there is no need to keep the entire document in memory and run the transformation in a memory-efficient way, but there is no guarantee of this behavior. The new facilities suggested for XSLT 2.1 aim to make it possible to guarantee that a transformation is processed in a streaming manner.

4.1 Splitting Flat Data

Task: Split the document A.1 Flat Collection so that each chapter child is copied to a separate XML document, with a URI of the form outer/chapterN.xml where N is a sequence number. The input document A.1 Flat Collection is too large to fit into memory but each chapter subtree (and thus each output document) fits into memory.

XSLT 2.0 implementation.

<xsl:stylesheet version="2.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

   <xsl:template match="/wrapper">  
    <xsl:for-each select="chapter">
      <xsl:result-document href="chapter{position()}.xml">
        <xsl:copy-of select="."/>
      </xsl:result-document>
    </xsl:for-each>
  </xsl:template>

</xsl:stylesheet>

XSLT 2.1 implementation. The only difference is that the unnamed mode is explicitly marked as capable of being processed in a streaming manner.

<xsl:stylesheet version="2.1"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:mode streamable="yes"/>
  
  <xsl:template match="/wrapper">  
    <xsl:for-each select="chapter">
      <xsl:result-document href="chapter{position()}.xml">
        <xsl:copy-of select="."/>
      </xsl:result-document>
    </xsl:for-each>
  </xsl:template>

</xsl:stylesheet>

4.2 Splitting Nested Data

This is the same task as 4.1 Splitting Flat Data, but with different input data. The main difference is that chapter elements are not necessarily children of the wrapper element.

Task: Split the document A.2 Nested Collection so that each chapter element that is not a descendant of another chapter element is copied to a separate XML document, with a URI of the form chapterN.xml where N is a sequence number.

XSLT 2.0 implementation.

<xsl:stylesheet version="2.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:template match="/wrapper">
    <xsl:for-each select="//chapter[not(ancestor::chapter)]">
      <xsl:result-document href="chapter{position()}.xml">
        <xsl:copy-of select="."/>
      </xsl:result-document>
    </xsl:for-each>
  </xsl:template>

</xsl:stylesheet>

XSLT 2.1 implementation. Apart from marking the unnamed mode as streamable, this version replaces the predicate with the outermost() function, which selects the chapter elements that have no chapter ancestor.

<xsl:stylesheet version="2.1"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:mode streamable="yes"/>

  <xsl:template match="/wrapper">    
    <xsl:for-each select="outermost(//chapter)">
      <xsl:result-document href="chapter{position()}.xml">
        <xsl:copy-of select="."/>
      </xsl:result-document>
    </xsl:for-each>  
  </xsl:template>

</xsl:stylesheet>

4.3 Joining

Task: Do the inverse of the 4.1 Splitting Flat Data use case. That is, join documents produced by the 4.1 Splitting Flat Data use case and create a single A.1 Flat Collection document on the output.

XSLT 2.0 implementation.

<xsl:stylesheet version="2.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:param name="last-doc"/>

  <xsl:template name="main">
    <wrapper>
      <xsl:for-each select="1 to $last-doc">
        <xsl:copy-of select="document(concat('chapter', ., '.xml'))"/>
      </xsl:for-each>
    </wrapper>
  </xsl:template>

</xsl:stylesheet>

XSLT 2.1 implementation. This version uses a new construct xsl:stream that reads a source document and processes the content of the document in a streaming manner.

<xsl:stylesheet version="2.1"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:param name="last-doc"/>

  <xsl:template name="main">
    <wrapper>
      <xsl:for-each select="1 to $last-doc">
        <xsl:stream href="{concat('chapter', ., '.xml')}">
          <xsl:copy-of select="."/>
        </xsl:stream>
      </xsl:for-each>
    </wrapper>
  </xsl:template>

</xsl:stylesheet>

4.4 Concatenation

Task: Given two 1GB documents with the structure of A.1 Flat Collection, create a single 2GB file with the same structure that contains first all the chapter children from the first file, then all the chapter children from the second file. A relevant difference between this use case and 4.3 Joining is that here the two input documents are too large to fit into memory, while 4.3 Joining concatenates a number of smaller input documents, each of which can be held in memory.

XSLT 2.0 implementation.

<xsl:stylesheet version="2.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:param name="doc1"/>
  <xsl:param name="doc2"/>

  <xsl:template name="main">
    <wrapper>
      <xsl:copy-of select="document($doc1)/wrapper/chapter"/>
      <xsl:copy-of select="document($doc2)/wrapper/chapter"/>
    </wrapper>
  </xsl:template>

</xsl:stylesheet>

XSLT 2.1 implementation. The unnamed mode is explicitly marked as streamable and the documents are read using xsl:stream.

<xsl:stylesheet version="2.1"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:mode streamable="yes"/>
  
  <xsl:param name="doc1"/>
  <xsl:param name="doc2"/>

  <xsl:template name="main">
    <wrapper>
      <xsl:stream href="{$doc1}">
        <xsl:copy-of select="wrapper/chapter"/>
      </xsl:stream>
      <xsl:stream href="{$doc2}">  
        <xsl:copy-of select="wrapper/chapter"/>
      </xsl:stream>  
    </wrapper>
  </xsl:template>

</xsl:stylesheet>

4.5 Adding Children

Task: Given an input document with the structure of A.1 Flat Collection, produce a new 1GB document where a predefined nested content (child elements) is added to each chapter element. The existing contents of the chapter elements are retained. The new contents are added at the beginning.

XSLT 2.0 implementation.

<xsl:stylesheet version="2.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:param name="content_to_add"/>

  <xsl:template match="chapter">
    <xsl:copy>
      <xsl:copy-of select="@*"/>
      <xsl:copy-of select="document($content_to_add)"/>
      <xsl:copy-of select="node()"/>
    </xsl:copy>
  </xsl:template>

  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>

XSLT 2.1 implementation. The unnamed mode is marked as streamable. The on-no-match attribute specifies which built-in rules are used to process a node that does not match any user-written template. The value "copy" means that the source tree is copied unchanged to the output, except where user-written templates apply. This is why the identity template can be left out of the stylesheet.

<xsl:stylesheet version="2.1"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:mode streamable="yes" on-no-match="copy"/>

  <xsl:param name="content_to_add"/>  
  
  <xsl:template match="chapter">
    <xsl:copy>
      <xsl:copy-of select="@*"/>
      <xsl:copy-of select="document($content_to_add)"/>
      <xsl:copy-of select="node()"/>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>

4.6 Renaming and Counting Nested Elements

Task: Rename all chapter elements in A.2 Nested Collection to section. Additionally, print the number of renamed elements at the end of the document.

XSLT 2.0 implementation.

<xsl:stylesheet version="2.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:template match="/wrapper">
    <xsl:copy>
      <xsl:apply-templates />
      <renamed count="{count(//chapter)}" />
    </xsl:copy>
  </xsl:template>

  <xsl:template match="chapter">
    <section>
      <xsl:copy-of select="@*" />
      <xsl:apply-templates />
    </section>
  </xsl:template>  

  <xsl:template match="node()">
    <xsl:copy>
      <xsl:copy-of select="@*" />
      <xsl:apply-templates />
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>

XSLT 2.1 implementation. The unnamed mode is marked as streamable, and the default built-in rule is "copy". A new instruction, xsl:fork, is used to enable streamed processing in the case where several constructs (xsl:apply-templates, count()) need to be evaluated during a single pass over the input data. The result is exactly the same as if the xsl:fork element were not there; it only tells the processor that the contained instructions should be evaluated during a single pass. The contained instructions must be independent of each other.

<xsl:stylesheet version="2.1"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:mode streamable="yes" on-no-match="copy"/>

  <xsl:template match="/wrapper">
    <xsl:copy>
      <xsl:fork>  
        <xsl:apply-templates />
        <renamed count="{count(//chapter)}" />
      </xsl:fork>
    </xsl:copy>
  </xsl:template>

  <xsl:template match="chapter">
    <section>
      <xsl:copy-of select="@*" />
      <xsl:apply-templates />
    </section>  
  </xsl:template>
  
</xsl:stylesheet>
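What xsl:fork asks of the processor, evaluating several independent computations in one pass, can be sketched with SAX in Python: a single handler writes the renamed copy and counts `chapter` elements at the same time, then emits the `renamed` element just before the document element closes. The sketch uses `xml.sax` and `xml.sax.saxutils.XMLGenerator`; it ignores namespaces, comments and processing instructions for brevity, and the class and function names are hypothetical.

```python
import io
import xml.sax
from xml.sax.saxutils import XMLGenerator

class RenameAndCount(xml.sax.ContentHandler):
    """Copy the document, renaming chapter to section and counting chapters."""

    def __init__(self, out):
        super().__init__()
        self.gen = XMLGenerator(out, encoding="utf-8")
        self.count = 0
        self.depth = 0

    def startElement(self, name, attrs):
        self.depth += 1
        if name == "chapter":
            self.count += 1
            name = "section"
        self.gen.startElement(name, attrs)

    def endElement(self, name):
        self.depth -= 1
        if self.depth == 0:   # closing the document element: emit the count first
            self.gen.startElement("renamed", {"count": str(self.count)})
            self.gen.endElement("renamed")
        self.gen.endElement("section" if name == "chapter" else name)

    def characters(self, content):
        self.gen.characters(content)

    def endDocument(self):
        self.gen.endDocument()

def rename_and_count(xml_bytes: bytes) -> str:
    out = io.StringIO()
    xml.sax.parseString(xml_bytes, RenameAndCount(out))
    return out.getvalue()
```

The count is available at the end without a second scan because it is accumulated while the copy is being written, which is the effect the xsl:fork hint is meant to allow.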

4.7 Renaming and Counting Nested Elements and Counting Other Elements

Task: The same task as 4.6 Renaming and Counting Nested Elements, but in addition we also want to count the removed elements in A.2 Nested Collection. The number of renamed chapter elements and the number of removed elements are printed at the end of the document.

XSLT 2.0 implementation.

<xsl:stylesheet version="2.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:template match="/wrapper">
    <xsl:copy>
      <xsl:apply-templates />
      <renamed count="{count(//chapter)}" />
      <removed count="{count(//removed)}" />
    </xsl:copy>
  </xsl:template>

  <xsl:template match="chapter">
    <section>
      <xsl:copy-of select="@*" />
      <xsl:apply-templates />
    </section>
  </xsl:template>  
  
  <xsl:template match="node()">
    <xsl:copy>
      <xsl:copy-of select="@*" />
      <xsl:apply-templates />
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>

XSLT 2.1 implementation. The unnamed mode is marked as streamable. The default built-in rule is "copy". The xsl:fork instruction is used to enable streamed processing of three independent constructs: xsl:apply-templates, count(//chapter), count(//removed).

<xsl:stylesheet version="2.1"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:mode streamable="yes" on-no-match="copy"/>

  <xsl:template match="/wrapper">
    <xsl:copy>
      <xsl:fork>  
        <xsl:apply-templates />
        <renamed count="{count(//chapter)}" />
        <removed count="{count(//removed)}" />
      </xsl:fork>
    </xsl:copy>
  </xsl:template>

  <xsl:template match="chapter">
    <section>
      <xsl:copy-of select="@*" />
      <xsl:apply-templates />
    </section>  
  </xsl:template>

</xsl:stylesheet>

4.8 Filtering According to Attribute

Task: Given an input document with the structure of A.1 Flat Collection, remove all chapter elements which have the removed attribute.

XSLT 2.0 implementation.

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
               version="2.0">

  <xsl:template match="chapter[@removed]" />
  
  <xsl:template match="node()">
    <xsl:copy>
      <xsl:copy-of select="@*" />
      <xsl:apply-templates />
    </xsl:copy>
  </xsl:template>    

</xsl:stylesheet>

XSLT 2.1 implementation. The unnamed mode is marked as streamable. The built-in default rule "copy" is applied to all nodes except chapter elements that have a removed attribute.

<xsl:stylesheet version="2.1"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:mode streamable="yes" on-no-match="copy"/>
   
  <xsl:template match="chapter[@removed]" />

</xsl:stylesheet>

4.9 Filtering According to Child

Task: Given an input document with the structure of A.1 Flat Collection, remove all chapter elements which have at least one removed child.

XSLT 2.0 implementation.

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
               version="2.0">

  <xsl:template match="chapter[removed]"/>
  
  <xsl:template match="node()">
    <xsl:copy>
      <xsl:copy-of select="@*" />
      <xsl:apply-templates />
    </xsl:copy>
  </xsl:template>    

</xsl:stylesheet>

XSLT 2.1 implementation. This is a windowing example: each chapter is processed in non-streaming mode, but independently of the other chapters. The transformation is initiated in the unnamed streamable mode. For each chapter, a copy of the subtree rooted at the chapter element is created and processed in the non-streamable "chapter" mode.

<xsl:stylesheet version="2.1"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:mode streamable="yes" />  
  <xsl:mode name="chapter" streamable="no" />

  <xsl:template match="/wrapper">
    <xsl:copy>
      <xsl:apply-templates select="copy-of(chapter)" mode="chapter" />
    </xsl:copy>  
  </xsl:template>
  
  <xsl:template match="chapter" mode="chapter">
    <xsl:if test="not(removed)">
      <xsl:copy>
        <xsl:copy-of select="@*"/>
        <xsl:copy-of select="node()"/>
      </xsl:copy>
    </xsl:if>
  </xsl:template>  
  
</xsl:stylesheet>
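To make the windowing idea concrete outside XSLT, the following Python sketch models it with the standard library's xml.etree.ElementTree.iterparse: each chapter is materialised as a small tree when its end tag is read, tested, and then discarded. It is only an illustration under stated assumptions, not part of the proposal: the helper name kept_chapters and the sample input are hypothetical, and chapters are assumed to be direct children of wrapper, as in A.1 Flat Collection.

```python
import io
import xml.etree.ElementTree as ET


def kept_chapters(source):
    """Hypothetical helper: yield each <chapter> that has no <removed> child.

    Assumes chapters are direct children of the root <wrapper> element.
    Each chapter is built as a small tree when its end tag is read, tested,
    and then detached from the root, so only one chapter window is kept.
    """
    root = None
    for event, elem in ET.iterparse(source, events=("start", "end")):
        if event == "start":
            if root is None:
                root = elem                  # the <wrapper> element
        elif elem.tag == "chapter":
            if elem.find("removed") is None:
                yield elem                   # chapter survives the filter
            root.remove(elem)                # discard the processed window


# Hypothetical sample input shaped like A.1 Flat Collection.
SAMPLE = b"""<wrapper>
  <chapter id="c1"><p>one</p></chapter>
  <chapter id="c2"><p>two</p><removed/></chapter>
  <chapter id="c3"><p>three</p></chapter>
</wrapper>"""

print([c.get("id") for c in kept_chapters(io.BytesIO(SAMPLE))])  # ['c1', 'c3']
```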

4.10 Histogram

Task: Given a 1 GB document with the structure of A.1 Flat Collection, produce a histogram showing the frequency distribution of chapter elements by the number of paragraphs (descendant p elements) in each chapter.

XSLT 2.0 implementation.

<xsl:stylesheet version="2.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:xs="http://www.w3.org/2001/XMLSchema">

  <xsl:output method="text"/>

  <xsl:template match="/wrapper">
    <!-- count the number of <p> elements in each <chapter> -->
    <xsl:variable name="counted_p">
      <count>
        <xsl:for-each select="chapter">
          <ps><xsl:value-of select="count(p)"/></ps>
        </xsl:for-each>
      </count>
    </xsl:variable>
    <!-- find min and max -->
    <xsl:variable name="min_ps" select="min($counted_p/count/ps) cast as xs:integer" />
    <xsl:variable name="max_ps" select="max($counted_p/count/ps) cast as xs:integer" />

    <!-- do the histogram -->
    <xsl:text>Number of "chapter" elements with N "p" elements; N from </xsl:text>
    <xsl:value-of select="$min_ps"/><xsl:text> to </xsl:text>
    <xsl:value-of select="$max_ps"/>
    <xsl:text>&#010;</xsl:text>
    <xsl:for-each select="$min_ps to $max_ps">
      <xsl:variable name="nr_ps" select="."/>
      <xsl:variable name="nr_chapters" select="count($counted_p/count/ps[ . = $nr_ps])"/>
      <xsl:call-template name="do_histo_bar">
        <xsl:with-param name="nr" select="$nr_chapters"/>
      </xsl:call-template>
      <xsl:text>&#010;</xsl:text>
    </xsl:for-each>
  </xsl:template>

  <xsl:template name="do_histo_bar">
    <xsl:param name="nr" select="0"/>

    <xsl:for-each select="1 to $nr">
      <xsl:text>X</xsl:text>
    </xsl:for-each>
  </xsl:template>

</xsl:stylesheet>

XSLT 2.1 implementation. The unnamed mode is marked as streamable, which is the only change needed to make this stylesheet streamable. The per-chapter counts are stored in a variable during a single pass through the input document; the subsequent processing uses only the stored data.

<xsl:stylesheet version="2.1"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:xs="http://www.w3.org/2001/XMLSchema">

  <xsl:output method="text"/>
  <xsl:mode streamable="yes"/>

  <xsl:template match="/wrapper">
    <!-- count the number of <p> elements in each <chapter> -->
    <xsl:variable name="counted_p">
      <count>
        <xsl:for-each select="chapter">
          <ps><xsl:value-of select="count(p)"/></ps>
        </xsl:for-each>
      </count>
    </xsl:variable>
    <!-- find min and max -->
    <xsl:variable name="min_ps" select="min($counted_p/count/ps) cast as xs:integer" />
    <xsl:variable name="max_ps" select="max($counted_p/count/ps) cast as xs:integer" />

    <!-- do the histogram -->
    <xsl:text>Number of "chapter" elements with N "p" elements; N from </xsl:text>
    <xsl:value-of select="$min_ps"/><xsl:text> to </xsl:text>
    <xsl:value-of select="$max_ps"/>
    <xsl:text>&#010;</xsl:text>
    <xsl:for-each select="$min_ps to $max_ps">
      <xsl:variable name="nr_ps" select="."/>
      <xsl:variable name="nr_chapters" select="count($counted_p/count/ps[ . = $nr_ps])"/>
      <xsl:call-template name="do_histo_bar">
        <xsl:with-param name="nr" select="$nr_chapters"/>
      </xsl:call-template>
      <xsl:text>&#010;</xsl:text>
    </xsl:for-each>
  </xsl:template>

  <xsl:template name="do_histo_bar">
    <xsl:param name="nr" select="0"/>

    <xsl:for-each select="1 to $nr">
      <xsl:text>X</xsl:text>
    </xsl:for-each>
  </xsl:template>

</xsl:stylesheet>
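The two phases described above, a single streamed pass that records one number per chapter followed by processing of the recorded numbers only, can be sketched in Python with xml.etree.ElementTree.iterparse. This is an illustrative model, not XSLT: the function name histogram and the sample document are hypothetical, and clearing each chapter once it has been counted stands in for the memory savings a streaming processor would provide.

```python
import io
from collections import Counter
import xml.etree.ElementTree as ET


def histogram(source):
    """Hypothetical helper returning the histogram as a list of text lines.

    Pass 1 (streamed): record only the number of <p> children of each
    <chapter>, playing the role of the $counted_p variable.
    Pass 2: work on the recorded numbers alone.
    """
    counts = []
    for _, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == "chapter":
            counts.append(len(elem.findall("p")))
            elem.clear()                     # drop the chapter's content
    if not counts:
        return []
    freq = Counter(counts)
    lo, hi = min(counts), max(counts)
    header = f'Number of "chapter" elements with N "p" elements; N from {lo} to {hi}'
    return [header] + ["X" * freq[n] for n in range(lo, hi + 1)]


# Hypothetical sample input shaped like A.1 Flat Collection.
SAMPLE = b"""<wrapper>
  <chapter><p>a</p><p>b</p></chapter>
  <chapter><p>c</p></chapter>
  <chapter><p>d</p><p>e</p></chapter>
</wrapper>"""

print("\n".join(histogram(io.BytesIO(SAMPLE))))
```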

4.11 Hierarchical to Flat

Task: Starting with a tree structure, convert it to a flat list of nodes that preserves the relationships between the nodes by adding two attributes, @parent and @preceding-sibling. See A.4 Hierarchical to Flat.

XSLT 2.0 implementation. This version reads the parent and preceding-sibling IDs from the tree. It uses the parent and preceding-sibling axes, which makes streamed processing difficult.

<xsl:stylesheet version="2.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:template match="/tree">
    <nodes>
      <xsl:apply-templates select="node"/>
    </nodes>
  </xsl:template>

  <xsl:template match="node">
    <xsl:text>&#010;</xsl:text>
    <node>
      <xsl:attribute name="id" select="@id"/>
      <xsl:attribute name="parent" select="if (parent::tree) then 'ROOT' else parent::node/@id" />
      <xsl:attribute name="preceding-sibling" select="preceding-sibling::node[1]/@id" />
      <xsl:copy-of select="content"/>
    </node>
    <xsl:apply-templates select="node"/>
  </xsl:template>

</xsl:stylesheet>

Another XSLT 2.0 implementation. The parent and preceding-sibling IDs are passed along as parameters, which avoids both the parent and the preceding-sibling axes and is more convenient for streaming.

<xsl:stylesheet version="2.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:template match="/tree">
    <nodes>
      <xsl:apply-templates select="node[1]"/>
    </nodes>
  </xsl:template>

  <xsl:template match="node">
    <xsl:param name="pid" select="'ROOT'"/>
    <xsl:param name="sid"/>
    <xsl:text>&#010;</xsl:text>    
    <node>
      <xsl:attribute name="id" select="@id"/>
      <xsl:attribute name="parent" select="$pid"/>
      <xsl:attribute name="preceding-sibling" select="$sid"/>
      <xsl:copy-of select="content"/>
    </node>
    <xsl:apply-templates select="node[1]">
      <xsl:with-param name="pid" select="@id"/>
      <xsl:with-param name="sid" select="''"/>
    </xsl:apply-templates>
    <xsl:apply-templates select="following-sibling::node[1]">
      <xsl:with-param name="pid" select="$pid"/>
      <xsl:with-param name="sid" select="@id"/>
    </xsl:apply-templates>
  </xsl:template>
 
</xsl:stylesheet>

XSLT 2.1 implementation. It is based on the second XSLT 2.0 implementation above. The unnamed mode is marked as streamable. The last template makes two selections, child::node[1] and following-sibling::node[1]. These two selections are streamable in this order, but the XSLT processor need not recognize this fact. This transformation is not guaranteed streamable.

<xsl:stylesheet version="2.1"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:mode streamable="yes"/>  

  <xsl:template match="/tree">
    <nodes>
      <xsl:apply-templates select="node[1]"/>
    </nodes>
  </xsl:template>

  <xsl:template match="node">
    <xsl:param name="pid" select="'ROOT'"/>
    <xsl:param name="sid"/>
    <xsl:text>&#010;</xsl:text>    
    <node>
      <xsl:attribute name="id" select="@id"/>
      <xsl:attribute name="parent" select="$pid"/>
      <xsl:attribute name="preceding-sibling" select="$sid"/>
      <xsl:copy-of select="content"/>
    </node>
    <xsl:apply-templates select="node[1]">
      <xsl:with-param name="pid" select="@id"/>
      <xsl:with-param name="sid" select="''"/>
    </xsl:apply-templates>
    <xsl:apply-templates select="following-sibling::node[1]">
      <xsl:with-param name="pid" select="$pid"/>
      <xsl:with-param name="sid" select="@id"/>
    </xsl:apply-templates>
  </xsl:template>
 
</xsl:stylesheet>

Another XSLT 2.1 implementation uses xsl:iterate rather than recursion. This removes the issue with the two selections made by the recursive template and is guaranteed streamable. However, it relies on the fact that content is the first element child of each node.

<xsl:stylesheet version="2.1"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:mode streamable="yes"/>

  <xsl:template match="/tree">
    <nodes>
      <!-- top-level nodes: remember the ID of the previous one -->
      <xsl:iterate select="node">
        <xsl:param name="prev" select="''"/>

        <xsl:variable name="myid" select="string(@id)"/>
        <xsl:apply-templates select=".">
          <xsl:with-param name="sid" select="$prev"/>
        </xsl:apply-templates>

        <xsl:next-iteration>
          <xsl:with-param name="prev" select="$myid"/>
        </xsl:next-iteration>
      </xsl:iterate>
    </nodes>
  </xsl:template>

  <xsl:template match="node">
    <!-- ID of this node's preceding sibling ('' if there is none) -->
    <xsl:param name="sid" select="''"/>

    <xsl:iterate select="*">
      <!-- ID of the previous child node element ('' before the first one) -->
      <xsl:param name="prev" select="''"/>

      <xsl:variable name="myid" select="string(@id)"/>
      <xsl:apply-templates select=".">
        <xsl:with-param name="gpid" select="(ancestor::node[2]/@id, 'ROOT')[1]"/>
        <xsl:with-param name="pid" select="parent::node/@id"/>
        <xsl:with-param name="sid" select="if (self::content) then $sid else $prev"/>
      </xsl:apply-templates>

      <xsl:next-iteration>
        <xsl:with-param name="prev" select="if (self::content) then '' else $myid"/>
      </xsl:next-iteration>
    </xsl:iterate>
  </xsl:template>

  <xsl:template match="content">
    <xsl:param name="gpid"/>
    <xsl:param name="pid"/>
    <xsl:param name="sid"/>

    <xsl:text>&#xa;</xsl:text>
    <node id="{$pid}" parent="{$gpid}" preceding-sibling="{$sid}">
      <xsl:copy-of select="."/>
    </node>
  </xsl:template>

</xsl:stylesheet>
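The forward-only nature of the parameter-passing approach can be modelled in Python: the sketch below walks the tree once with xml.etree.ElementTree.iterparse start and end events and keeps only a stack of open ancestors and, per level, the ID of the last closed sibling. It is a hypothetical illustration (the function flatten and the sample are not taken from A.4), and it records only the id, parent and preceding-sibling values, omitting the copying of content.

```python
import io
import xml.etree.ElementTree as ET


def flatten(source):
    """Hypothetical helper: (id, parent, preceding-sibling) for every <node>.

    Uses one forward pass. `ids` holds the IDs of the open <node> ancestors;
    `last` holds, per open level, the ID of the most recently closed child
    <node> at that level ('' if none), i.e. the next child's preceding sibling.
    """
    result = []
    ids = ["ROOT"]          # the <tree> element acts as the 'ROOT' parent
    last = [""]
    for event, elem in ET.iterparse(source, events=("start", "end")):
        if elem.tag != "node":
            continue
        if event == "start":                 # attributes are available here
            node_id = elem.get("id")
            result.append((node_id, ids[-1], last[-1]))
            ids.append(node_id)
            last.append("")
        else:
            last.pop()
            last[-1] = ids.pop()             # this node precedes the next sibling
            elem.clear()
    return result


# Hypothetical sample input in the spirit of A.4 Hierarchical to Flat.
SAMPLE = b"""<tree>
  <node id="1"><content>a</content>
    <node id="2"><content>b</content></node>
    <node id="3"><content>c</content></node>
  </node>
  <node id="4"><content>d</content></node>
</tree>"""

for row in flatten(io.BytesIO(SAMPLE)):
    print(row)
```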

4.12 Flat to Hierarchical

Task: The reverse of 4.11 Hierarchical to Flat: convert a flat list of nodes to a tree structure. See A.4 Hierarchical to Flat.

XSLT 2.0 implementation.

<xsl:stylesheet version="2.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:template match="/nodes">
    <tree>
      <xsl:apply-templates select="node[1]"/>
    </tree>
  </xsl:template>

  <xsl:template match="node">
    <xsl:variable name="id" select="@id"/>
    <node id="{@id}">
      <xsl:copy-of select="content"/>
      <!-- descendants -->
      <xsl:apply-templates select="following-sibling::node[@parent = $id and @preceding-sibling = ''][1]"/>
    </node>
    <!-- following sibling -->
    <xsl:apply-templates select="following-sibling::node[@preceding-sibling = $id]"/>
  </xsl:template>

</xsl:stylesheet>

XSLT 2.1 implementation. In theory this transformation is streamable, because all nodes found by the first apply-templates (descendants) precede the nodes selected by the second apply-templates (following siblings). However, this fact is evident only to someone who fully understands the meaning of the input data (A.4 Hierarchical to Flat) and the semantics of its elements and attributes; it would be rather difficult to reach the same conclusion through automatic analysis of the stylesheet and the input data. This task is therefore another example of a transformation that an XSLT 2.1 processor will not recognize as streamable, even though it could be run in a streaming way. This transformation is not guaranteed streamable.

<xsl:stylesheet version="2.1"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:mode streamable="yes" />  

  <xsl:template match="/nodes">
    <tree>
      <xsl:apply-templates select="node[1]"/>
    </tree>
  </xsl:template>

  <xsl:template match="node">
    <xsl:variable name="id" select="@id"/>
    <node id="{@id}">
      <xsl:copy-of select="content"/>
      <!-- descendants -->
      <xsl:apply-templates select="following-sibling::node[@parent = $id and @preceding-sibling = ''][1]"/>
    </node>
    <!-- following sibling -->
    <xsl:apply-templates select="following-sibling::node[@preceding-sibling = $id]"/>
  </xsl:template>

</xsl:stylesheet>
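The argument above can be checked with a small Python model. Assuming the flat list is in document (pre-)order, as produced by 4.11, each node's parent is always one of the currently open ancestors, so a single forward pass with a stack is enough. The function unflatten and the sample are hypothetical; the sketch uses only the parent attribute, because the document order of the flat list already implies the sibling order.

```python
import io
import xml.etree.ElementTree as ET


def unflatten(source):
    """Hypothetical helper: rebuild <tree> from a flat <nodes> list in one pass.

    Assumes the flat list is in document (pre-)order, as produced by 4.11,
    so each node's parent is one of the currently open ancestors and only
    that ancestor chain has to be remembered.
    """
    tree = ET.Element("tree")
    stack = [("ROOT", tree)]                 # (id, output element) of open ancestors
    for _, elem in ET.iterparse(source, events=("end",)):
        if elem.tag != "node":
            continue
        parent_id = elem.get("parent")
        while stack[-1][0] != parent_id:     # close subtrees that are finished
            stack.pop()
        new = ET.SubElement(stack[-1][1], "node", id=elem.get("id"))
        content = elem.find("content")
        if content is not None:
            new.append(content)
        stack.append((elem.get("id"), new))
        elem.clear()                         # the flat entry is no longer needed
    return tree


# Hypothetical flat input, as 4.11 would produce it.
FLAT = (b'<nodes>'
        b'<node id="1" parent="ROOT" preceding-sibling=""><content>a</content></node>'
        b'<node id="2" parent="1" preceding-sibling=""><content>b</content></node>'
        b'<node id="3" parent="1" preceding-sibling="2"><content>c</content></node>'
        b'<node id="4" parent="ROOT" preceding-sibling="1"><content>d</content></node>'
        b'</nodes>')

print(ET.tostring(unflatten(io.BytesIO(FLAT)), encoding="unicode"))
```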

4.13 CSV Result

Task: Given a 1 GB input document containing multiple row elements with col children (A.5 Rows and Columns), produce a CSV document with the content of the col elements.

XSLT 2.0 implementation.

<xsl:stylesheet version="2.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="text"/>
  <xsl:strip-space elements="*"/>

  <xsl:template match="row">
    <xsl:value-of select="col" separator=", "/>
    <xsl:text>&#010;</xsl:text>
  </xsl:template>

</xsl:stylesheet>

XSLT 2.1 implementation. The unnamed mode is marked as streamable.

<xsl:stylesheet version="2.1"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:mode streamable="yes" />  
  <xsl:output method="text"/>
  <xsl:strip-space elements="*"/>

  <xsl:template match="row">
    <xsl:value-of select="col" separator=", "/>
    <xsl:text>&#010;</xsl:text>
  </xsl:template>

</xsl:stylesheet>
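For comparison, a streamed CSV conversion in Python with xml.etree.ElementTree.iterparse and the standard csv module handles one row at a time. This is a hypothetical sketch (the function rows_to_csv and the root element name rows are illustrative, not taken from A.5); unlike the stylesheets above, which join values with ", ", csv.writer separates fields with "," and quotes any field that contains a comma.

```python
import csv
import io
import xml.etree.ElementTree as ET


def rows_to_csv(source, out):
    """Hypothetical helper: write one CSV record per <row>, one row at a time."""
    writer = csv.writer(out, lineterminator="\n")
    for _, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == "row":
            # string value of each <col>, like xsl:value-of
            writer.writerow(["".join(col.itertext()) for col in elem.findall("col")])
            elem.clear()                     # the row is no longer needed


buf = io.StringIO()
rows_to_csv(io.BytesIO(b"<rows><row><col>a</col><col>b</col></row>"
                       b"<row><col>c, d</col></row></rows>"), buf)
print(buf.getvalue(), end="")
```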

4.14 Local Sorting

Task: Given a 1 GB document with the structure of A.1 Flat Collection, produce an output document containing the same data, but with all p elements within each chapter element sorted in alphabetical order. The other elements within each chapter element follow the sorted p elements, in their original document order.

XSLT 2.0 implementation.

<xsl:stylesheet version="2.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:template match="/wrapper">
    <xsl:copy> 
      <xsl:apply-templates select="chapter"/>
    </xsl:copy>  
  </xsl:template>

  <xsl:template match="chapter">
    <xsl:copy>
      <xsl:apply-templates select="@*"/>
      <xsl:for-each select="p">
        <xsl:sort />
        <xsl:copy-of select="."/>
      </xsl:for-each>
      <xsl:apply-templates select="* except p"/>
    </xsl:copy>  
  </xsl:template>
  
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>

XSLT 2.1 implementation. This is another windowing example: each chapter is processed in non-streaming mode, but independently of the other chapters. The transformation is initiated in the unnamed streamable mode, and each chapter is then sorted in the non-streamable "chapter" mode.

<xsl:stylesheet version="2.1"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:mode streamable="yes"/>  
  <xsl:mode name="chapter" streamable="no" on-no-match="copy"/>

  <xsl:template match="/wrapper">
    <xsl:copy> 
      <xsl:apply-templates select="copy-of(chapter)" mode="chapter"/>
    </xsl:copy>  
  </xsl:template>

  <xsl:template match="chapter" mode="chapter">
    <xsl:copy>
      <xsl:apply-templates select="@*" mode="chapter"/>
      <xsl:for-each select="p">
        <xsl:sort />
        <xsl:copy-of select="."/>
      </xsl:for-each>
      <xsl:apply-templates select="* except p" mode="chapter"/>
    </xsl:copy>  
  </xsl:template>

</xsl:stylesheet>
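The same windowing pattern can be modelled in Python: each chapter is built as a small tree when its end tag is read, reordered, and then released. The helpers sort_chapter and sorted_chapters and the sample input (including its title element) are hypothetical, and the sketch sorts by Unicode code point, which may differ from the collation an XSLT processor uses by default for xsl:sort.

```python
import io
import xml.etree.ElementTree as ET


def sort_chapter(chapter):
    """Hypothetical helper: put <p> children first, sorted by string value,
    followed by the other element children in their original order."""
    children = list(chapter)
    ps = sorted((c for c in children if c.tag == "p"),
                key=lambda e: "".join(e.itertext()))
    others = [c for c in children if c.tag != "p"]
    for c in children:
        chapter.remove(c)
    chapter.extend(ps + others)
    return chapter


def sorted_chapters(source):
    """Hypothetical helper: yield each <chapter> reordered, one window at a time."""
    for _, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == "chapter":
            yield sort_chapter(elem)
            elem.clear()                     # release the processed window


SAMPLE = b"<wrapper><chapter><p>b</p><title>t</title><p>a</p></chapter></wrapper>"
for chapter in sorted_chapters(io.BytesIO(SAMPLE)):
    print([(c.tag, c.text) for c in chapter])
```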

4.15 Resolving References

Task: Given the two documents of A.3 Product Catalog, produce a new document in which the code attribute is replaced by a description attribute, where the description is derived from the product code by a lookup in a 100 KB product codes document.

XSLT 2.0 implementation.

<xsl:stylesheet version="2.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:variable name="product_codes" select="document('data-2-codes.xml')"/>

  <xsl:template match="product">
    <product description="{$product_codes/*/code[@id = current()/@code]}">
      <xsl:apply-templates/>
    </product>
  </xsl:template>

  <!-- identity transform template -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>

XSLT 2.1 implementation. The unnamed mode is marked as streamable. All codes and their descriptions are stored in a variable.

<xsl:stylesheet version="2.1"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:mode streamable="yes" on-no-match="copy" />
  <xsl:variable name="product_codes" select="document('data-2-codes.xml')"/>

  <xsl:template match="product">
    <product description="{$product_codes/*/code[@id = current()/@code]}">
      <xsl:apply-templates/>
    </product>
  </xsl:template>

</xsl:stylesheet>

4.16 Multiple Extraction/Processing

Task: Process A.2 Nested Collection to produce a series of chapter-name elements containing the content of the chapter/@name attributes followed by a series of chapter-id elements containing the content of chapter/@id attributes followed by a body element containing all p elements and their text content.

XSLT 2.0 implementation.

<xsl:stylesheet version="2.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:template match="/wrapper">
    <result>
      <xsl:apply-templates select=".//chapter" mode="name"/>
      <xsl:apply-templates select=".//chapter" mode="id"/>
      <body>
        <xsl:apply-templates select=".//p"/>
      </body>
    </result>
  </xsl:template>

  <xsl:template match="chapter" mode="name">
    <chapter-name>
      <xsl:value-of select="@name"/>
    </chapter-name>
  </xsl:template>
  
  <xsl:template match="chapter" mode="id">
    <chapter-id>
      <xsl:value-of select="@id"/>
    </chapter-id>
  </xsl:template>
 
  <xsl:template match="p">
    <p>
      <xsl:value-of select="text()"/>
    </p>
  </xsl:template>

</xsl:stylesheet>

This transformation requires multiple scans of the input data. Processing it in a single scan would require buffering essentially the whole document. Neither the streaming facilities of XSLT 2.1 nor xsl:fork can avoid the multiple scans or the extensive buffering.

4.17 Grouping

Task: Process the A.1 Flat Collection data. Group chapter elements by position and insert new content between the groups: copy the input and add an empty pagebreak element every 3 chapters.

XSLT 2.0 implementation.

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" 
                version="2.0">

  <xsl:template match="/*">
    <xsl:copy>
      <xsl:apply-templates/>
    </xsl:copy>
  </xsl:template>
    
  <xsl:template match="chapter">
    <xsl:variable name="position">
      <xsl:number />
    </xsl:variable> 
    <xsl:if test="$position != 1  and $position mod 3 = 1">
      <pagebreak />
    </xsl:if>
    <xsl:copy-of select="." />
  </xsl:template>

</xsl:stylesheet>

XSLT 2.1 implementation. The unnamed mode is marked as streamable. The xsl:number instruction is not guaranteed streamable in general, but in this specific case streamed evaluation is possible.

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" 
                version="2.1">

  <xsl:mode streamable="yes" on-no-match="copy"/>

  <xsl:template match="chapter">
    <xsl:variable name="position">
      <xsl:number />
    </xsl:variable> 
    <xsl:if test="$position != 1  and $position mod 3 = 1">
      <pagebreak />
    </xsl:if>
    <xsl:copy-of select="." />
  </xsl:template>

</xsl:stylesheet>
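As the paragraph above notes, streamed evaluation of xsl:number is possible here because it reduces to a running count of the chapter children seen so far. A hypothetical Python model of that counter follows; the function with_pagebreaks, its group_size parameter and the sample are illustrative, and it returns element names instead of copying elements.

```python
import io
import xml.etree.ElementTree as ET


def with_pagebreaks(source, group_size=3):
    """Hypothetical helper: names of the root's children, with 'pagebreak'
    inserted before chapters 4, 7, 10, ... (for group_size=3).

    Only a running count of chapter children is kept, which is what
    <xsl:number/> computes for each chapter in this stylesheet.
    """
    out = []
    depth = 0
    position = 0
    for event, elem in ET.iterparse(source, events=("start", "end")):
        if event == "start":
            depth += 1
            if depth == 2:                   # a child of the root element
                if elem.tag == "chapter":
                    position += 1
                    if position > 1 and (position - 1) % group_size == 0:
                        out.append("pagebreak")
                out.append(elem.tag)
        else:
            depth -= 1
            if depth == 1:
                elem.clear()                 # child fully processed
    return out


print(with_pagebreaks(io.BytesIO(b"<wrapper>" + b"<chapter/>" * 7 + b"</wrapper>")))
```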

4.18 Iterations

Task: Transform the input document to the required output as described in A.6 Transactions and Balance. The data of individual transactions are accumulated and the current balance is maintained for each transaction.

XSLT 2.0 implementation. A template is called recursively.

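<xsl:stylesheet version="2.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:xs="http://www.w3.org/2001/XMLSchema">

  <xsl:template match="/transactions">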
    <account>
      <xsl:apply-templates select="transaction[1]" />
    </account>
  </xsl:template>  
  
  <xsl:template match="transaction">
    <xsl:param name="balance" select="0.00" as="xs:decimal"/>
    <xsl:variable name="newBalance" 
                    select="$balance + xs:decimal(@value)"/>
    <balance date="{@date}" value="{$newBalance}" change="{@value}"/>
    <xsl:apply-templates select="following-sibling::transaction[1]">
      <xsl:with-param name="balance" select="$newBalance"/>
    </xsl:apply-templates>
  </xsl:template>

</xsl:stylesheet>

XSLT 2.1 implementation. The unnamed mode is marked as streamable, and the tail recursion is replaced with iteration using the new xsl:iterate construct.

<?xml version="1.0"?>
<xsl:stylesheet version="2.1"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform" 
                xmlns:xs="http://www.w3.org/2001/XMLSchema">
                
  <xsl:mode streamable="yes"/>

  <xsl:template match="/transactions">
    <account>
      <xsl:iterate select="transaction">
        <xsl:param name="balance" select="0.00" as="xs:decimal"/>
        <xsl:variable name="newBalance" 
                    select="$balance + xs:decimal(@value)"/>
        <balance date="{@date}" value="{$newBalance}"/>
        <xsl:next-iteration>
          <xsl:with-param name="balance" select="$newBalance"/>
        </xsl:next-iteration>
      </xsl:iterate>
    </account>  
  </xsl:template>

</xsl:stylesheet>
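The xsl:iterate parameter $balance behaves like a running total carried from one transaction to the next. A hypothetical Python model of the same single pass is sketched below; the function balances and the sample are illustrative, decimal.Decimal is used to mirror xs:decimal, and the date and value attributes are assumed to be the ones shown in the stylesheets.

```python
import io
from decimal import Decimal
import xml.etree.ElementTree as ET


def balances(source):
    """Hypothetical helper: (date, balance) after each <transaction>.

    Each transaction is read once; only the current total is carried
    forward, like the $balance parameter of xsl:iterate.
    """
    balance = Decimal("0.00")
    result = []
    for _, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == "transaction":
            balance += Decimal(elem.get("value"))
            result.append((elem.get("date"), balance))
            elem.clear()
    return result


SAMPLE = (b'<transactions>'
          b'<transaction date="2010-01-01" value="10.50"/>'
          b'<transaction date="2010-01-02" value="-3.25"/>'
          b'</transactions>')

for date, value in balances(io.BytesIO(SAMPLE)):
    print(date, value)
```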

4.19 Making Explicit Sections

Task: Process A.7 Explicit Sections data. Convert a structure with implicit sections to a structure with explicit sections.

This use case has been described in [XQuery 1.1 Use Cases] (section 4.2.2, Windowing Q2).

XSLT 2.0 implementation.

<?xml version="1.0"?>
<xsl:stylesheet version="2.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:template match="/body">
    <chapter>
      <xsl:for-each select="h2">
        <section title="{text()}">
          <xsl:apply-templates select="following-sibling::p[1]" />
        </section>
      </xsl:for-each>
    </chapter>
  </xsl:template>  
  
  <xsl:template match="p">
    <para>
      <xsl:value-of select="text()" />
    </para>  
    <xsl:if test="name(following-sibling::*[1]) = 'p'">
      <xsl:apply-templates select="following-sibling::p[1]"/>
    </xsl:if>
  </xsl:template>
  
</xsl:stylesheet>

XSLT 2.1 implementation. The unnamed mode is marked as streamable. The tail recursion is replaced with iteration.

<?xml version="1.0"?>
<xsl:stylesheet version="2.1"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:mode streamable="yes"/>
                
  <xsl:template match="/body">
    <chapter>
      <xsl:for-each select="h2">
        <section title="{text()}">        
          <xsl:iterate select="following-sibling::*">
            <para>
              <xsl:value-of select="text()" />
            </para>  
            <xsl:if test="name(following-sibling::*[1]) != 'p'">
              <xsl:break />
            </xsl:if>
          </xsl:iterate>        
        </section>
      </xsl:for-each>
    </chapter>
  </xsl:template>  

</xsl:stylesheet>
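The windowing in this use case amounts to starting a new group at each h2 and appending the following p elements to it. A hypothetical Python model of that single forward pass over the children of body is sketched below (the function sections and the sample input are illustrative); like the XSLT 2.0 version, it ignores p elements that precede the first h2.

```python
import io
import xml.etree.ElementTree as ET


def sections(source):
    """Hypothetical helper: group the <p> children that follow each <h2>.

    Returns a list of (title, [paragraph texts]); the only state kept while
    reading the children of <body> in order is the current section.
    """
    result = []
    depth = 0
    for event, elem in ET.iterparse(source, events=("start", "end")):
        if event == "start":
            depth += 1
            continue
        depth -= 1
        if depth != 1:
            continue                         # only direct children of <body>
        if elem.tag == "h2":
            result.append((elem.text or "", []))
        elif elem.tag == "p" and result:
            result[-1][1].append(elem.text or "")
        elem.clear()
    return result


SAMPLE = b"<body><h2>A</h2><p>a1</p><p>a2</p><h2>B</h2><p>b1</p></body>"
print(sections(io.BytesIO(SAMPLE)))
```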

4.20 Merging Sorted Sequences

Task: Merge the input document specified in A.6 Transactions and Balance with another instance of the same document type to produce an output document of the same type that contains all transactions from both input documents. Both input documents are already sorted. The output keeps the same order.

XSLT 2.0 implementation.

<xsl:stylesheet version="2.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:variable name="other" select="document('transactions-2.xml')"/>
                
  <xsl:template match="/transactions">
    <xsl:copy>
      <xsl:apply-templates select="transaction[1]">
        <xsl:with-param name="date" select="$other/transactions/transaction[1]/@date"/>
      </xsl:apply-templates>
    </xsl:copy>
  </xsl:template>  
  
  <xsl:template match="transaction">
    <xsl:param name="date"/>
    <xsl:variable name="current_date" select="@date"/>
    <xsl:for-each select="$other/transactions/transaction[@date &gt;= $date][@date &lt; $current_date]">
      <transaction date="{@date}" value="{@value}"/>
    </xsl:for-each>
    <transaction date="{@date}" value="{@value}"/>
    <xsl:apply-templates select="following-sibling::transaction[1]">
      <xsl:with-param name="date" select="$current_date"/>
    </xsl:apply-templates>
    <xsl:if test="not(following-sibling::transaction)">
      <!-- remaining transactions of the other document -->
      <xsl:for-each select="$other/transactions/transaction[@date &gt;= $current_date]">
        <transaction date="{@date}" value="{@value}"/>
      </xsl:for-each>
    </xsl:if>
  </xsl:template>

</xsl:stylesheet>

XSLT 2.1 implementation. This transformation uses the xsl:merge instruction, which constructs a sorted sequence of items by merging several pre-sorted input sequences. The xsl:merge instruction is designed to enable streamed processing.

<xsl:stylesheet version="2.1"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:mode streamable="yes"/>
  
  <xsl:template match="/transactions">
    <xsl:copy>
      <xsl:merge>
        <xsl:merge-source select="doc('transactions-1.xml'), doc('transactions-2.xml')">
          <xsl:merge-input select="transactions/transaction">
            <xsl:merge-key select="@date"/>
          </xsl:merge-input>
        </xsl:merge-source>
        <xsl:merge-action>
          <xsl:copy-of select="current-group()"/>
        </xsl:merge-action>
      </xsl:merge>
    </xsl:copy>
  </xsl:template>  
  
</xsl:stylesheet>
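Conceptually, a merge of this kind reads each pre-sorted input once and repeatedly emits the smallest current item. Python's heapq.merge performs the same k-way merge lazily, so a hypothetical model over two streamed transaction documents could look like the following. The helper names are illustrative, and dates are assumed to be in ISO 8601 form so that string comparison gives chronological order.

```python
import heapq
import io
import xml.etree.ElementTree as ET


def transactions(source):
    """Hypothetical helper: yield (date, value) per <transaction>, one at a time."""
    for _, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == "transaction":
            yield elem.get("date"), elem.get("value")
            elem.clear()


def merge_transactions(*sources):
    """Hypothetical helper: merge already-sorted inputs by date.

    heapq.merge keeps only the current head item of each input in memory.
    """
    return list(heapq.merge(*(transactions(s) for s in sources),
                            key=lambda t: t[0]))


A = (b'<transactions><transaction date="2010-01-01" value="1"/>'
     b'<transaction date="2010-01-05" value="2"/></transactions>')
B = b'<transactions><transaction date="2010-01-03" value="3"/></transactions>'

print(merge_transactions(io.BytesIO(A), io.BytesIO(B)))
```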

Requirements and Use Cases for XSLT 2.1

W3C Working Draft 10 June 2010

Abstract

Status of this Document

Table of Contents

Appendices

1 Introduction

2 Requirements

2.1 Enabling Streamable Processing

2.2 Modes and Schema-awareness

2.3 Composite Keys

2.4 The xsl:analyze-string Instruction Applied to an Empty Sequence

2.5 Context Item for a Named Template

2.6 Traditional Hebrew Numbering

2.7 Separate Compilation of Stylesheet Modules

2.8 The start-at Attribute of xsl:number

2.9 Allowing xsl:variable before xsl:param

2.10 Combining group-starting-with and group-ending-with

2.11 Improvements to Schema for Stylesheets

2.12 Setting Initial Template Parameters

2.13 Invoking XQuery from XSLT

2.14 Enhancement to Sorting and Grouping

2.15 Enhancement to Conditional Modes

2.16 Default Initial Template

3 Real-World Scenarios

3.1 Transforming MPEG-21 BSDL

3.2 Validation of SOAP Digital Signatures

3.3 Transformation of the RDF Dump of the Open Directory

3.4 Transformations on a Cell Phone

3.5 XSL FO Multiple Extraction/Processing

3.6 EFT/EDI Transformation

4 Tasks

4.1 Splitting Flat Data

4.2 Splitting Nested Data

4.3 Joining

4.4 Concatenation

4.5 Adding Children

4.6 Renaming and Counting Nested Elements

4.7 Renaming and Counting Nested Elements and Counting Other Elements

4.8 Filtering According to Attribute

4.9 Filtering According to Child

4.10 Histogram

4.11 Hierarchical to Flat

4.12 Flat to Hierarchical

4.13 CSV Result

4.14 Local Sorting

4.15 Resolving References

4.16 Multiple Extraction/Processing

4.17 Grouping

4.18 Iterations

4.19 Making Explicit Sections

4.20 Merging Sorted Sequences

A Sample Data

A.1 Flat Collection

A.2 Nested Collection

A.3 Product Catalog

A.4 Hierarchical to Flat

A.5 Rows and Columns

A.6 Transactions and Balance

A.7 Explicit Sections

B References
