Hi all,
I'm evaluating BaseX as an alternative (and very attractive) platform for an XML/XSLT-based website that needs to be migrated from ASP.NET.
The website relies heavily on XSLT. Each page is generated on-the-fly with Saxon.NET, using a complex set of stylesheets. To get reasonable performance, stylesheets are compiled on first use and cached for subsequent requests.
This is crucial, as XSLT compilation is typically orders of magnitude slower than execution; without caching, the server would spend most of the time compiling the same stylesheets over and over again.
I was happy to find that BaseX can use Saxon, but as far as I can see, xslt:transform() does not cache compiled stylesheets. Can anyone confirm this?
If not, are there any plans to support stylesheet caching in the future?
Or is there a way to reuse compiled stylesheets manually?
Thanks, Tom De Herdt
Hi Tom,
You are right. xslt:transform() does nothing more than send stylesheets to the registered XSLT processor (which is usually Xalan or Saxon).
The XQFO 3.1 spec [1] will provide an fn:transform function that provides a "cache" option. As the definition of this function is very Saxon-specific, I am not sure whether we will fully support it in the future. For now, if you know how caching is enabled in Saxon, feel free to provide me with some example code, and I will see if I can easily embed it in our current architecture.
Cheers, Christian
[1] https://www.w3.org/TR/xpath-functions-31/#func-transform
On Sun, Feb 5, 2017 at 1:52 AM, Tom De Herdt tom.deherdt@skynet.be wrote:
Hi Christian,
I'm not a Java programmer, but Eric Burke's "Java and XSLT" (O'Reilly, 2001) has a chapter on stylesheet compilation and caching using JAXP, with actual code: see section 5.4.2, "A Stylesheet Cache". (I found the text online. Not sure about the legal status, so I won't post links here, but you can easily find that section if you want.)
Here is another article on caching with JAXP, with code samples: http://www.javaworld.com/article/2073394/java-xml/transparently-cache-xsl-tr...
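For readers who want the gist without chasing the links: a common JAXP pattern in this spirit (a sketch, not the book's or the article's actual code) keeps one compiled javax.xml.transform.Templates object per stylesheet file, together with the file's modification time, and recompiles only when the timestamp changes. The class name TimestampedStylesheetCache is hypothetical; only standard JDK APIs are used.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.FileTime;
import java.util.HashMap;
import java.util.Map;
import javax.xml.transform.Templates;
import javax.xml.transform.TransformerConfigurationException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamSource;

/** Hypothetical stylesheet cache that recompiles a file when its timestamp changes. */
class TimestampedStylesheetCache {
    /** A compiled stylesheet plus the file time it was compiled from. */
    private static final class Entry {
        final Templates templates;
        final FileTime modified;

        Entry(Templates templates, FileTime modified) {
            this.templates = templates;
            this.modified = modified;
        }
    }

    private final TransformerFactory factory = TransformerFactory.newInstance();
    private final Map<Path, Entry> entries = new HashMap<>();

    /** Returns a cached Templates, recompiling on a miss or when the file has changed. */
    synchronized Templates get(Path xslt) throws IOException, TransformerConfigurationException {
        Path key = xslt.toAbsolutePath().normalize();
        FileTime current = Files.getLastModifiedTime(key);
        Entry e = entries.get(key);
        if (e == null || !e.modified.equals(current)) { // miss or stale entry: (re)compile
            e = new Entry(factory.newTemplates(new StreamSource(key.toFile())), current);
            entries.put(key, e);
        }
        return e.templates;
    }
}
```

Only the top-level file's timestamp is checked here, so an edit to an imported module goes unnoticed.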
If you don't mind me referring to another open-source product, eXist-db seems to cache XSLT stylesheets. (In any case, they have a caching flag in the transformer section of their config file.)
As far as I can tell, eXist uses JAXP (javax.xml.transform.*), so this might be similar to BaseX's implementation?
Their code has a CachedStylesheet class: https://github.com/eXist-db/exist/blob/develop/src/org/exist/xquery/function...
On the other hand, Burke's book and the article mentioned above date back quite a while, and JAXP may no longer be the best way to handle transformations, at least not for Saxon: http://www.saxonica.com/html/documentation/using%2Dxsl/embedding
Saxon has a new API (s9api) that is a better fit for XSLT 2.0 and higher: http://www.saxonica.com/html/documentation/using-xsl/embedding/s9api-transfo...
The page outlines how you create an XsltCompiler and "call the compile() method to compile a stylesheet. The result is an XsltExecutable, which can be used as often as you like, in the same thread or in different threads."
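JAXP has a direct counterpart to this split: TransformerFactory.newTemplates() returns a Templates object, which the JAXP documentation requires to be thread-safe, while each Transformer obtained from it must only be used by one thread at a time. A minimal sketch (hypothetical class SharedTemplatesDemo, JDK built-in processor rather than Saxon) that compiles once and then transforms from several threads:

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Templates;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

/** Compile once, transform from many threads: a JAXP analogue of XsltExecutable + load(). */
class SharedTemplatesDemo {
    static List<String> run(String xslt, List<String> inputs, int threads) throws Exception {
        // Expensive step, done once; Templates instances may be shared between threads.
        Templates compiled = TransformerFactory.newInstance()
                .newTemplates(new StreamSource(new StringReader(xslt)));
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<String>> futures = new ArrayList<>();
            for (String input : inputs) {
                futures.add(pool.submit(() -> {
                    // Cheap step, done per task; a Transformer must not be shared concurrently.
                    Transformer t = compiled.newTransformer();
                    t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
                    StringWriter out = new StringWriter();
                    t.transform(new StreamSource(new StringReader(input)), new StreamResult(out));
                    return out.toString();
                }));
            }
            List<String> results = new ArrayList<>();
            for (Future<String> f : futures) results.add(f.get()); // keeps input order
            return results;
        } finally {
            pool.shutdown();
        }
    }
}
```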
In the ASP.NET/Saxon project that I would like to migrate, we cache this XsltExecutable object.
Web requests get the XsltExecutable from cache (or create and cache it on a cache miss), and call the Load() method to create an XsltTransformer that executes the transformation.
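Translated to JAXP, the get-or-compile flow described above might look like the following sketch. It assumes stylesheets are identified by file path; TemplatesCache and its methods are hypothetical names, and Templates.newTransformer() stands in for Saxon.NET's Load().

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.nio.file.Path;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Templates;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerConfigurationException;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

/** Hypothetical get-or-compile cache; Templates plays the role of Saxon's XsltExecutable. */
class TemplatesCache {
    private final TransformerFactory factory = TransformerFactory.newInstance();
    private final Map<Path, Templates> cache = new ConcurrentHashMap<>();
    private final AtomicInteger compilations = new AtomicInteger(); // counts cache misses

    /** Returns the compiled stylesheet, compiling it only on a cache miss. */
    Templates get(Path xslt) {
        return cache.computeIfAbsent(xslt.toAbsolutePath().normalize(), this::compile);
    }

    private Templates compile(Path xslt) {
        compilations.incrementAndGet();
        synchronized (factory) { // TransformerFactory is not guaranteed to be thread-safe
            try {
                return factory.newTemplates(new StreamSource(xslt.toFile()));
            } catch (TransformerConfigurationException e) {
                throw new IllegalArgumentException("Cannot compile " + xslt, e);
            }
        }
    }

    /** Per request: a fresh, cheap Transformer from the shared, expensive Templates. */
    String transform(Path xslt, String inputXml) throws TransformerException {
        Transformer t = get(xslt).newTransformer();
        t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        t.transform(new StreamSource(new StringReader(inputXml)), new StreamResult(out));
        return out.toString();
    }

    int compilations() { return compilations.get(); }

    void clear() { cache.clear(); }
}
```

computeIfAbsent guarantees at most one compilation per key; the trade-off is that, per the ConcurrentHashMap documentation, some updates by other threads may block while a compilation is in progress.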
When the XSLT stylesheets are modified on disk, the cache is cleared, so we can edit them on the fly.
I suppose there are various ways to do this automatically. Tracking individual stylesheet files is not trivial, since they may import other files.
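Imports complicate tracking because the processor resolves xsl:import when the stylesheet is compiled; the compiled object does not look at the imported files again. The sketch below (hypothetical class ImportInheritanceDemo, exercised with the JDK's built-in processor) makes that visible: compile main.xsl, edit the imported common.xsl, and the cached Templates keeps producing the old output, although main.xsl itself never changed.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.nio.file.Path;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Templates;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerConfigurationException;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

/** Shows that xsl:import is resolved when a stylesheet is compiled, not when it runs. */
class ImportInheritanceDemo {
    /** Compiles the entry stylesheet; imported modules are read from disk at this point. */
    static Templates compile(Path mainXslt) throws TransformerConfigurationException {
        return TransformerFactory.newInstance().newTemplates(new StreamSource(mainXslt.toFile()));
    }

    /** Runs an already compiled stylesheet; no stylesheet module is re-read here. */
    static String apply(Templates compiled, String inputXml) throws TransformerException {
        Transformer t = compiled.newTransformer();
        t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        t.transform(new StreamSource(new StringReader(inputXml)), new StreamResult(out));
        return out.toString();
    }
}
```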
Our stylesheets have a lot of import inheritance, so we simply watch the entire stylesheet folder and clear the cache on any change. The next request will take a second or so to compile whatever stylesheets are needed, but that's OK.
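The folder-wide strategy can be approximated without a file-system watcher by polling a cheap fingerprint of the folder (file count plus newest modification time) before each lookup. A sketch under those assumptions; FolderInvalidatedCache is a hypothetical name, and its clear() method doubles as the manual reset suggested in the next paragraph.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Stream;

/** Hypothetical cache that is emptied whenever anything in a watched folder changes. */
class FolderInvalidatedCache<V> {
    private final Path folder;
    private final Map<String, V> cache = new HashMap<>();
    private long fingerprint;

    FolderInvalidatedCache(Path folder) throws IOException {
        this.folder = folder;
        this.fingerprint = computeFingerprint();
    }

    /** Drops all entries if the folder changed since the last call, then gets or compiles. */
    synchronized V get(String key, Function<String, V> compiler) throws IOException {
        long current = computeFingerprint();
        if (current != fingerprint) { // a file was added or removed, or a timestamp moved
            cache.clear();
            fingerprint = current;
        }
        return cache.computeIfAbsent(key, compiler);
    }

    /** Manual reset, e.g. after editing stylesheets by hand. */
    synchronized void clear() {
        cache.clear();
    }

    /** Cheap approximation of the folder state: file count combined with newest mtime. */
    private long computeFingerprint() throws IOException {
        long count = 0, newest = 0;
        try (Stream<Path> walk = Files.walk(folder)) {
            Iterator<Path> it = walk.iterator();
            while (it.hasNext()) {
                Path p = it.next();
                if (Files.isRegularFile(p)) {
                    count++;
                    newest = Math.max(newest, Files.getLastModifiedTime(p).toMillis());
                }
            }
        }
        return newest * 31 + count;
    }
}
```

Polling costs one directory walk per lookup, and a rename that preserves timestamps is not detected; java.nio.file.WatchService is the event-driven alternative, at the price of a background thread.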
If it is hard to come up with a generic cache clearing strategy, a custom XQuery function that clears the cache would be good enough (for a manual reset after editing).
While looking at the BaseX code and trying to transpose my .NET background to Java, I briefly toyed with the idea of trying to implement a custom XQuery function myself, but I had to give up pretty soon.
One of the things that puzzles me is where to cache the XsltExecutables. Is there some sort of global context where these objects could be stored?
(ASP.NET provides an Application object with caching facilities that lives above Sessions and is available to the entire web application. Is there something similar in Java, accessible from RESTXQ in BaseX?
Not sure, but this subject may touch on the Static context thread that came up a few days ago on the list?)
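Plain Java has no direct equivalent of the ASP.NET Application object, but static state serves the same purpose: it lives as long as its class stays loaded and is shared by all threads (inside a servlet container, ServletContext attributes are the more formal option). A minimal sketch using the initialization-on-demand holder idiom; AppCache is a hypothetical name, and this says nothing about where BaseX itself keeps such state.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical application-wide cache: one map per class loader, shared by all request threads. */
final class AppCache {
    private AppCache() {}

    /** Initialization-on-demand holder: the map is created lazily and exactly once. */
    private static final class Holder {
        static final Map<String, Object> INSTANCE = new ConcurrentHashMap<>();
    }

    static Map<String, Object> get() {
        return Holder.INSTANCE;
    }
}
```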
So, in short, would one of these options be feasible:
1. basic XSLT caching with the existing JAXP interface, as described in the articles or similar;
2. specific saxon:transform() etc. functions that use the new Saxon interface (and do caching);
3. idem but implemented for the regular xslt:transform(), or maybe the function in XQFO 3.1 (thanks for the link, I was not aware of this)?
Thinking forward, absolutely wonderful would be some form of tight integration with Saxon that passes nodes from BaseX to Saxon directly, without serializing/parsing.
Incidentally, there is an interesting note on this topic on the eXist developer platform (scroll to the bottom): https://github.com/eXist-db/exist/issues/791
But any of option 1-3 (or similar) would do the trick and be great!
Best regards, Tom
On 5/02/2017 15:01, Christian Grün wrote:
Tom,
Thanks for the excellent summary of what could be done, much appreciated!
1. basic XSLT caching with the existing JAXP interface, as described in the articles or similar;
2. specific saxon:transform() etc. functions that use the new Saxon interface (and do caching);
3. idem but implemented for the regular xslt:transform(), or maybe the function in XQFO 3.1 (thanks for the link, I was not aware of this)?
Variant 1 is surely something that I can easily include. I will check out your links and give you some update this week.
Talking about a tighter integration, I fully agree with Adam’s comments:
• Switching to Saxon’s API would be a reasonable choice. We still have users who work with standard Xalan XSLT, but we could definitely use Michael Kay’s s9api whenever Saxon is found in the classpath. I have added an issue to our GitHub tracker [1].
• Similar to eXist, the BaseX DOM models are pretty different from Saxon’s representation. In BaseX, it is possible to create a standard Java DOM representation for arbitrary XML nodes, but I doubt that working with it will be much faster than serializing nodes, because the latter option is usually very fast in BaseX.
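For reference, these are the two JAXP entry points being compared: a DOMSource wrapping an in-memory DOM tree, or a StreamSource over serialized text that the processor parses again. In this sketch (hypothetical class InputPaths), a parsed string stands in for the DOM that BaseX would build from its own nodes; both paths should yield the same result.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Source;
import javax.xml.transform.Templates;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

/** Two ways of handing input to a JAXP processor: a DOM tree or serialized text. */
class InputPaths {
    /** Path 1: pass a standard DOM tree directly, with no text in between. */
    static Source asDom(String xml) throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true); // XSLT expects namespace-aware DOM nodes
        Document doc = dbf.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
        return new DOMSource(doc);
    }

    /** Path 2: pass serialized text and let the processor parse it again. */
    static Source asText(String xml) {
        return new StreamSource(new StringReader(xml));
    }

    /** Applies a compiled stylesheet to either kind of source. */
    static String run(Templates compiled, Source input) throws TransformerException {
        Transformer t = compiled.newTransformer();
        t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        t.transform(input, new StreamResult(out));
        return out.toString();
    }
}
```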
Cheers, Christian
[1] https://github.com/BaseXdb/basex/issues/1408
Hi Christian,
Thank you for taking time to look into this!
• Similar to eXist, the BaseX DOM models are pretty different from Saxon’s representation. In BaseX, it is possible to create a standard Java DOM representation for arbitrary XML nodes, but I doubt that working with it will be much faster than serializing nodes, because the latter option is usually very fast in BaseX.
OK, I understand. You're right, it probably wouldn't be faster. In any case, serializing/deserializing transformation input (typically small pages) is never going to be a bottleneck in a web context, so it doesn't matter. XSLT compilation, on the other hand, does incur a noticeable cost if it is repeated for each request.
Regards, Tom
On 6/02/2017 14:08, Christian Grün wrote:
Hi Tom,
I have integrated some experimental support for JAXP stylesheet caching (all subject to discussion, and subject to change):
• I have added a fourth argument for xslt:transform(), which defines whether stylesheets will be cached.
• The stylesheet argument in BaseX can reference nodes, strings, and URIs. For now, I decided to limit the caching facility to URIs.
• The cache can be invalidated via xslt:init().
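To make the described behaviour concrete, here is a rough Java model of it (not BaseX's actual code; UriOnlyXslt and its methods are hypothetical): stylesheets passed by URI are compiled once per URI when the cache flag is set, stylesheets passed as text are compiled on every call, and init() empties the cache.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Source;
import javax.xml.transform.Templates;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerConfigurationException;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

/** Hypothetical model: only stylesheets given by URI are cached, and only on request. */
class UriOnlyXslt {
    private final Map<String, Templates> byUri = new ConcurrentHashMap<>();
    final AtomicInteger compilations = new AtomicInteger();

    /** Stylesheet given by URI: reused across calls when cache is true. */
    String transformUri(String input, String stylesheetUri, boolean cache) throws TransformerException {
        Templates t = cache ? byUri.get(stylesheetUri) : null;
        if (t == null) { // a concurrent miss may compile twice; harmless for a sketch
            t = compile(new StreamSource(stylesheetUri));
            if (cache) byUri.put(stylesheetUri, t);
        }
        return run(t, input);
    }

    /** Stylesheet given as text: never cached, compiled on every call. */
    String transformText(String input, String stylesheetText) throws TransformerException {
        return run(compile(new StreamSource(new StringReader(stylesheetText))), input);
    }

    /** Counterpart of xslt:init(): forget every compiled stylesheet. */
    void init() {
        byUri.clear();
    }

    private Templates compile(Source s) throws TransformerConfigurationException {
        compilations.incrementAndGet();
        return TransformerFactory.newInstance().newTemplates(s);
    }

    private static String run(Templates t, String input) throws TransformerException {
        Transformer tr = t.newTransformer();
        tr.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        tr.transform(new StreamSource(new StringReader(input)), new StreamResult(out));
        return out.toString();
    }
}
```

One plausible reason for restricting caching to URIs is that a URI is a short, stable key, whereas a node or string would have to be hashed or compared in full on every call.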
In the attached query example, the cached transformation of a very basic stylesheet is around 3 times faster.
A new snapshot is online [1]. I would be grateful if you could do some testing, and give me feedback if the chosen solution reasonably speeds up your transformations.
Christian
[1] http://files.basex.org/releases/latest/
_ query.xq ___
xslt:init(),
let $style := 'xslt.xslt'
for $cache in (true(), false())
return prof:time(
  for $x in 1 to 1000
  return xslt:transform(<input/>, $style, (), map { 'cache': $cache }),
  false(),
  "Caching " || $cache || ": "
)
_ xslt.xslt ___
<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>
  <xsl:template match="/"><result/></xsl:template>
</xsl:stylesheet>
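A rough Java analogue of query.xq, using the JDK's built-in processor rather than BaseX (hypothetical class CacheTiming): it times n transformations that reuse one Templates object against n transformations that recompile the stylesheet each time.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Templates;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

/** Hypothetical JAXP analogue of query.xq: reuse one Templates vs. recompile every time. */
class CacheTiming {
    static final String XSLT = "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
            + "<xsl:template match='/'><result/></xsl:template></xsl:stylesheet>";
    // Shared factory; fine here because the benchmark is single-threaded.
    private static final TransformerFactory FACTORY = TransformerFactory.newInstance();

    static Templates compile() throws TransformerException {
        return FACTORY.newTemplates(new StreamSource(new StringReader(XSLT)));
    }

    static String run(Templates compiled) throws TransformerException {
        Transformer t = compiled.newTransformer();
        t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        t.transform(new StreamSource(new StringReader("<input/>")), new StreamResult(out));
        return out.toString();
    }

    /** Returns {cached, uncached} wall-clock times in milliseconds for n transformations. */
    static long[] timeMillis(int n) throws TransformerException {
        long start = System.nanoTime();
        Templates once = compile();                  // compiled a single time
        for (int i = 0; i < n; i++) run(once);
        long cached = System.nanoTime() - start;

        start = System.nanoTime();
        for (int i = 0; i < n; i++) run(compile());  // compiled on every iteration
        long uncached = System.nanoTime() - start;
        return new long[] {cached / 1_000_000, uncached / 1_000_000};
    }
}
```

As in the XQuery version, the cached run goes first and therefore also absorbs JVM warm-up; swapping the order or adding a warm-up loop gives fairer numbers.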
On Mon, Feb 6, 2017 at 3:28 PM, Tom De Herdt tom.deherdt@skynet.be wrote:
Hi Christian,
It took some time (as explained in an off-list e-mail), but I finally managed to test your experimental support for JAXP stylesheet caching in snapshot 8.6.1.
I have to say I'm very impressed. It works as expected. I tried various transformations, both in the GUI and with RESTXQ, and did not have a single problem.
To get some idea of real-world performance gains, I set up a small RESTXQ page that calls a relatively complex set of XSLT 2.0 stylesheets borrowed from an internal CMS application.
The stylesheets transform TEI-like documents to HTML and add common website elements (header, footer, menu ...) to the page. They are designed in a modular way, so there's quite a bit of import inheritance going on.
In order to somehow measure real-life use, I used a BaseX installation (GUI) on my laptop to query the RESTXQ page on a server in the local network. The XQuery script [1] simply requests a number of different pages and repeats the series three times, requesting:
- (1) raw XML documents, without XSLT transformation;
- (2) HTML generated with cached XSLT;
- (3) HTML generated with XSLT without stylesheet caching.
To make sure that documents are actually fetched, the script counts the total number of characters received.
Typical results for 100 requests (from the Query Info pane):
XML source | XSLT with caching | XSLT without caching
382.1 ms | 711.05 ms | 2486.53 ms
449.66 ms | 806.66 ms | 2605.8 ms
356.65 ms | 744.69 ms | 2580.29 ms
When running the script directly on the server, response times are obviously shorter, but the ratio is more or less the same:
XML source | XSLT with caching | XSLT without caching
282.46 ms | 542.88 ms | 1873.05 ms
249.97 ms | 492.76 ms | 1703.14 ms
281.98 ms | 481.52 ms | 1750.14 ms
I also adapted your test script to test the stylesheets in the BaseX GUI on the server [2], measuring the difference without network/RESTXQ overhead (again, series of 100 transformations):
Caching true | Caching false
343.3 ms | 1700.72 ms
329.14 ms | 1670.83 ms
277.98 ms | 1612.66 ms
316.73 ms | 1610.37 ms
All in all, cached transformations are about 3 to 4 times faster via RESTXQ (and about 5 times faster for the bare transformations), similar to what you found. A marked difference, as expected, but not huge. Maybe non-cached XSLT transformations still benefit from some form of processor-level caching when called in a series of requests...? Initial loading times (after starting BaseX) are slower, but it quickly gets up to full speed after a few requests.
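The ratios can be recomputed from the numbers reported above. This small helper (hypothetical class ReportedSpeedups, data copied from the three result series, each total covering 100 requests) averages the uncached/cached ratio per run and converts totals into per-request times.

```java
/** Hypothetical helper: speed-ups recomputed from the reported timings (ms per 100 requests). */
class ReportedSpeedups {
    // Each row is one run: {XSLT with caching, XSLT without caching}, copied from the results.
    static final double[][] RESTXQ_LAPTOP = {{711.05, 2486.53}, {806.66, 2605.8}, {744.69, 2580.29}};
    static final double[][] RESTXQ_SERVER = {{542.88, 1873.05}, {492.76, 1703.14}, {481.52, 1750.14}};
    static final double[][] DIRECT_GUI = {{343.3, 1700.72}, {329.14, 1670.83}, {277.98, 1612.66}, {316.73, 1610.37}};

    /** Mean over all runs of (time without caching) / (time with caching). */
    static double meanSpeedup(double[][] runs) {
        double sum = 0;
        for (double[] run : runs) sum += run[1] / run[0];
        return sum / runs.length;
    }

    /** Mean time of a single request in ms; column 0 = with caching, 1 = without. */
    static double perRequestMs(double[][] runs, int column, int requestsPerRun) {
        double sum = 0;
        for (double[] run : runs) sum += run[column];
        return sum / runs.length / requestsPerRun;
    }
}
```

With these figures the mean speed-up comes out at about 3.4 (laptop, via RESTXQ), 3.5 (server, via RESTXQ) and 5.2 (bare transformations), and the laptop series averages roughly 7.5 ms versus 25.6 ms per request.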
So is it worth it?
I definitely think it is.
In isolation the difference is small: say 7 ms vs. 25 ms for a single page. You wouldn't notice that over the Internet, but you might when the page generates several AJAX requests. In any case, it reduces load on the server, which could make a difference for websites with heavy traffic.
Not many developers would recommend XSLT for high-profile sites anyway, I suppose, but I was actually surprised by the performance: 7 ms is quite good. (Certainly faster than the 30 to 40 ms the stylesheets take with our current ASP.NET/SQL/Saxon implementation on the same server -- cached...)
Best regards, Tom
NOTE: the scripts I used. Let me know if there is some methodological flaw. I can send you the stylesheets and some sample data off-list if you want.
=== script [1] ===
let $count := 100
let $host := "http://192.168.115.101:8984"
let $list := fetch:xml($host || "/list" || "?count=" || $count) (: list of $count identifiers :)
let $url := $host || "/egon/"
return <results>{
  prof:time(
    sum(
      for $id in $list//entry
      let $page := fetch:text($url || $id || "?xml=true")
      return string-length($page)
    ),
    false(), 'XML source: '
  ),
  prof:time(
    sum(
      for $id in $list//entry
      let $page := fetch:text($url || $id || "?cache=true")
      return string-length($page)
    ),
    false(), 'XSLT with caching: '
  ),
  prof:time(
    sum(
      for $id in $list//entry
      let $page := fetch:text($url || $id || "?cache=false")
      return string-length($page)
    ),
    false(), 'XSLT without caching: '
  )
}</results>
=== script [2] ===
let $count := 100
let $xslt := "../static/vorm/xsl/website.browse.xsl"
let $input := doc('egon/logboek.xml')/export/entry[@id="D20081220"]
for $cache in (true(), false())
return prof:time(
  for $x in 1 to $count
  return xslt:transform($input, $xslt, (), map { "cache": $cache }),
  false(),
  "Caching " || $cache || ": "
)
On 9/02/2017 13:26, Christian Grün wrote:
Hi Tom,
Thanks for passing on your test results. I am glad to hear that the results seem satisfactory, so I will keep this extension in BaseX 8.6.1 (which is still to be released, hopefully by the end of next week). I’m still not sure whether I should stick with the explicit caching mechanism or switch to a more dynamic approach (like automatically caching the most recent stylesheets and dropping older ones), so I will wait some time before officially documenting the enhancements in our Wiki.
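The "more dynamic approach" mentioned here is essentially a bounded LRU (least recently used) cache. In Java, LinkedHashMap supports this directly through access ordering and removeEldestEntry; a minimal sketch, with LruStylesheetCache as a hypothetical name and the capacity chosen by the caller:

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical bounded cache that keeps the most recently used stylesheets and drops older ones. */
class LruStylesheetCache<K, V> {
    private final Map<K, V> map;

    LruStylesheetCache(int capacity) {
        // accessOrder = true: iteration runs from least to most recently accessed entry
        this.map = new LinkedHashMap<K, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                return size() > capacity; // evict the least recently used entry
            }
        };
    }

    /** Returns the cached value, or compiles, stores and returns a new one. */
    synchronized V get(K key, Function<? super K, ? extends V> compiler) {
        V v = map.get(key); // a hit also marks the entry as most recently used
        if (v == null) {
            v = compiler.apply(key);
            map.put(key, v);
        }
        return v;
    }

    /** Checks presence without changing the access order. */
    synchronized boolean contains(K key) {
        return map.containsKey(key);
    }

    synchronized void clear() {
        map.clear();
    }
}
```

Compared with an explicit flag plus xslt:init(), this needs no user action but introduces a tuning knob (the capacity), and an edited stylesheet can still be served stale until it is evicted or the cache is cleared.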
It could also be interesting to find out how much time we would save by integrating s9api more tightly. If you decide to do any experiments in that direction, feel free to report back to us!
All the best, Christian
On Sun, Feb 19, 2017 at 4:45 AM, Tom De Herdt tom.deherdt@skynet.be wrote:
Hi Christian,
It took some time (as explained in an off-list e-mail), but I finally managed to test your experimental support for JAXP stylesheet caching in snapshot 8.6.1.
I have to say I'm very impressed. It works as expected. I tried various transformations, both in the GUI and with RESTXQ, and did not have a single problem.
To get some idea of real-world performance gains, I set up a small RESTXQ page that calls a relatively complex set of xslt 2.0 stylesheets borrowed from an internal CMS application.
The stylesheets transform TEI-like documents to html and add common website elements (header, footer, menu ...) to the page. They are designed in a modular way, so there's quite a bit of import inheritance going on.
In order to somehow measure real-life use, I used a BaseX installation (GUI) on my laptop to query the RESTXQ page on a server in the local network. The XQuery script [1] simply does a number of requests for different pages, repeating the series three times, requesting:
- (1) raw xml documents, without xslt transformation;
- (2) html generated with cached xslt;
- (3) html generated with xslt without stylesheet caching.
To make sure that documents are actually fetched, the script counts the total number of characters received.
Typical results for 100 requests (from the Query Info pane):
Evaluating: XML source: 382.1 ms XSLT with caching: 711.05 ms XSLT without caching: 2486.53 ms
Evaluating: XML source: 449.66 ms XSLT with caching: 806.66 ms XSLT without caching: 2605.8 ms
Evaluating: XML source: 356.65 ms XSLT with caching: 744.69 ms XSLT without caching: 2580.29 ms
When running the script directly on the server, response times are obviously shorter, but the ratio is more or less the same:
Evaluating: XML source: 282.46 ms XSLT with caching: 542.88 ms XSLT without caching: 1873.05 ms
Evaluating: XML source: 249.97 ms XSLT with caching: 492.76 ms XSLT without caching: 1703.14 ms
Evaluating: XML source: 281.98 ms XSLT with caching: 481.52 ms XSLT without caching: 1750.14 ms
I also adapted your test script to test the stylesheets in the BaseX GUI on the server [2], measuring the difference without network/RESTXQ overhead (again, series of 100 transformations):
Evaluating: Caching true: 343.3 ms Caching false: 1700.72 ms
Evaluating: Caching true: 329.14 ms Caching false: 1670.83 ms
Evaluating: Caching true: 277.98 ms Caching false: 1612.66 ms
Evaluating: Caching true: 316.73 ms Caching false: 1610.37 ms
All in all, caching stylesheets is about 3 to 4 times faster, similar to what you found. A marked difference, as expected, but not huge. Maybe non-cached XSLT transformations still benefit from some form of processor-level caching when called in a series of requests...? Initial loading times (after starting BaseX) are slower, but it quickly gets up to full speed after a few requests.
So is it worth it?
I definitely think it is.
In isolation the difference is small: say 7 ms vs. 25 ms for a single page. You wouldn't notice that over the Internet, but you might when the page generates several AJAX requests. In any case, it reduces load on the server, which could make a difference for websites with heavy traffic.
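The per-page figures above are simply the series totals divided by the 100 requests. A trivial sketch of that arithmetic, using the first LAN run reported earlier (711.05 ms cached, 2486.53 ms uncached); the class name `PerPageCost` is made up for illustration.

```java
import java.util.Locale;

// Per-request averages for one measured series of 100 requests.
class PerPageCost {
  static double perRequest(double totalMs, int requests) {
    return totalMs / requests;
  }

  public static void main(String[] args) {
    int n = 100;
    double cached = perRequest(711.05, n);    // about 7.1 ms per page
    double uncached = perRequest(2486.53, n); // about 24.9 ms per page
    System.out.printf(Locale.ROOT, "cached: %.1f ms, uncached: %.1f ms, ratio: %.1f%n",
        cached, uncached, uncached / cached);
  }
}
```

The ratio for this run comes out at roughly 3.5, in line with the "3 to 4 times" estimate.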
Not many developers would recommend XSLT for high-profile sites anyway, I suppose, but I was actually surprised by the performance: 7 ms is quite good. (Certainly faster than the 30 to 40 ms the stylesheets take with our current ASP.NET/SQL/Saxon implementation on the same server -- cached...)
Best regards, Tom
NOTE: the scripts I used. Let me know if there is some methodological flaw. I can send you the stylesheets and some sample data off-list if you want.
=== script [1] ===
let $count := 100
let $host := "http://192.168.115.101:8984"
let $list := fetch:xml($host || "/list" || "?count=" || $count) (: list of $count identifiers :)
let $url := $host || "/egon/"
return <results>{
  prof:time(
    sum(for $id in $list//entry
        let $page := fetch:text($url || $id || "?xml=true")
        return string-length($page)),
    false(), 'XML source: '),
  prof:time(
    sum(for $id in $list//entry
        let $page := fetch:text($url || $id || "?cache=true")
        return string-length($page)),
    false(), 'XSLT with caching: '),
  prof:time(
    sum(for $id in $list//entry
        let $page := fetch:text($url || $id || "?cache=false")
        return string-length($page)),
    false(), 'XSLT without caching: ')
}</results>
=== script [2] ===
let $count := 100
let $xslt := "../static/vorm/xsl/website.browse.xsl"
let $input := doc('egon/logboek.xml')/export/entry[@id = "D20081220"]
for $cache in (true(), false())
return prof:time(
  for $x in 1 to $count
  return xslt:transform($input, $xslt, (), map { "cache": $cache }),
  false(), "Caching " || $cache || ": ")
On 9/02/2017 13:26, Christian Grün wrote:
Hi Tom,
I have integrated some experimental support for JAXP stylesheet caching (all subject to discussion, and subject to change):
• I have added a fourth argument to xslt:transform(), which defines whether stylesheets will be cached.
• The stylesheet argument in BaseX can reference nodes, strings, and URIs. For now, I decided to limit the caching facility to URIs.
• The cache can be invalidated via xslt:init().
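As a rough picture of what URI-keyed JAXP caching involves (not the actual BaseX code): the standard JAXP way is to compile a stylesheet once with `TransformerFactory.newTemplates()` and create a cheap `Transformer` from the resulting `Templates` for each call. The class `StylesheetCache`, its `useCache` flag, and `invalidate()` (standing in for the role of xslt:init()) are hypothetical names for this sketch.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.concurrent.ConcurrentHashMap;
import javax.xml.transform.Templates;
import javax.xml.transform.TransformerConfigurationException;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical sketch of explicit, URI-keyed stylesheet caching in plain JAXP.
class StylesheetCache {
  private final TransformerFactory factory = TransformerFactory.newInstance();
  private final ConcurrentHashMap<String, Templates> cache = new ConcurrentHashMap<>();

  // Transforms an XML string; with useCache = true the stylesheet at styleUri
  // is compiled only once and the resulting Templates object is reused.
  String transform(String xml, String styleUri, boolean useCache) throws TransformerException {
    Templates templates = useCache ? cached(styleUri) : compile(styleUri);
    StringWriter out = new StringWriter();
    // Transformer instances are not thread-safe, so a new one is created per
    // call; Templates objects are thread-safe and may be shared.
    templates.newTransformer().transform(
        new StreamSource(new StringReader(xml)), new StreamResult(out));
    return out.toString();
  }

  private Templates cached(String uri) throws TransformerConfigurationException {
    Templates templates = cache.get(uri);
    if (templates == null) {
      Templates compiled = compile(uri);
      // If another thread cached the same URI in the meantime, keep its copy
      Templates previous = cache.putIfAbsent(uri, compiled);
      templates = previous != null ? previous : compiled;
    }
    return templates;
  }

  private Templates compile(String uri) throws TransformerConfigurationException {
    // TransformerFactory is not guaranteed to be thread-safe
    synchronized (factory) {
      return factory.newTemplates(new StreamSource(uri));
    }
  }

  // Drops all compiled stylesheets (comparable to calling xslt:init())
  void invalidate() { cache.clear(); }

  int size() { return cache.size(); }
}
```

Keying by URI keeps the cache simple: a node or string stylesheet has no stable identity to look up, which may be one reason to restrict caching to URIs.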
In the attached query example, the cached transformation of a very basic stylesheet is around 3 times faster.
A new snapshot is online [1]. I would be grateful if you could do some testing, and give me feedback if the chosen solution reasonably speeds up your transformations.
Christian
[1] http://files.basex.org/releases/latest/
_ query.xq ___
xslt:init(),
let $style := 'xslt.xslt'
for $cache in (true(), false())
return prof:time(
  for $x in 1 to 1000
  return xslt:transform(<input/>, $style, (), map { 'cache': $cache }),
  false(), "Caching " || $cache || ": ")
_ xslt.xslt ___
<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>
  <xsl:template match="/"><result/></xsl:template>
</xsl:stylesheet>
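For anyone who wants to reproduce the effect outside BaseX, here is a rough plain-JAXP analogue of query.xq, using the same trivial stylesheet inline: it times n transformations that either reuse one `Templates` object or compile the stylesheet on every call. This is a sketch only. It uses whatever XSLT processor the JDK's `TransformerFactory` picks up, does no JIT warm-up, and in the cached case compiles once before the clock starts. The class name `CompileOnceBenchmark` is hypothetical.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.Locale;
import javax.xml.transform.Templates;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Rough JAXP analogue of query.xq: compile once vs. compile on every call.
class CompileOnceBenchmark {
  static final String XSLT =
      "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
    + "<xsl:template match='/'><result/></xsl:template></xsl:stylesheet>";

  // One transformation of <input/>; compiles anew when cached is null.
  static String run(TransformerFactory f, Templates cached) throws TransformerException {
    Templates t = cached != null ? cached : f.newTemplates(new StreamSource(new StringReader(XSLT)));
    StringWriter out = new StringWriter();
    t.newTransformer().transform(new StreamSource(new StringReader("<input/>")), new StreamResult(out));
    return out.toString();
  }

  // Runs n transformations and returns the elapsed time in milliseconds.
  static double time(int n, boolean cache) throws TransformerException {
    TransformerFactory f = TransformerFactory.newInstance();
    Templates cached = cache ? f.newTemplates(new StreamSource(new StringReader(XSLT))) : null;
    long start = System.nanoTime();
    for (int i = 0; i < n; i++) run(f, cached);
    return (System.nanoTime() - start) / 1e6;
  }

  public static void main(String[] args) throws TransformerException {
    for (boolean cache : new boolean[] {true, false}) {
      System.out.printf(Locale.ROOT, "Caching %b: %.2f ms%n", cache, time(1000, cache));
    }
  }
}
```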
On Mon, Feb 6, 2017 at 3:28 PM, Tom De Herdt tom.deherdt@skynet.be wrote:
Hi Christian,
Thank you for taking time to look into this!
• Similar to eXist, the BaseX DOM models are pretty different from Saxon’s representation. In BaseX, it is possible to create a standard Java DOM representation for arbitrary XML nodes, but I doubt that working with it will be much faster than serializing nodes, because the latter option is usually very fast in BaseX.
OK, I understand. You're right, it probably wouldn't be faster. In any case, serializing/deserializing transformation input (typically small pages) is never going to be a bottleneck in a web context, so it doesn't matter. XSLT compilation, on the other hand, does incur a noticeable cost if it is repeated for each request.
Regards, Tom
On 6/02/2017 14:08, Christian Grün wrote:
Tom,
Thanks for the excellent summary on what could be done, very appreciated!
1. basic XSLT caching with the existing JAXP interface, as described in the articles or similar;
2. specific saxon:transform() etc. functions that use the new Saxon interface (and do caching);
3. idem, but implemented for the regular xslt:transform(), or maybe the function in XQFO 3.1 (thanks for the link, I was not aware of this)?
Variant 1 is surely something that I can easily include. I will check out your links and give you some update this week.
Talking about a tighter integration, I fully agree with Adam’s comments:
• Switching to Saxon’s API would be a reasonable choice. We still have users who work with standard Xalan XSLT, but we could definitely use Michael Kay’s s9api whenever Saxon is found in the classpath. I have added an issue to our GitHub tracker [1].
• Similar to eXist, the BaseX DOM models are pretty different from Saxon’s representation. In BaseX, it is possible to create a standard Java DOM representation for arbitrary XML nodes, but I doubt that working with it will be much faster than serializing nodes, because the latter option is usually very fast in BaseX.
Cheers, Christian
[1] https://github.com/BaseXdb/basex/issues/1408
Thinking forward, absolutely wonderful would be some form of tight integration with Saxon that passes nodes from BaseX to Saxon directly, without serializing/parsing.
Incidentally, there is an interesting note on this topic on the eXist developer platform (scroll to the bottom): https://github.com/eXist-db/exist/issues/791
But any of option 1-3 (or similar) would do the trick and be great!
Best regards, Tom
On 5/02/2017 15:01, Christian Grün wrote:
Hi Tom,
You are right. xslt:transform() does nothing else than sending stylesheets to the registered XSLT processor (which is usually Xalan or Saxon).
The XQFO 3.1 spec [1] will provide an fn:transform function that provides a "cache" option. As the definition of this function is very Saxon-specific, I am not sure if we will completely support it in future. For now, if you know how caching is enabled in Saxon, feel free to provide me with some example code, and I will see if I can easily embed it in our current architecture.
Cheers, Christian
[1] https://www.w3.org/TR/xpath-functions-31/#func-transform
basex-talk@mailman.uni-konstanz.de