Re: [basex-talk] BaseX optimizer performance on REx-generated parser

4 Apr 2016

Hi Christian,
thank you for the tree builder proposal; it works fine indeed.
I have slightly modified the extension function so that it behaves the
same as the generated XQuery code and can replace it without further
adaptations of the code that calls it.
I have also used Str rather than String, in order to create a unique signature
that identifies a BaseX extension function.
Finally, the call of the parser's parse_x method was isolated in order to 
prepare for multiple extension functions in a single class. This occurs when
there are multiple start symbols in a grammar.
The modified code is attached to this mail. It is stripped down to what
would be added to REx-generated code for '-basex'.
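To illustrate the point about isolating the parse_x call, here is a minimal sketch of the idea, not the attached code. It assumes a hypothetical stand-in Parser class with two start symbols (parse_XQuery and parse_Module are made-up names for this example) and returns plain strings instead of BaseX nodes, so that it runs without BaseX on the classpath:

```java
import java.util.function.Function;

public class StartSymbolDispatch {
    /** Hypothetical stand-in for a generated parser with two start symbols. */
    static class Parser {
        private final String input;

        Parser(String input) {
            this.input = input;
        }

        // A real generated parser would build a parse tree; these stubs only wrap the input.
        String parse_XQuery() {
            return "<XQuery>" + input + "</XQuery>";
        }

        String parse_Module() {
            return "<Module>" + input + "</Module>";
        }
    }

    /** Shared plumbing; only the start-symbol call differs between entry points. */
    static String parse(String input, Function<Parser, String> startSymbol) {
        Parser parser = new Parser(input);
        return startSymbol.apply(parser);
    }

    // One extension function per start symbol, each delegating to the shared helper.
    static String parseXQuery(String input) {
        return parse(input, Parser::parse_XQuery);
    }

    static String parseModule(String input) {
        return parse(input, Parser::parse_Module);
    }

    public static void main(String[] args) {
        System.out.println(parseXQuery("1 + 1"));
        System.out.println(parseModule("1 + 1"));
    }
}
```

With this shape, adding another start symbol only requires one more one-line entry point.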
Best regards
Gunther
Sent: Friday, 1 April 2016 at 17:57
From: "Christian Grün" christian.gruen@gmail.com
To: "Gunther Rademacher" grd@gmx.net
Cc: BaseX basex-talk@mailman.uni-konstanz.de
Subject: Re: [basex-talk] BaseX optimizer performance on REx-generated parser
Hi Gunther,
Thanks again! Thanks to your examples, which create 38 MB of
serialized XML, I now see why it is in fact beneficial to use a tree
builder ;)
I finally looked at your Saxon code a bit closer, and I rewrote it a
bit to work with BaseX:
* I added a parse(String query) function, which basically does what
ExtensionFunctionCall.call does
* I renamed SaxonTreeBuilder to BaseXTreeBuilder, which now calls the
appropriate BaseX builder functions
* The TopDownTreeBuilder stays unchanged
I have attached the resulting code; it seems to be much faster indeed.
Does it make any sense to you? Do you think it would make sense to
provide both a Saxon and BaseX option on your parser page?
Christian
On Fri, Apr 1, 2016 at 12:32 AM, Gunther Rademacher grd@gmx.net wrote:
...
Hi Christian,
please find my code attached. I have tested it along with an XQuery 3.1
parser that was generated using these command line options:
-tree -main -java -saxon
It contains the DOM tree builder, as well as your approach using
XmlSerializer followed by XML parsing, both for BaseX and for Saxon.
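The difference between the two approaches can be sketched with the JDK's own DOM classes. This is only an illustration under stated assumptions, not the attached code: the element name IntegerLiteral and the class name are made up for the example, and the serialized form is a hard-coded string rather than XmlSerializer output.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

public class TreeBuildingSketch {
    /** Builds the tree directly through the DOM API, as a tree builder callback would. */
    static Document buildDirectly() throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.newDocument();
        Element root = doc.createElement("IntegerLiteral");
        root.appendChild(doc.createTextNode("42"));
        doc.appendChild(root);
        return doc;
    }

    /** Starts from serialized XML and lets an XML parser rebuild the tree. */
    static Document serializeThenParse() throws Exception {
        String xml = "<IntegerLiteral>42</IntegerLiteral>";
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        return builder.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
    }

    /** Counts a node and all of its descendants (attributes are not visited). */
    static int countNodes(Node node) {
        int count = 1;
        for (Node child = node.getFirstChild(); child != null; child = child.getNextSibling()) {
            count += countNodes(child);
        }
        return count;
    }

    public static void main(String[] args) throws Exception {
        // Both trees consist of a document node, one element and one text node.
        System.out.println(countNodes(buildDirectly()));
        System.out.println(countNodes(serializeThenParse()));
    }
}
```

Both methods yield the same tree; the second one pays for serialization and a full XML parse on top of the grammar parse.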
In my tests I have parsed the XQuery code for the same grammar, roughly
1 MB, and counted nodes of the parse tree.
These are the commands that I have used:
java org.basex.BaseX -q "declare namespace p='java:XQueryParser'; p:parseXQueryToDOM(unparsed-text('file:///C:/temp/CR-xquery-31-20151217.xquery'))/count(descendant-or-self::node())"
java org.basex.BaseX -q "declare namespace p='java:XQueryParser'; p:parseXQueryToDBNode(unparsed-text('file:///C:/temp/CR-xquery-31-20151217.xquery'))/count(descendant-or-self::node())"
java net.sf.saxon.Query -qs:"declare namespace p='java:XQueryParser'; p:parseXQueryToDOM(unparsed-text('file:///C:/temp/CR-xquery-31-20151217.xquery'))/count(descendant-or-self::node())"
java net.sf.saxon.Query -init:XQueryParser$SaxonInitializer -qs:"declare namespace p='XQueryParser'; p:parseXQueryToNodeInfo(unparsed-text('file:///C:/temp/CR-xquery-31-20151217.xquery'))/count(descendant-or-self::node())"
java net.sf.saxon.Query -init:CR_xquery_31_20151217$SaxonInitializer -qs:"declare namespace p='CR_xquery_31_20151217'; p:parse-XQuery(unparsed-text('file:///C:/temp/CR-xquery-31-20151217.xquery'))/count(descendant-or-self::node())"
And here are the results (best runtime in seconds out of several executions):
               | BaseX     | SaxonEE
---------------+-----------+------------
DOM builder    | 4.48      | 2.98
parseXml       | 3.57      | 3.24
native builder | -         | 2.36
As you expected, using DOM does not seem to be advantageous for BaseX. However,
the Saxon results suggest that a native tree builder API can do better than
parsing XML.
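For anyone who wants to repeat such a comparison inside a single JVM, a "best of several executions" measurement could look roughly like the sketch below. This is an assumption about one possible setup, not the way the figures above were obtained; the workload in main is a placeholder and the class and method names are made up.

```java
public class BestOfRuns {
    /** Runs the task several times and returns the shortest wall-clock time in seconds. */
    static double bestSeconds(int runs, Runnable task) {
        long best = Long.MAX_VALUE;
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            task.run();
            best = Math.min(best, System.nanoTime() - start);
        }
        return best / 1e9;
    }

    public static void main(String[] args) {
        // Placeholder workload; a real test would call the parser entry point here.
        double seconds = bestSeconds(5, () -> {
            StringBuilder sb = new StringBuilder();
            for (int i = 0; i < 100_000; i++) {
                sb.append(i);
            }
        });
        System.out.printf("best of 5 runs: %.3f s%n", seconds);
    }
}
```

Taking the minimum rather than the mean reduces the influence of JIT warm-up and other one-off disturbances, at the cost of hiding variance.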
Best regards
Gunther
Sent: Thursday, 31 March 2016 at 15:01
From: "Christian Grün" christian.gruen@gmail.com
To: "Gunther Rademacher" grd@gmx.net
Cc: BaseX basex-talk@mailman.uni-konstanz.de
Subject: Re: [basex-talk] BaseX optimizer performance on REx-generated parser
Hi Gunther,
...
I am busy right now, but will be able to present some code tonight.
Thanks! Take your time.
...
Is there a different tree model than DOM, that you would prefer for BaseX?
I assume that the difference between DOM and String inputs will be
marginal. If the method will be called from XQuery, one of the fastest
solutions is probably to write everything to a temporary string or
byte array and create an XQuery node representation (which is an
instance of DBNode in BaseX):
import org.basex.io.IO;
import org.basex.query.value.node.DBNode;

static DBNode parseXml() throws Exception {
  String input = "<xml/>";
  return new DBNode(IO.get(input));
}
Thinking about this, I noticed that my previous parse-xquery.xq
example will be executed faster (from 5 ms to 2 ms if executed
repeatedly) if fn:parse-xml is replaced with
fn:parse-xml-fragment. This is because our internal XML parser, rather
than Java’s default XML parser, is used for the second function.
So this version is probably the best (it is more than 10 times faster
than version 1 for small XML documents):
import org.basex.build.xml.XMLParser;
import org.basex.core.MainOptions;
import org.basex.io.IO;
import org.basex.query.value.node.DBNode;

static DBNode parseXml() throws Exception {
  String input = "<x/>";
  XMLParser parser = new XMLParser(IO.get(input), MainOptions.get());
  return new DBNode(parser);
}
But I’m wondering who’ll eventually care about the difference ;)
Christian
...
By the way, the generated Saxon imports serve two purposes:

* adapting to the extension function API (necessary when using Saxon-HE)
* using Saxon's native tree builder.

Best regards
Gunther
--
Sent: Thursday, 31 March 2016 at 11:14
From: "Christian Grün" christian.gruen@gmail.com
To: "Gunther Rademacher" grd@gmx.net
Cc: BaseX basex-talk@mailman.uni-konstanz.de
Subject: Re: Re: [basex-talk] BaseX optimizer performance on REx-generated parser
Hi Gunther, hi all,
here is a straightforward (yet somewhat hacky) way to invoke the REx
Java parser code from XQuery:

* Download the XQuery grammar, e.g.
  http://bottlecaps.de/rex/CR-xquery-31-20151217.ebnf
* Generate a Java-coded parser from it, using these command line options:
  -java -tree -main
* Compile the result:
  javac CR_xquery_31_20151217.java
* Run the attached XQuery files with BaseX or Saxon EE, with the
  compiled parser classes in the classpath, e.g.:
  java -cp BaseX.jar;. org.basex.BaseX parse-xquery.xq
  java -cp saxon9ee.jar;. net.sf.saxon.Query parse-xquery.xq
  (The semicolon must be replaced with a colon on Unix/Linux-based systems.)
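If the command line is assembled from Java, for instance in a small launcher, the platform difference in the classpath separator can be avoided with File.pathSeparator. A minimal sketch, in which the class and method names are made up for illustration:

```java
import java.io.File;

public class ClasspathSketch {
    /** Joins classpath entries with the separator of the current platform. */
    static String classpath(String... entries) {
        return String.join(File.pathSeparator, entries);
    }

    public static void main(String[] args) {
        // The separator is ";" on Windows and ":" on Unix-like systems.
        System.out.println("java -cp " + classpath("BaseX.jar", ".")
            + " org.basex.BaseX parse-xquery.xq");
    }
}
```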
In BaseX, for simple inputs, the compiled tree will be available in
5-10 ms. I assume it could be even faster when embedding some native
BaseX code in the REx Parser Generator, but I don’t know how much
effort this would be.
Hope this helps, feedback is welcome,
Christian
