Hi Gunther,
> I am busy right now, but will be able to present some code tonight.
Thanks! Take your time.
> Is there a different tree model than DOM, that you would prefer for BaseX?
I assume that the difference between DOM and String inputs will be
marginal. If the method is called from XQuery, one of the fastest
solutions is probably to write everything to a temporary string or
byte array and create an XQuery node representation from it (in BaseX,
an instance of DBNode):
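import org.basex.io.IO;
import org.basex.query.value.node.DBNode;
// Version 1: build a BaseX node (DBNode) from an XML string
static DBNode parseXml() throws Exception {
  // IO.get() treats a string that starts with '<' as in-memory XML content
  String input = "<xml/>";
  return new DBNode(IO.get(input));
}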
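For comparison with the DOM route, a JDK-only sketch (no BaseX involved) that parses the same kind of string into an org.w3c.dom.Document with Java's default XML parser. The class name DomFromString is hypothetical; enabling namespace awareness is my assumption, since the JDK factory has it off by default.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

/** Hypothetical helper: parses an XML string into a DOM tree with the JDK's default parser. */
class DomFromString {
  static Document parseXml(String input) throws Exception {
    DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
    // Off by default in the JDK; the XQuery data model is namespace-aware.
    factory.setNamespaceAware(true);
    DocumentBuilder builder = factory.newDocumentBuilder();
    // Wrap the string in a reader so no temporary file is needed.
    return builder.parse(new InputSource(new StringReader(input)));
  }

  public static void main(String[] args) throws Exception {
    Document doc = parseXml("<xml/>");
    System.out.println(doc.getDocumentElement().getNodeName()); // prints "xml"
  }
}
```

A DOM tree built this way would still have to be converted into an XQuery node before BaseX can use it, which is one reason the string route above is likely the cheaper one.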
Thinking about this, I noticed that my previous parse-xquery.xq
example runs faster (from 5 ms down to 2 ms when executed
repeatedly) if fn:parse-xml is replaced with
fn:parse-xml-fragment. The reason is that the second function uses our
internal XML parser instead of Java's default XML parser.
So this version is probably the best (for small XML documents, it is
more than 10 times faster than the first version):
import org.basex.build.xml.XMLParser;
import org.basex.core.MainOptions;
import org.basex.io.IO;
import org.basex.query.value.node.DBNode;
// Version 2: use BaseX's internal XML parser directly,
// bypassing Java's default XML parser
static DBNode parseXml() throws Exception {
  String input = "<x/>";
  XMLParser parser = new XMLParser(IO.get(input), MainOptions.get());
  return new DBNode(parser);
}
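The timings above depend on repeated execution, i.e. on a warmed-up JVM. A minimal sketch of a repeat-and-average timer; RepeatTimer and averageMillis are hypothetical names, and this is not necessarily how the figures above were measured. The demo times only the JDK DOM parser, since it needs nothing beyond the standard library.

```java
import java.io.StringReader;
import java.util.concurrent.Callable;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;

/** Hypothetical micro-benchmark helper for comparing parse variants. */
class RepeatTimer {
  /** Calls task `warmup` times untimed, then returns the average time of `runs` timed calls in ms. */
  static double averageMillis(Callable<?> task, int warmup, int runs) throws Exception {
    for (int i = 0; i < warmup; i++) task.call(); // give the JIT a chance to compile hot paths
    long start = System.nanoTime();
    for (int i = 0; i < runs; i++) task.call();
    return (System.nanoTime() - start) / 1e6 / runs;
  }

  public static void main(String[] args) throws Exception {
    DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
    Callable<Object> jdkParse = () ->
        factory.newDocumentBuilder().parse(new InputSource(new StringReader("<x/>")));
    System.out.printf("JDK DOM parse: %.4f ms per call%n", averageMillis(jdkParse, 1000, 10000));
  }
}
```

With BaseX.jar on the classpath, each parseXml() variant could be wrapped in a Callable the same way. For numbers worth publishing, a dedicated harness such as JMH is more reliable than a hand-written loop.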
But I’m wondering who’ll eventually care about the difference ;)
Christian
>
> By the way, the generated Saxon imports serve two purposes:
>
> - adapting to the extension function API (necessary when using Saxon-HE)
> - using Saxon's native tree builder.
>
> Best regards
> Gunther
> --
>
>
> Sent: Thursday, 31 March 2016 at 11:14
> From: "Christian Grün" <christian.gruen@gmail.com>
> To: "Gunther Rademacher" <grd@gmx.net>
> Cc: BaseX <basex-talk@mailman.uni-konstanz.de>
> Subject: Re: Re: [basex-talk] BaseX optimizer performance on REx-generated parser
> Hi Gunther, hi all,
>
> here is a straightforward (yet somewhat hacky) way to invoke the REx
> Java parser code from XQuery:
>
> 1. download the XQuery grammar, e.g.
> http://bottlecaps.de/rex/CR-xquery-31-20151217.ebnf
>
> 2. generate a Java parser from it, using these command-line options:
> -java -tree -main
>
> 3. compile the result
> javac CR_xquery_31_20151217.java
>
> 4. run the attached XQuery files with BaseX or Saxon EE, and with the
> compiled parser classes in the classpath, e.g.:
>
> java -cp BaseX.jar;. org.basex.BaseX parse-xquery.xq
> java -cp saxon9ee.jar;. net.sf.saxon.Query parse-xquery.xq
>
> (On Unix/Linux-based systems, replace the semicolon with a colon.)
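The separator note in step 4 can be avoided in Java code: java.io.File.pathSeparator is ";" on Windows and ":" on Unix/Linux. A sketch that builds both step-4 command lines portably; QueryCommands and its methods are hypothetical names, and the jar names are the ones used in step 4.

```java
import java.io.File;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper that builds the step-4 command lines with the platform's path separator. */
class QueryCommands {
  // File.pathSeparator is ";" on Windows and ":" on Unix/Linux.
  static String classpath(String... entries) {
    return String.join(File.pathSeparator, entries);
  }

  static List<String> basex(String query) {
    return Arrays.asList("java", "-cp", classpath("BaseX.jar", "."), "org.basex.BaseX", query);
  }

  static List<String> saxon(String query) {
    return Arrays.asList("java", "-cp", classpath("saxon9ee.jar", "."), "net.sf.saxon.Query", query);
  }

  public static void main(String[] args) {
    System.out.println(String.join(" ", basex("parse-xquery.xq")));
    System.out.println(String.join(" ", saxon("parse-xquery.xq")));
    // To launch one: new ProcessBuilder(basex("parse-xquery.xq")).inheritIO().start();
  }
}
```

Passing the arguments as a list to ProcessBuilder also sidesteps shell quoting, since no shell is involved.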
>
> In BaseX, for simple inputs, the compiled tree will be available in
> 5-10 ms. I assume it could be even faster if some native BaseX code
> were embedded in the REx Parser Generator, but I don't know how much
> effort that would be.
>
> Hope this helps, feedback is welcome,
> Christian