Re: [basex-talk] BaseX optimizer performance on REx-generated parser

29 Mar 2016


      Hi Andy,
that's correct. p:transition ought to be tail-call optimized to be 
prepared for tokens of arbitrary size.
Best regards
Gunther
On 29.03.2016 19:17, Andy Bunce wrote:
...
Hi Gunther,
Do I under understand correctly from:
https://twitter.com/__Gunther__/status/709744679361912832
Given the way the EBNF for XQuery comments is defined - If (REx)
p:transition is not optimized via tail-recursion then long comments will
use excessive stack?
/Andy
On 29 March 2016 at 18:05, Christian Grün <christian.gruen@gmail.com
mailto:christian.gruen@gmail.com> wrote:
So we may probably need to find out if the query can be optimized at
all…


On Tue, Mar 29, 2016 at 7:01 PM, Andy Bunce <bunce.andy@gmail.com
<mailto:bunce.andy@gmail.com>> wrote:
 > Looking again at Florent's message
 > http://markmail.org/message/gxi26da4crk2v5ge...
 > After testing it seems is the length of the XQuery comments that
trigger the
 > stack overflow in BaseX as well.
 > /Andy
 >
 >
 > On 29 March 2016 at 16:54, Andy Bunce <bunce.andy@gmail.com
<mailto:bunce.andy@gmail.com>> wrote:
 >>
 >> >Did you hear sth. about the performance of MarkLogic
 >> My understanding is that on ML the presence typing blocks some
 >> tail-recursion optimizations leading to stack overflow for large
texts.
 >>
 >> Stack overflow is also an issue for my use of REx with BaseX,  I
would
 >> like to be able to parse sources like the  FunctX library
without having to
 >> set a large Java stack size. [1] (My report that the issue was
fixed was
 >> incorrect). I would like to investigate if BaseX is using
tail-recursion
 >> optimizations here and/or where/why this overflow happens - but
it all looks
 >> very complicated :-).
 >>
 >> I suspect the users of XQuery REx are capable of hand tweaking
the REx o/p
 >> either to add or remove type declarations once the issues are well
 >> documented.
 >>
 >> /Andy
 >>
 >>
 >> [1]
 >>
http://www.mail-archive.com/basex-talk%40mailman.uni-konstanz.de/msg07212.html
 >>
 >> On 29 March 2016 at 16:24, Christian Grün
<christian.gruen@gmail.com <mailto:christian.gruen@gmail.com>>
 >> wrote:
 >>>
 >>> Hi Gunther,
 >>>
 >>> Thanks for your mail. After some research, I don’t see a quick
way to
 >>> statically infer the type in the function you mentioned, mostly
 >>> because it’s called recursively (while statically inferring the
type
 >>> of the function, the return type of map2 function is requested,
and it
 >>> will be the type of the function signature, because the
inferred type
 >>> is not available yet…).
 >>>
 >>> Personally, I would love to see the return types readded, not only
 >>> because it speeds up BaseX, but also because I think it’s a general
 >>> advantage of XQuery to have typed functions, and . Do you think
there
 >>> is any chance to revert the change? Did you hear sth. about the
 >>> performance of MarkLogic – is the version without argument types
 >>> comparable to BaseX in terms of execution time, or much faster?
 >>>
 >>> All the best,
 >>> Christian
 >>>
 >>>
 >>>
 >>> > in a recent update of REx parser generator (v5.37), some type
 >>> > specifications
 >>> > were removed from generated XQuery and XSLT functions. This
affects
 >>> > tail-recursive functions, and it was done because it turned
out that
 >>> > MarkLogic fails to optimize tail calls, presumably due to an
explicit
 >>> > type
 >>> > check caused by the type specification (see
 >>> > http://markmail.org/message/gxi26da4crk2v5ge ).
 >>> >
 >>> > Now it was reported that this change causes performance
problems with
 >>> > BaseX
 >>> > (see https://twitter.com/apb1704/status/714219874441146368 ).
 >>> >
 >>> > I reproduced the problem as follows:
 >>> >
 >>> >   - download the XQuery 3.1 grammar from
 >>> > http://bottlecaps.de/rex/CR-xquery-31-20151217.ebnf
 >>> >
 >>> >   - generate a parser from it on http://bottlecaps.de/rex/ using
 >>> > command
 >>> > line options:
 >>> >
 >>> >        -xquery -tree -main
 >>> >
 >>> >   - run the generated parser, using this command line
 >>> >
 >>> >        basex -binput={42} CR-xquery-31-20151217.xquery
 >>> >
 >>> > This works OK, but it takes about 80 seconds. Some analysis
showed that
 >>> > the
 >>> > time can be influenced by putting back the type specification to
 >>> > function
 >>> > p:map2, i.e. declaring 'as xs:integer' as the result type:
 >>> >
 >>> >        declare function p:map2($c as xs:integer, $lo as
xs:integer, $hi
 >>> > as
 >>> > xs:integer) as xs:integer
 >>> >
 >>> > This variant completes in less than 3 seconds. But even when
declaring
 >>> > a
 >>> > return type of
 >>> >
 >>> >           as xs:integer?
 >>> >
 >>> > which might be the type that can be inferred statically, it
completes
 >>> > fast.
 >>> >
 >>> > Is this possibly a problem with the optimizer?
 >>> >
 >>> > Which variant of generated code would be preferable for BaseX
- typed
 >>> > or
 >>> > untyped?
 >>> >
 >>> > Thanks,
 >>> > Gunther
 >>
 >>
 >

2025

2024

2023

2022

2021

2020

2019

2018

2017

2016

2015

2014

2013

2012

2011

2010

Re: [basex-talk] BaseX optimizer performance on REx-generated parser