Did you hear sth. about the performance of MarkLogic
My understanding is that on ML the presence typing blocks some tail-recursion optimizations leading to stack overflow for large texts.
Stack overflow is also an issue for my use of REx with BaseX, I would like to be able to parse sources like the FunctX library without having to set a large Java stack size. [1] (My report that the issue was fixed was incorrect). I would like to investigate if BaseX is using tail-recursion optimizations here and/or where/why this overflow happens - but it all looks very complicated :-).
I suspect the users of XQuery REx are capable of hand tweaking the REx o/p either to add or remove type declarations once the issues are well documented.
/Andy
[1] http://www.mail-archive.com/basex-talk%40mailman.uni-konstanz.de/msg07212.ht...
On 29 March 2016 at 16:24, Christian Grün christian.gruen@gmail.com wrote:
Hi Gunther,
Thanks for your mail. After some research, I don’t see a quick way to statically infer the type in the function you mentioned, mostly because it’s called recursively (while statically inferring the type of the function, the return type of map2 function is requested, and it will be the type of the function signature, because the inferred type is not available yet…).
Personally, I would love to see the return types readded, not only because it speeds up BaseX, but also because I think it’s a general advantage of XQuery to have typed functions, and . Do you think there is any chance to revert the change? Did you hear sth. about the performance of MarkLogic – is the version without argument types comparable to BaseX in terms of execution time, or much faster?
All the best, Christian
in a recent update of REx parser generator (v5.37), some type
specifications
were removed from generated XQuery and XSLT functions. This affects tail-recursive functions, and it was done because it turned out that MarkLogic fails to optimize tail calls, presumably due to an explicit
type
check caused by the type specification (see http://markmail.org/message/gxi26da4crk2v5ge ).
Now it was reported that this change causes performance problems with
BaseX
(see https://twitter.com/apb1704/status/714219874441146368 ).
I reproduced the problem as follows:
- download the XQuery 3.1 grammar from
http://bottlecaps.de/rex/CR-xquery-31-20151217.ebnf
- generate a parser from it on http://bottlecaps.de/rex/ using command
line options:
-xquery -tree -main
run the generated parser, using this command line
basex -binput={42} CR-xquery-31-20151217.xquery
This works OK, but it takes about 80 seconds. Some analysis showed that
the
time can be influenced by putting back the type specification to function p:map2, i.e. declaring 'as xs:integer' as the result type:
declare function p:map2($c as xs:integer, $lo as xs:integer, $hi
as
xs:integer) as xs:integer
This variant completes in less than 3 seconds. But even when declaring a return type of
as xs:integer?
which might be the type that can be inferred statically, it completes
fast.
Is this possibly a problem with the optimizer?
Which variant of generated code would be preferable for BaseX - typed or untyped?
Thanks, Gunther