Hi Nicolas,
finally some feedback: as you already figured out (thanks for the hint), BaseX 7.2 and 7.2.1 apply different rewritings before evaluating your query. I have found the "optimization" that changes the behavior [1]. It was introduced to pre-evaluate a number of other queries starting with a root node [2]. As this rewriting is important to speed up a bunch of other queries that have been too slow in the past, it will probably stay as is. Instead, we might implement some other rewritings that could again speed up queries like yours.
The core problem is that it's difficult to decide a) when/if lazy evaluation of an expression will be faster than a pre-evaluation, and b) which subexpressions will always yield the same result and can thus be cached. As an example, the following query…
let $doc := //millionsOfNodes return $doc[1]
…will be evaluated much faster if it's rewritten to…
(//millionsOfNodes)[1]
…because querying can be stopped after the first node has been returned.
In a nutshell (I hope I didn't overwhelm you with too many details): your existing query will continue to be rewritten differently than before, BUT you can move the expression of the first "let" clause to a global variable. This way, you can enforce that the expression will always be pre-evaluated, and not moved into the loop:
declare namespace map = "http://www.w3.org/2005/xpath-functions/map";
declare variable $res := map:new( ... );

for $dml in /dml/dmlContent/dmlEntry/dmRef/dmRefIdent/dmCode
let $ident := string(concat($dml/@modelIdentCode , $dml/@systemDiffCode , ...
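A minimal, self-contained sketch of this workaround, assuming small inline documents in place of the real database paths. The names $known and $wanted and the attribute k are hypothetical and only serve the illustration; the map functions are the ones used in the query from this thread and may be named differently in other BaseX versions.

```xquery
declare namespace map = "http://www.w3.org/2005/xpath-functions/map";

(: Hypothetical inline data standing in for the two database documents. :)
declare variable $known  := <codes><c k="A1"/><c k="B2"/><c k="C3"/></codes>;
declare variable $wanted := <codes><c k="A1"/><c k="Z9"/></codes>;

(: The lookup map is declared as a global variable, outside the FLWOR
   expression, so it is not part of the loop that gets rewritten. :)
declare variable $res := map:new(
  for $c in $known/c
  return map:entry(string($c/@k), true())
);

<result>{
  for $c in $wanted/c
  let $ident := string($c/@k)
  return
    (: Report every key that is missing from the lookup map. :)
    if (not(map:contains($res, $ident)))
    then <ko>{$ident}</ko>
    else ()
}</result>
```

With this sample data, only Z9 is missing from the map, so the result should contain a single ko element for it.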
Hope this helps; feel free to ask for more, Christian
[1] https://github.com/BaseXdb/basex/commit/68a4490ff8e6d7f6a75d1b182a9091b76bd1...
[2] https://github.com/BaseXdb/basex/issues/474
___________________________
Here is the query; I hope it helps:
declare namespace map="http://www.w3.org/2005/xpath-functions/map";
<result> {
  let $res := map:new(
    for $dmCode in /dmodule/identAndStatusSection/dmAddress/dmIdent/dmCode
    return map:entry(string(concat($dmCode/@modelIdentCode , $dmCode/@systemDiffCode , $dmCode/@systemCode , $dmCode/@subSystemCode , $dmCode/@subSubSystemCode , $dmCode/@assyCode , $dmCode/@disassyCode , $dmCode/@disassyCodeVariant , $dmCode/@infoCode , $dmCode/@infoCodeVariant , $dmCode/@itemLocationCode)) , true()))
  for $dml in /dml/dmlContent/dmlEntry/dmRef/dmRefIdent/dmCode
  let $ident := string(concat($dml/@modelIdentCode , $dml/@systemDiffCode , $dml/@systemCode , $dml/@subSystemCode , $dml/@subSubSystemCode , $dml/@assyCode , $dml/@disassyCode , $dml/@disassyCodeVariant , $dml/@infoCode , $dml/@infoCodeVariant , $dml/@itemLocationCode))
  return if (not(map:contains($res , $ident))) then <ko>{$ident}</ko> else ()
}
</result>
On Mon, Jun 25, 2012 at 11:39 AM, Christian Grün christian.gruen@gmail.com wrote:
Hi Nicolas,
thanks for your analysis. Due to the complexity of XQuery, and the wide variety of possible execution plans, it frequently happens that some queries get slower than others, and vice versa. 2 seconds vs. 10 minutes is striking, though, so feel free to send us a little query that demonstrates the behavior.
Christian
__________________________
My first statement was wrong: the performance drop is not from 7.2 to 7.3 but from 7.2 to 7.2.1.
BaseX v7.2 executes the query in sequence: first the FLWOR that extracts the data and stores it in a map, then the FLWOR that tests whether the values are contained in the map.
BaseX v7.2.1 optimises the query, and the two sequential FLWORs are merged into one main FLWOR with an embedded FLWOR. The map seems to be constructed again and again for each value of the second FLWOR.
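To illustrate the shape I mean, here is a rough sketch with small inline data. The names $known and $wanted and the attribute k are hypothetical, and this is only my assumption about what the merged query amounts to, not the actual plan produced by the optimizer.

```xquery
declare namespace map = "http://www.w3.org/2005/xpath-functions/map";

(: Hypothetical inline data; not the real documents. :)
declare variable $known  := <codes><c k="A1"/><c k="B2"/><c k="C3"/></codes>;
declare variable $wanted := <codes><c k="A1"/><c k="Z9"/></codes>;

<result>{
  for $c in $wanted/c
  let $ident := string($c/@k)
  (: The map construction sits inside the loop body, so as written it is
     built once per $c rather than once for the whole query. :)
  let $res := map:new(
    for $k in $known/c
    return map:entry(string($k/@k), true())
  )
  return
    if (not(map:contains($res, $ident)))
    then <ko>{$ident}</ko>
    else ()
}</result>
```

If the real query is evaluated like this, a map of about 40000 entries would be rebuilt for each of the roughly 30000 values, which would be on the order of a billion entry constructions instead of 40000.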
Hope it will help,
Regards,
Nicolas
On Mon, Jun 25, 2012 at 10:21 AM, Nicolas Labrot nithril@gmail.com wrote:
Hello,
I have upgraded BaseX from 7.2 to 7.3, and I now see a severe performance drop on my query.
My query has two parts:
The first part extracts values from an XPath and stores them in a map (the map contains 40000 entries).
The second part extracts values from an XPath and tests whether they are contained in the map (the XPath returns around 30000 entries).
On 7.2 a typical query runs in 2 s; on 7.3 I have no result after 10 minutes.
Is there a change in BaseX 7.3 that could explain this performance drop?
Thanks for your help,
Regards,
Nicolas
BaseX-Talk mailing list BaseX-Talk@mailman.uni-konstanz.de https://mailman.uni-konstanz.de/mailman/listinfo/basex-talk