Hi Christian,


Thank you very much for looking into this, and also for the rewritten query. I can confirm that with your rewritten query the performance problem is gone!


Also thank you for taking the time to explain the technical reasons!


Best regards,

Michael



Mag. Michael Birkner
AK Wien - Bibliothek
1040, Prinz Eugen Straße 20-22
T: +43 1 501 65 12455
F: +43 1 501 65 142455
M: +43 664 88957669

michael.birkner@akwien.at
wien.arbeiterkammer.at

From: Christian Grün <christian.gruen@gmail.com>
Sent: Monday, 11 May 2020 13:02
To: BIRKNER Michael
Cc: basex-talk@mailman.uni-konstanz.de
Subject: Re: [basex-talk] Performance loss between version 9.2.4 and 9.3.2 when executing specific xQuery
 
Hi Michael,

I checked your use case in greater depth, and I found the change in
our code that caused the slowdown [1].

A) The answer in a nutshell: Just use the attached query!

B) The extensive technical answer:

• In previous versions of BaseX, most paths in FLWOR expressions were
»inlined« in the code to trigger further optimizations, such as index
rewritings.
• The enforced inlining led to cases in which the execution time was
worse than for unoptimized queries.
• As a user cannot prevent variables from being inlined, we have
switched to a more predictable pattern in our inlining heuristics:
Paths will now only be moved around if we can ensure that the
execution time will not suffer.

A little example:

  let $nodes := db:open('db')/to/this/only/once
  for $i in 1 to 1000
  return $nodes

If $nodes is inlined by the optimizer (i.e., if the variable reference
$nodes in the last line is replaced by the actual path), the path will
be evaluated 1000 times instead of once. The revised query optimizer
won’t inline such paths anymore.
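
For illustration, the inlined form of the example above would look
roughly like this (a sketch; 'db' and the path are the same
placeholders as before):

  (: the path now sits inside the loop and is evaluated per iteration :)
  for $i in 1 to 1000
  return db:open('db')/to/this/only/once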

Your particular query benefited from the aggressive rewriting, though.
In the first step, "db:open('gnd-sachbegriff')/collection/record" was
inlined by the optimizer:

  let $recFromExistingData := db:open('gnd-sachbegriff')/
    collection/record[controlfield[@tag = '001'] = $id]

In the second step, the path was rewritten for index access:

  let $recFromExistingData := db:text('gnd-sachbegriff', $id)/
    parent::controlfield[@tag = '001']/parent::record

The index rewriting (which you can spot in the Info View by looking
for "apply text index") led to a much faster evaluation of your query:
each $id is looked up directly in the text index instead of being
compared against every record, which reduces the execution time from
quadratic to linear.

If you adopt one of the code lines above, your query will be evaluated
faster again.

In the attached query, db:open is still assigned to variables. As
db:open is evaluated only once, and already at compile time, the
document nodes that will be bound to $sachbegriffe can always be
inlined.
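
The attachment is not reproduced in this mail. As a rough sketch of
the pattern only, assuming a hypothetical external variable $ids that
stands in for the IDs your query iterates over, it could look like
this:

  (: hypothetical: $ids is not part of the original query :)
  declare variable $ids external;

  (: only the document nodes are bound to the variable :)
  let $sachbegriffe := db:open('gnd-sachbegriff')
  for $id in $ids
  return $sachbegriffe/collection/record[
    controlfield[@tag = '001'] = $id
  ]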

Hope this helps,
Christian

[1] https://github.com/BaseXdb/basex/issues/1722