Hi Christian,
I will try that. But first, I can confirm my suspicion: the offending line (let $to := //*[@id=$toId][1]) takes about 20 ms per hit, and since there are 249 hits, that means 249 * 20 = 4980 ms in total, almost 5 seconds!
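Paul's estimate is simply hits × per-hit cost. Why the per-hit cost is so high can be sketched outside BaseX. Without an index, a lookup such as //*[@id=$toId][1] may have to walk every element in the document, so the total work grows with (number of sources) × (number of elements). The Python/ElementTree sketch below models that worst case on a hypothetical miniature of the "facts" data, with made-up ids and names. It illustrates the cost only and does not show how BaseX evaluates the path internally. The names full_scan_lookup, count_visits and estimated_total_ms are illustrative helpers.

```python
import xml.etree.ElementTree as ET

# Hypothetical miniature of the "facts" data; ids, names and links are made up.
DOC = """<facts>
  <sea id="s1" name="North Sea"/>
  <river id="r1" name="Rhine"><to water="s1"/></river>
  <river id="r2" name="Main"><to water="r1"/></river>
  <lake id="l1" name="Lake Constance"><to water="r1"/></lake>
  <country id="c1" name="Germany"/>
</facts>"""


def full_scan_lookup(root, wanted_id):
    """Worst-case model of //*[@id=$toId][1]: visit every element, keep the first match."""
    first, visited = None, 0
    for elem in root.iter():
        visited += 1
        # An empty $toId matches nothing, but the scan still happens.
        if first is None and wanted_id is not None and elem.get("id") == wanted_id:
            first = elem
    return first, visited


def count_visits(root):
    """Total element visits when every source does its own full-document lookup."""
    sources = [e for e in root.iter() if e.tag in ("sea", "river", "lake")]
    total = 0
    for src in sources:
        to = src.find("to")  # first <to> only, a simplification
        wanted = to.get("water") if to is not None else None
        _, visited = full_scan_lookup(root, wanted)
        total += visited
    return len(sources), total


def estimated_total_ms(hits, ms_per_hit):
    """Back-of-the-envelope estimate from the mail: hits x per-hit cost."""
    return hits * ms_per_hit


root = ET.fromstring(DOC)
print(count_visits(root))           # (4, 36): 4 sources x 9 elements
print(estimated_total_ms(249, 20))  # 4980
```

With real data the element count is far larger than 9, which is why a per-hit scan quickly adds up to seconds.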
On a side note: I discovered that BaseX's query optimization works too well :-) I wanted to profile the execution time of that offending line, so I assigned the current time to the variable $start before that line, and the current time to the variable $end after it:
let $start := (current-dateTime() - xs:dateTime('1970-01-01T00:00:00-00:00')) div xs:dayTimeDuration('PT0.001S')
let $to := //*[@id=$toId][1]
let $end := (current-dateTime() - xs:dateTime('1970-01-01T00:00:00-00:00')) div xs:dayTimeDuration('PT0.001S')
Then I assigned the difference, $end - $start, to an attribute in the result fragment. But it turned out that BaseX pre-evaluated $start and $end and turned them into constants, so I got the same $start and $end values in every result, and the difference was always 0.
The only way I found to prevent that was to use xquery:eval, which makes it impossible for BaseX to pre-evaluate the expression:
let $start := xquery:eval("(current-dateTime() - xs:dateTime('1970-01-01T00:00:00-00:00')) div xs:dayTimeDuration('PT0.001S')")
let $to := //*[@id=$toId][1]
let $end := xquery:eval("(current-dateTime() - xs:dateTime('1970-01-01T00:00:00-00:00')) div xs:dayTimeDuration('PT0.001S')")
The complete profiling query:
(: list waters and where they stream to (if any):)
for $source in /descendant::sea | /descendant::river | /descendant::lake
let $toId := $source/to/@water
let $start := xquery:eval("(current-dateTime() - xs:dateTime('1970-01-01T00:00:00-00:00')) div xs:dayTimeDuration('PT0.001S')")
let $to := //*[@id=$toId][1]
let $end := xquery:eval("(current-dateTime() - xs:dateTime('1970-01-01T00:00:00-00:00')) div xs:dayTimeDuration('PT0.001S')")
let $name := if (empty($to)) then "none" else $to/local-name()
return element water {
  attribute took {$end - $start},
  element {$source/local-name()} {data($source/@name)},
  if (not($name="none")) then element streamsTo { attribute {$name} {data($to/@name)} } else ()
}
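A side remark on why the first attempt (plain current-dateTime()) always gave a difference of 0: the XPath and XQuery Functions and Operators specification requires current-dateTime() to return the same value throughout one query execution, so two calls inside the same query cannot measure a duration. The Python sketch below contrasts a clock value captured once, modeling that stable behaviour, with a monotonic clock read immediately before and after a step. The helpers timed and frozen_clock are hypothetical names used only for illustration.

```python
import time


def timed(step, *args):
    """Run step(*args) and return (result, elapsed milliseconds).

    time.perf_counter() is read fresh on every call, unlike current-dateTime(),
    which stays fixed for the whole query execution.
    """
    start = time.perf_counter()
    result = step(*args)
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    return result, elapsed_ms


def frozen_clock():
    """Model of current-dateTime(): one value captured per 'query', reused by every call."""
    captured = time.time()
    return lambda: captured


clock = frozen_clock()
start, end = clock(), clock()
print(end - start)  # 0.0, like $end - $start in the first attempt

result, ms = timed(sum, range(100_000))
print(result, ms >= 0.0)  # 4999950000 True
```

The design point is that a duration needs two independent clock reads; anything that yields one stable value per execution, by specification or by caching, will always report zero.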
For me the lesson is: reuse predefined selections as much as possible, particularly in "for" clauses.
Paul
Hi Paul,
thanks for your feedback. Are you working with 7.9? If it's not too much of a hassle, I would be interested to hear whether you get better performance with the latest 8.0 snapshot [1].
Christian
[1] http://files.basex.org/releases/latest/
On Mon, Aug 4, 2014 at 11:57 AM, Paul Swennenhuis <paul@swennenhuis.nl> wrote:
Hi Christian,
Sorry, that doesn't improve performance either. I even tried copying the optimized expression for the selection, as shown in the Query Info pane:
(: list waters and where they stream to (if any):)
for $source in ((db:open-pre("facts",0)/descendant::*:sea
  union db:open-pre("facts",0)/descendant::*:river
  union db:open-pre("facts",0)/descendant::*:lake))
let $toId := $source/to/@water
let $to := //*[@id=$toId][1]
let $name := if (empty($to)) then "none" else $to/local-name()
return element water {
  element {$source/local-name()} {data($source/@name)},
  if (not($name="none")) then element streamsTo { attribute {$name} {data($to/@name)} } else ()
}
No improvement. The problem seems to be the line that assigns the $to variable. If I reuse the main node selection there, the query executes fast, like this:
(: list waters and where they stream to (if any):)
let $sources := /descendant::sea | /descendant::river | /descendant::lake
for $source in $sources
let $toId := $source/to/@water
let $to := $sources[@id=$toId][1]
let $name := if (empty($to)) then "none" else $to/local-name()
return element water {
  element {$source/local-name()} {data($source/@name)},
  if (not($name="none")) then element streamsTo { attribute {$name} {data($to/@name)} } else ()
}
The original line, let $to := //*[@id=$toId][1], is apparently very expensive. I could do some testing with the profiling tools to see if I'm right.
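Paul's rewrite is faster because each lookup only scans the already selected waters in $sources instead of every element in the database. It is only equivalent to the original if every to/@water target is itself a sea, river or lake. The Python sketch below mirrors the fast query on hypothetical data. As an extra step that was not proposed in the thread, it can also build an id-to-element dictionary once, so each lookup becomes a hash lookup. The names resolve_in_sources, list_waters and use_index are illustrative only.

```python
import xml.etree.ElementTree as ET

# Hypothetical miniature of the "facts" data; ids, names and links are made up.
DOC = """<facts>
  <sea id="s1" name="North Sea"/>
  <river id="r1" name="Rhine"><to water="s1"/></river>
  <river id="r2" name="Main"><to water="r1"/></river>
  <lake id="l1" name="Lake Constance"><to water="r1"/></lake>
  <country id="c1" name="Germany"/>
</facts>"""

WATER_TAGS = ("sea", "river", "lake")


def resolve_in_sources(sources, wanted_id):
    """Model of $sources[@id=$toId][1]: scan only the selected waters."""
    if wanted_id is None:  # an empty $toId matches nothing
        return None
    return next((w for w in sources if w.get("id") == wanted_id), None)


def list_waters(root, use_index=False):
    """Return (tag, name, target) rows; target is (tag, name), or None for "none"."""
    # $sources: all seas, rivers and lakes in document order.
    sources = [e for e in root.iter() if e.tag in WATER_TAGS]
    by_id = {}
    if use_index:
        # Extra step (assumption): id -> first water with that id, built once.
        for water in sources:
            by_id.setdefault(water.get("id"), water)
    rows = []
    for src in sources:
        to = src.find("to")  # first <to> only, a simplification
        wanted = to.get("water") if to is not None else None
        if use_index:
            target = by_id.get(wanted) if wanted is not None else None
        else:
            target = resolve_in_sources(sources, wanted)
        rows.append((src.tag, src.get("name"),
                     (target.tag, target.get("name")) if target is not None else None))
    return rows


root = ET.fromstring(DOC)
for row in list_waters(root):
    print(row)
```

Both variants return the same rows; the dictionary trades a little memory for constant-time lookups, which matters once the number of sources grows.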
Paul
Hi Paul,
//(sea|river|lake)
Due to the (somewhat peculiar) semantics of XPath, this path is identical to...
/descendant-or-self::node()/(child::sea | child::river | child::lake)
...and it creates a massive number of intermediate results. You could try rewriting it as...
/descendant::sea | /descendant::river | /descendant::lake
...or...
/descendant::*[local-name() = ('sea', 'river', 'lake')]
...and I will try to tweak our optimizer to do this for you automatically in the future (it already works for single steps).
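To make the difference between the two forms concrete, here is a rough Python/ElementTree model on hypothetical data. The expanded path evaluates the union of child steps once for every node produced by descendant-or-self::node(), while the rewrite makes a single descendant pass and filters by name. Both yield the same elements in document order. ElementTree has no document or text nodes, so a real XPath engine would see even more context nodes than counted here. This is not how BaseX evaluates paths internally, and the function names are illustrative.

```python
import xml.etree.ElementTree as ET

# Hypothetical miniature of the "facts" data; ids, names and links are made up.
DOC = """<facts>
  <sea id="s1" name="North Sea"/>
  <river id="r1" name="Rhine"><to water="s1"/></river>
  <river id="r2" name="Main"><to water="r1"/></river>
  <lake id="l1" name="Lake Constance"><to water="r1"/></lake>
  <country id="c1" name="Germany"/>
</facts>"""

WATER_TAGS = ("sea", "river", "lake")


def via_descendant_or_self(root):
    """Model of /descendant-or-self::node()/(child::sea | child::river | child::lake).

    Every element becomes a context node, and the child-step union is
    evaluated once for each of them.
    """
    position = {id(e): i for i, e in enumerate(root.iter())}
    contexts, hits = 0, []
    for context in root.iter():
        contexts += 1
        hits.extend(child for child in context if child.tag in WATER_TAGS)
    # Each element has one parent, so there are no duplicates; restore document order.
    hits.sort(key=lambda e: position[id(e)])
    return hits, contexts


def via_descendant(root):
    """Model of /descendant::*[local-name() = ('sea', 'river', 'lake')]: one pass."""
    return [e for e in root.iter() if e.tag in WATER_TAGS]


root = ET.fromstring(DOC)
hits, contexts = via_descendant_or_self(root)
print([e.get("name") for e in hits], contexts)
print([e.get("name") for e in via_descendant(root)])
```

On this tiny document the expanded form already runs the child-step union on 9 context nodes to find 4 waters; on a real database the number of context nodes equals the number of nodes in the document.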
Christian