Hi,
I have two XQueries that do exactly the same thing: they create a list of waters in the Facts database and show which water each one flows into, if any. But Query 2 executes 10 times faster than Query 1 (in the GUI). The only difference is in the selection of the query context. Query 1 assumes the Facts database is open and does not use an explicit doc() or collection() context. Query 2 uses an explicit collection("Facts") context.
Query 1 executes in approx. 600 msecs, Query 2 in approx. 60 msecs (!)
How can there be such a huge difference? Is it because Query 2 operates twice on the $facts variable and does not need to evaluate that again?
Listings:

Query 1

  (: list waters and where they stream to (if any) :)
  for $source in //(sea|river|lake)
  let $toId := $source/to/@water
  let $to := (//sea|//river|//lake)[@id=$toId][1]
  let $name := if (empty($to/local-name())) then "none" else $to/local-name()
  return element water {
    element {$source/local-name()} {data($source/@name)},
    if (not($name="none")) then
      element streamsTo { attribute {$name} {data($to/@name)} }
    else ()
  }

Query 2

  (: list waters and where they stream to (if any) :)
  let $facts := collection("Facts")//(sea|river|lake)
  for $source in $facts
  let $toId := $source/to/@water
  let $to := $facts[@id=$toId][1]
  let $name := if (empty($to/local-name())) then "none" else $to/local-name()
  return element water {
    element {$source/local-name()} {data($source/@name)},
    if (not($name="none")) then
      element streamsTo { attribute {$name} {data($to/@name)} }
    else ()
  }
As a side question: I want to extend the query to make it recursive: river "Bahr el-Djebel" streams into river "White Nile", which streams into river "Nile", which streams into sea "Mediterranean Sea". I think I can find out how to do that, but how can I optimize the recursion process? Would a recursive function be efficient?
Thanks,
Paul
Hi Huub,
Thank you for your reply. I tried your suggestions, but it does not make any difference. I changed Query 1 to this:
  (: list waters and where they stream to (if any) :)
  for $source in //(sea|river|lake)
  let $toId := $source/to/@water
  let $to := //*[@id=$toId][1]
  let $name := if (empty($to)) then "none" else $to/local-name()
  return element water {
    element {$source/local-name()} {data($source/@name)},
    if (not($name="none")) then
      element streamsTo { attribute {$name} {data($to/@name)} }
    else ()
  }
but there is no performance gain. The query still executes at least 10 times slower than Query 2.
Thanks for the empty($to) suggestion.
As for the recursive algorithm: in the meantime I wrote the query for that and it works like a charm!
Paul
Hi Paul,
On 4 Aug 2014, at 09:27, Paul Swennenhuis <paul@swennenhuis.nl> wrote:
> Listings:
> Query1
> (: list waters and where they stream to (if any) :)
> for $source in //(sea|river|lake)
> let $toId := $source/to/@water
> let $to := (//sea|//river|//lake)[@id=$toId][1]
For “sea” elements you start searching at the very top of the db, then you do the same for “river” elements, and then again for “lake” elements. And you do this for every $source you process. This is different from "//(sea|river|lake)", where you start at the top once and then match sea, river or lake elements. In the second query you find all sea, river and lake elements once and then search in that sequence, which would be (much) faster.
It might even be faster to just search all elements and filter on @id (BaseX can then probably use the attribute index, and needs to use it only once), e.g.:
  let $toWaters := //*[@id = $toId]
and $toWaters contains all waters (sea, river and lake elements) the $source streams to. Add a [1] if you just need the first one, like you did. (If you know that all waters mentioned in $source/to exist in your db, wouldn't it be better to restrict $toId instead of $to, i.e. just use the first $source/to element?)
> let $name := if (empty($to/local-name())) then "none" else $to/local-name()
I am not sure I understand, but wouldn't empty($to) do the trick?
> return element water { element {$source/local-name()} {data($source/@name)}, if (not($name="none")) then element streamsTo { attribute {$name} {data($to/@name)} } else () }
> As a side-question: I want to extend the query to make it recursive: river “Bahr el-Djebel” streams into river “White Nile” streams into river “Nile” streams into sea “Mediterranean Sea” I think I can find out how to do that, but how can I optimize the recursion process? Would a recursive function be efficient?
Yes, that would do the trick. Generally, tail-recursiveness is a good thing, but in this case it wouldn't matter much probably. Just watch out for those weird rivers that flow back into the lake they originate from ;-).
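A minimal sketch of such a recursive function, including a guard against waters that flow back into something already visited. The inline $waters data, its ids and the function name local:downstream are made up for illustration; only the element and attribute names (river, sea, to, @water, @id, @name) are taken from the queries in this thread.

```xquery
(: Hypothetical sample data; the ids are invented for this sketch. :)
declare variable $waters :=
  <waters>
    <river id="r1" name="Bahr el-Djebel"><to water="r2"/></river>
    <river id="r2" name="White Nile"><to water="r3"/></river>
    <river id="r3" name="Nile"><to water="s1"/></river>
    <sea id="s1" name="Mediterranean Sea"/>
  </waters>;

(: Returns all waters downstream of $water, in flow order.
   $seen holds the ids visited so far and stops the recursion on cycles. :)
declare function local:downstream(
  $water as element(),
  $seen  as xs:string*
) as element()* {
  let $next := ($waters/*[@id = $water/to/@water])[1]
  return
    if (empty($next) or $next/@id = $seen) then ()
    else ($next, local:downstream($next, ($seen, string($next/@id))))
};

let $start := $waters/river[@name = 'Bahr el-Djebel']
let $chain := ($start, local:downstream($start, string($start/@id)))
return string-join(for $w in $chain return string($w/@name), ' -> ')
(: returns "Bahr el-Djebel -> White Nile -> Nile -> Mediterranean Sea" :)
```

This version is not tail-recursive, since the result sequence is built after the recursive call returns. For chains as short as these that should not matter much.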
Regards,
Huib Verweij.
Dear Paul,
Is it a big collection? Could the difference be in opening the collection? Did you try to run the slow query, for example in the GUI, with the collection already opened?
Best regards, Fabrice
From: basex-talk-bounces@mailman.uni-konstanz.de [mailto:basex-talk-bounces@mailman.uni-konstanz.de] On behalf of Paul Swennenhuis
Sent: Monday, 4 August 2014 11:22
To: H. Verweij; BaseX
Subject: Re: [basex-talk] Same query, huge difference in performance
Hi Fabrice,
Thanks for your contribution. The collection is the Facts database (factbook.xml) found in the distribution of BaseX. It's the same collection as used in Query 2. And yes, I did try to run the slow query in the GUI. In fact, that is the only place where I ran it, with the Facts database opened (it will yield an error if the database is not opened since it does not specify a context).
Paul
Hi Paul,
//(sea|river|lake)
Due to the (somewhat peculiar) semantics of XPath, this path is identical to...
/descendant-or-self::node()/ (child::sea | child::river | child::lake)
...and it creates a massive amount of intermediate results. You could try to rewrite it to...
/descendant::sea | /descendant::river | /descendant::lake
...or...
/descendant::*[local-name() = ('sea', 'river', 'lake')]
...and I will try to tweak our optimizer to automatically do this for you in future (it already works for single steps).
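As a small illustration, the three spellings can be compared on an inline document. The sample data below is invented for this sketch and only borrows the element names from the thread.

```xquery
(: Hypothetical sample document; only the element names come from the thread. :)
let $doc := document {
  <world>
    <continent>
      <river id="r1"/>
      <lake id="l1"/>
    </continent>
    <sea id="s1"/>
  </world>
}
(: Abbreviated form: expands to descendant-or-self::node()/child::... :)
let $a := $doc//(sea | river | lake)
(: One descendant step per element name :)
let $b := $doc/descendant::sea | $doc/descendant::river | $doc/descendant::lake
(: A single descendant step with a name filter :)
let $c := $doc/descendant::*[local-name() = ('sea', 'river', 'lake')]
return (count($a), count($b), count($c), count($a | $b | $c))
(: returns 3, 3, 3, 3: the union adds no nodes, so all three select the same nodes :)
```

The results are identical; any difference lies in how many intermediate nodes a processor has to visit when it does not rewrite the abbreviated form itself.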
Christian
Hi Christian,
Sorry, that also doesn't improve performance. I even tried to copy the optimized line for the selection, as found in the Query Info pane:

  (: list waters and where they stream to (if any) :)
  for $source in ((db:open-pre("facts",0)/descendant::*:sea
                   union db:open-pre("facts",0)/descendant::*:river
                   union db:open-pre("facts",0)/descendant::*:lake))
  let $toId := $source/to/@water
  let $to := //*[@id=$toId][1]
  let $name := if (empty($to)) then "none" else $to/local-name()
  return element water {
    element {$source/local-name()} {data($source/@name)},
    if (not($name="none")) then
      element streamsTo { attribute {$name} {data($to/@name)} }
    else ()
  }

No improvement. The problem seems to be in the line that assigns the $to variable. If I reuse the main node selection there, the query executes fast. Like this:

  (: list waters and where they stream to (if any) :)
  let $sources := /descendant::sea | /descendant::river | /descendant::lake
  for $source in $sources
  let $toId := $source/to/@water
  let $to := $sources[@id=$toId][1]
  let $name := if (empty($to)) then "none" else $to/local-name()
  return element water {
    element {$source/local-name()} {data($source/@name)},
    if (not($name="none")) then
      element streamsTo { attribute {$name} {data($to/@name)} }
    else ()
  }

The original line, let $to := //*[@id=$toId][1], apparently is very expensive. I could do some testing with the profiling tools to see if I'm right.
Paul
Hi Paul,
thanks for your feedback. Are you working with 7.9? If it's not too much of a hassle for you, I would be interested to hear if you get better performance with the latest 8.0 snapshot [1].
Christian
[1] http://files.basex.org/releases/latest/
Hi Christian,
I will try that. But first: I can confirm my suspicion. The offending line ( let $to := //*[@id=$toId][1] ) takes about 20 msecs per hit, and since there are 249 hits, that means 249 * 20 = 4980 msecs in total, almost 5 seconds!

On a side note: I discovered that BaseX's query optimization is working too well :-) I wanted to profile the execution time of that offending line, so I assigned the current time to variable $start before that line, and I assigned the current time to variable $end after that line:

  let $start := (current-dateTime() - xs:dateTime('1970-01-01T00:00:00-00:00')) div xs:dayTimeDuration('PT0.001S')
  let $to := //*[@id=$toId][1]
  let $end := (current-dateTime() - xs:dateTime('1970-01-01T00:00:00-00:00')) div xs:dayTimeDuration('PT0.001S')

And then I assigned the difference, $end - $start, to an attribute in the result fragment. But it appeared that BaseX pre-evaluated the $start and $end variables and converted them into constants, so I got the same $start and $end value in every result, and the difference was always 0.

The only way I saw to prevent that from happening was using xquery:eval, making it impossible for BaseX to pre-evaluate it:

  let $start := xquery:eval("(current-dateTime() - xs:dateTime('1970-01-01T00:00:00-00:00')) div xs:dayTimeDuration('PT0.001S')")
  let $to := //*[@id=$toId][1]
  let $end := xquery:eval("(current-dateTime() - xs:dateTime('1970-01-01T00:00:00-00:00')) div xs:dayTimeDuration('PT0.001S')")

The complete profiling query:

  (: list waters and where they stream to (if any) :)
  for $source in /descendant::sea | /descendant::river | /descendant::lake
  let $toId := $source/to/@water
  let $start := xquery:eval("(current-dateTime() - xs:dateTime('1970-01-01T00:00:00-00:00')) div xs:dayTimeDuration('PT0.001S')")
  let $to := //*[@id=$toId][1]
  let $end := xquery:eval("(current-dateTime() - xs:dateTime('1970-01-01T00:00:00-00:00')) div xs:dayTimeDuration('PT0.001S')")
  let $name := if (empty($to)) then "none" else $to/local-name()
  return element water {
    attribute took {$end - $start},
    element {$source/local-name()} {data($source/@name)},
    if (not($name="none")) then
      element streamsTo { attribute {$name} {data($to/@name)} }
    else ()
  }

For me the lesson is: use as many predefined selections as possible, particularly in "for" clauses.
Paul
Hi Paul,
thanks for trying 8.0. I have just uploaded yet another snapshot that optimizes descendant-or-self axes and union & list expressions; e.g.:
//(sea, river) -> (/descendant::sea | /descendant::river )
There are various other query optimizations that will be available in 8.0. But as you already observed, this is probably not the bottleneck in your query.
> The offending line ( let $to := //*[@id=$toId][1] ) takes about 20 msecs per hit
It seems as if the index structures are not utilized here (you can open the InfoView in the GUI in order to see what's going on). You will probably get much better performance by using parentheses around the path expression:
  (//*[@id=$toId])[1]
Please note that the two expressions are not equivalent: the parenthesized one will give you at most 1 result, whereas the original one may give you more than one result, because it's equivalent to:
  /descendant-or-self::node()/child::*[@id=$toId][1]
The reason is that both predicates belong to the child step and not to the full path expression.
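The difference can be seen on a small inline document. The data below is invented for this sketch and deliberately reuses the same id under two different parents.

```xquery
(: Hypothetical data: the same id occurs under two different parents. :)
let $doc := document {
  <waters>
    <group><river id="a" name="first"/></group>
    <group><lake id="a" name="second"/></group>
  </waters>
}
let $id := 'a'
return (
  (: [1] belongs to the child step: first matching child per parent :)
  count($doc//*[@id = $id][1]),
  (: [1] applies to the whole result sequence: at most one node :)
  count(($doc//*[@id = $id])[1])
)
(: returns 2, 1 :)
```

If ids are unique in the database, both forms return the same node, but only the parenthesized form tells the processor that it can stop after the first hit.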
> let $start := (current-dateTime() - xs:dateTime('1970-01-01T00:00:00-00:00')) div xs:dayTimeDuration('PT0.001S')
Due to the functional nature of XQuery, all calls of current-dateTime() will give you the same result during the execution of a query. But there is (at least) one way out: You can try prof:current-ns() instead [1].
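A sketch of the profiling query from earlier in the thread, with xquery:eval replaced by prof:current-ns(). It is BaseX-specific and assumes the Facts database is opened as the query context. The attribute name took-ms is made up, and the optimizer may still rearrange expressions, so it is worth checking the Query Info output before trusting the numbers.

```xquery
(: BaseX-specific sketch; assumes the Facts database is the current context. :)
for $source in /descendant::sea | /descendant::river | /descendant::lake
let $toId := $source/to/@water
(: prof:current-ns() returns a timestamp in nanoseconds on each call :)
let $start := prof:current-ns()
let $to := (//*[@id = $toId])[1]
let $end := prof:current-ns()
let $name := if (empty($to)) then "none" else $to/local-name()
return element water {
  (: elapsed time for the lookup, converted from nanoseconds to milliseconds :)
  attribute took-ms { ($end - $start) div 1000000 },
  element { $source/local-name() } { data($source/@name) },
  if (not($name = "none")) then
    element streamsTo { attribute { $name } { data($to/@name) } }
  else ()
}
```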
By the way, here is one more variant of your query, which explicitly accesses the index structures (however, this version is BaseX-specific and not that nice to read anymore):

  for $source in collection("Facts")/(descendant::sea | descendant::river | descendant::lake)
  return element water {
    element {$source/local-name()} {data($source/@name)},
    for $to in (db:text('Facts', $source/to/@water)/
                (parent::sea | parent::river | parent::lake))[1]
    return element streamsTo {
      attribute {$to/local-name()} {data($to/@name)}
    }
  }
Hope this helps; feel free to ask for more details, Christian
Hi Christian,
Thanks again for your reply. I'm learning a lot by discussing these kinds of problems.
I tested your variant. It does indeed perform a lot better. My Query 2 still beats it though, so I'll stick to that one for now.
Paul
Hi Christian,
I saw that I still had my BaseX session open and wanted to close all open editor windows, but then decided to run the slow query again, modified with your "parentheses around the path expression" solution. And guess what, it now runs as fast as Query 2!
With parentheses: 80 msecs
Without parentheses: 5300 msecs
I was seriously under the impression that I had tried it yesterday, but apparently I had not. Anyway, it works. Thanks.
Paul
Hi Christian,
Just tried the query in BaseX 8. Sorry, no improvement in performance.
Paul
basex-talk@mailman.uni-konstanz.de