Hi Fabrice,

Thanks for your contribution.
The collection is the Facts database (factbook.xml) found in the distribution of BaseX.
It's the same collection as used in Query 2.
And yes, I did try to run the slow query in the GUI. In fact, that is the only place where I ran it, with the Facts database opened (the query raises an error if no database is opened, since it does not specify a context).
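
For anyone reproducing this, here is a minimal sketch of a variant that names its own context, so it does not depend on a database being opened in the GUI. The database name "factbook" is an assumption (use whatever name the database was created under), and db:open is the BaseX function as of this writing; as far as I know, later BaseX releases renamed it to db:get.

```xquery
(: Assumption: the database was created under the name "factbook". :)
let $db := db:open("factbook")
for $source in $db//(sea|river|lake)
return data($source/@name)
```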

Paul

On 8/4/2014 11:35 AM, Fabrice Etanchaud wrote:

Dear Paul,

Is it a big collection? Could the difference be in opening the collection?

Did you try to run the slow query, for example in the GUI, with the collection already opened?

Best regards,

Fabrice

From: basex-talk-bounces@mailman.uni-konstanz.de [mailto:basex-talk-bounces@mailman.uni-konstanz.de] On behalf of Paul Swennenhuis
Sent: Monday, 4 August 2014 11:22
To: H. Verweij; BaseX
Subject: Re: [basex-talk] Same query, huge difference in performance

Hi Huub,

Thank you for your reply.
I tried your suggestions, but it does not make any difference.
I changed Query 1 to this:

(: list waters and where they stream to (if any):)
for $source in  //(sea|river|lake)
let $toId := $source/to/@water
let $to := //*[@id=$toId][1]
let $name := if (empty($to)) then "none" else $to/local-name()
return
element water {
  element {$source/local-name()} {data($source/@name)},
  if (not($name="none"))then
  element streamsTo {
    attribute {$name} {data($to/@name)}
  }
  else ()
}

but there is no performance gain. The query still executes at least 10 times slower than Query 2.

Thanks for the empty($to) suggestion.

As for the recursive algorithm: in the meantime I wrote the query for that and it works like a charm!

Paul

Hi Paul,

On 4 Aug 2014, at 09:27, Paul Swennenhuis <paul@swennenhuis.nl> wrote:
Listings:

Query1

(: list waters and where they stream to (if any):)
for $source in //(sea|river|lake)
let $toId := $source/to/@water
let $to := (//sea|//river|//lake)[@id=$toId][1]


With (//sea|//river|//lake) you search for “sea” elements starting at the very top of the db, then search for “river” elements from the very top again, and then do the same for “lake” elements, and you repeat all of this for every $source you process. This is different from "//(sea|river|lake)", where you start at the top once and match sea, river or lake elements in a single pass. In the second query you find all sea, river and lake elements once and then search within that sequence, which would be (much) faster.

It might even be faster to search all elements and filter on @id (BaseX can then use the attribute index, and probably needs to use it only once), e.g.:

let $toWaters := //*[@id = $toId]

Then $toWaters contains all waters (sea, river and lake elements) that $source streams to. Add a [1] if you need only the first one, as you did. (If you know that all waters mentioned in $source/to exist in your db, wouldn't it be better to restrict $toId instead of $to, i.e. just use the first $source/to element?)
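
Put together, the suggestion could look like the sketch below. It assumes the Facts database is opened as context and that taking only the first "to" child of each water is acceptable. Whether it is actually faster would need to be measured.

```xquery
(: Sketch: restrict the id first, then do a single lookup on @id. :)
for $source in //(sea|river|lake)
let $toId := ($source/to)[1]/@water   (: first target only :)
let $to := (//*[@id = $toId])[1]      (: empty if there is no target :)
return
  element water {
    element {$source/local-name()} {data($source/@name)},
    if (empty($to)) then ()
    else element streamsTo {
      attribute {$to/local-name()} {data($to/@name)}
    }
  }
```

The parentheses around //*[@id = $toId] make [1] pick the first match overall; without them the [1] is applied per parent element.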

 

let $name := if (empty($to/local-name())) then "none" else $to/local-name()

I am not sure I understand, but wouldn't empty($to) do the trick?

return
element water {
element {$source/local-name()} {data($source/@name)},
if (not($name="none"))then
element streamsTo {
attribute {$name} {data($to/@name)}
}
else ()
}

As a side question: I want to extend the query to make it recursive: river “Bahr el-Djebel” streams into river “White Nile”, which streams into river “Nile”, which streams into sea “Mediterranean Sea”.
I think I can find out how to do that, but how can I optimize the recursion process? Would a recursive function be efficient?


Yes, that would do the trick. Generally, tail recursion is a good thing, but in this case it probably wouldn't matter much. Just watch out for those weird rivers that flow back into the lake they originate from ;-).
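
A rough sketch of such a recursive function, with a guard against the circular case. The function name local:downstream and the $seen parameter are made up for illustration, and the sketch follows only the first "to" child of each water. Inside a function there is no context item, so the lookup starts from root($water) rather than a bare //.

```xquery
(: Sketch only: follow "to" references until a water has no target,
   or until a water comes up a second time (guards against cycles). :)
declare function local:downstream(
  $water as element(),
  $seen  as xs:string*
) as element()* {
  let $toId := ($water/to)[1]/@water
  let $next := (root($water)//*[@id = $toId])[1]
  return
    if (empty($next) or $next/@id = $seen) then ()
    else ($next, local:downstream($next, ($seen, $next/@id)))
};

for $source in //river[@name = "Bahr el-Djebel"]
let $chain := ($source, local:downstream($source, $source/@id))
return string-join(
  (: a "for" keeps the flow order; a path step would re-sort the
     nodes into document order :)
  for $w in $chain return string($w/@name),
  " -> "
)
```

This version is not tail-recursive, since the result is built after the recursive call returns. For chains as short as rivers flowing into seas that should not matter.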

Regards,

Huib Verweij.