Same query, huge difference in performance

Paul Swennenhuis

4 Aug 2014

3:27 a.m.

Hi,

I have two XQueries that do exactly the same thing: they create a list of waters in the Facts database and display where these waters flow into, if anywhere. But Query 2 executes 10 times faster than Query 1 (in the GUI). The only difference is in the selection of the query context. Query 1 assumes the Facts database is open and does not use an explicit doc() or collection() context, whereas Query 2 uses an explicit collection("Facts") context.

Query 1 executes in approx. 600 msecs, Query 2 in approx. 60 msecs (!)

How can there be such a huge difference? Is it because Query 2 operates twice on the $facts variable and does not need to evaluate it again?

Listings:

Query 1

    (: list waters and where they stream to (if any) :)
    for $source in //(sea|river|lake)
    let $toId := $source/to/@water
    let $to := (//sea|//river|//lake)[@id=$toId][1]
    let $name := if (empty($to/local-name())) then "none" else $to/local-name()
    return element water {
      element {$source/local-name()} {data($source/@name)},
      if (not($name = "none")) then
        element streamsTo {
          attribute {$name} {data($to/@name)}
        }
      else ()
    }

Query 2

    (: list waters and where they stream to (if any) :)
    let $facts := collection("Facts")//(sea|river|lake)
    for $source in $facts
    let $toId := $source/to/@water
    let $to := $facts[@id=$toId][1]
    let $name := if (empty($to/local-name())) then "none" else $to/local-name()
    return element water {
      element {$source/local-name()} {data($source/@name)},
      if (not($name = "none")) then
        element streamsTo {
          attribute {$name} {data($to/@name)}
        }
      else ()
    }

As a side question: I want to extend the query to make it recursive: river "Bahr el-Djebel" streams into river "White Nile", which streams into river "Nile", which streams into sea "Mediterranean Sea". I think I can find out how to do that, but how can I optimize the recursion process? Would a recursive function be efficient?

Thanks,

Paul

H. Verweij

4 Aug

4:44 a.m.

Paul Swennenhuis

5:21 a.m.

Hi Huub,

Thank you for your reply. I tried your suggestions, but it does not make any difference. I changed Query 1 to this:

    (: list waters and where they stream to (if any) :)
    for $source in //(sea|river|lake)
    let $toId := $source/to/@water
    let $to := //*[@id=$toId][1]
    let $name := if (empty($to)) then "none" else $to/local-name()
    return element water {
      element {$source/local-name()} {data($source/@name)},
      if (not($name = "none")) then
        element streamsTo {
          attribute {$name} {data($to/@name)}
        }
      else ()
    }

but there is no performance gain. The query still executes at least 10 times slower than Query 2.

Thanks for the empty($to) suggestion.

As for the recursive algorithm: in the meantime I wrote the query for that and it works like a charm!

Paul

...

Hi Paul,

...
On 4 Aug 2014, at 09:27, Paul Swennenhuis <paul@swennenhuis.nl> wrote:

Listings:

Query 1

    (: list waters and where they stream to (if any) :)
    for $source in //(sea|river|lake)
    let $toId := $source/to/@water
    let $to := (//sea|//river|//lake)[@id=$toId][1]

You start to search for “sea” elements at the very top of the db, then, for “river” elements you start to search at the very top of the db, then, for “lake” elements you start to search at the very top of the db. And you do this for every $source you process. This is different from "//(sea|river|lake)" where you start at the top (once) and then match sea, river or lake elements. In the second query you find all sea, river and lake elements once and then use that sequence to search in, that would be (much) faster.
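Huib's cost argument can be illustrated with a rough Python model. This is not how BaseX works internally; it is only a way to count work. The `Node` class and all the functions are hypothetical, and for simplicity the target id is stored directly in a `to` attribute, not in a `to/@water` child element as in the real factbook data. The model counts how many tree nodes each strategy touches: re-walking the tree for every source (Query 1 style) versus walking it once and searching the saved sequence (Query 2 style).

```python
from dataclasses import dataclass, field

WATER_TAGS = ("sea", "river", "lake")


@dataclass
class Node:
    tag: str
    attrs: dict = field(default_factory=dict)
    children: list = field(default_factory=list)


def descendants(node, visits):
    """Yield all descendants in document order; visits[0] counts nodes touched."""
    for child in node.children:
        visits[0] += 1
        yield child
        yield from descendants(child, visits)


def waters(root, visits):
    return [n for n in descendants(root, visits) if n.tag in WATER_TAGS]


def first_target(candidates, src):
    """Id of the first candidate whose id equals the source's target, else None."""
    hits = [w for w in candidates if w.attrs.get("id") == src.attrs.get("to")]
    return hits[0].attrs["id"] if hits else None


def resolve_rescan(root):
    """Query 1 style: walk the whole tree again for every source."""
    visits = [0]
    result = {}
    for src in waters(root, visits):
        result[src.attrs["id"]] = first_target(waters(root, visits), src)
    return result, visits[0]


def resolve_reuse(root):
    """Query 2 style: walk the tree once, then search the saved sequence."""
    visits = [0]
    all_waters = waters(root, visits)
    result = {src.attrs["id"]: first_target(all_waters, src) for src in all_waters}
    return result, visits[0]


# Hypothetical sample data: river r1 flows into r2, r2 into sea s1.
root = Node("factbook", children=[
    Node("country", {"id": "c1"}, [Node("province", {"id": "p1"})]),
    Node("river", {"id": "r1", "to": "r2"}),
    Node("river", {"id": "r2", "to": "s1"}),
    Node("sea", {"id": "s1"}),
    Node("lake", {"id": "l1"}),
])

print(resolve_rescan(root))  # same mapping, 30 node visits
print(resolve_reuse(root))   # same mapping, 6 node visits
```

Both strategies return the same mapping. Only the amount of tree walking differs, and that difference grows with (number of sources) × (document size). A real engine may optimize either form, so the Query Info pane remains the authority on what actually happens.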

It might even be faster to search all elements and filter on @id (BaseX can then probably use the attribute index, and only needs to use it once), e.g.:

let $toWaters := //*[@id = $toId]

and $toWaters contains all waters (sea, river and lake elements) the $source streams to. Add a [1] if you just need the first one like you did. (If you know that all waters mentioned in $source/to exist in your db, wouldn't it be better to restrict $toId instead of $to, i.e. just use the first $source/to element?)
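The following is a minimal Python sketch of why an index helps. It uses a plain dictionary as a stand-in for the attribute index; BaseX's real index is a different structure, and all names here are hypothetical. The index is built in one pass, and each `@id = $toId` test then becomes a lookup instead of a scan. Because `$toId` can hold several values (a water with more than one `to` element), the lookup accepts a list and returns every match in document order. Adding `[1]` then corresponds to taking the first match.

```python
from collections import defaultdict


def build_id_index(elements):
    """Map each id value to (position, tag, id) entries, built in one pass.

    `elements` is a document-ordered list of (tag, id) pairs.
    """
    index = defaultdict(list)
    for pos, (tag, elem_id) in enumerate(elements):
        index[elem_id].append((pos, tag, elem_id))
    return index


def lookup(index, to_ids):
    """Elements whose id equals ANY value in to_ids (like @id = $toId),
    deduplicated and returned in document order."""
    hits = sorted({entry for t in to_ids for entry in index.get(t, ())})
    return [(tag, elem_id) for _, tag, elem_id in hits]


def first_match(index, to_ids):
    """Counterpart of appending [1]: the first match in document order, or None."""
    hits = lookup(index, to_ids)
    return hits[0] if hits else None


# Hypothetical data in document order.
doc = [("river", "r1"), ("river", "r2"), ("sea", "s1"), ("lake", "l1")]
index = build_id_index(doc)
print(lookup(index, ["s1", "r2"]))       # [('river', 'r2'), ('sea', 's1')]
print(first_match(index, ["s1", "r2"]))  # ('river', 'r2')
```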

...
    let $name := if (empty($to/local-name())) then "none" else $to/local-name()

I am not sure I understand but wouldn't empty($to) do the trick?

...
    return element water {
      element {$source/local-name()} {data($source/@name)},
      if (not($name = "none")) then
        element streamsTo {
          attribute {$name} {data($to/@name)}
        }
      else ()
    }

As a side-question: I want to extend the query to make it recursive: river “Bahr el-Djebel” streams into river “White Nile” streams into river “Nile” streams into sea “Mediterranean Sea” I think I can find out how to do that, but how can I optimize the recursion process? Would a recursive function be efficient?

Yes, that would do the trick. Generally, tail recursion is a good thing, but in this case it probably wouldn't matter much. Just watch out for those weird rivers that flow back into the lake they originate from ;-).
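Here is a sketch of such a recursive walk in Python. It assumes a hypothetical `streams_to` mapping from each water to the water it flows into; in the factbook data this link would be resolved via `to/@water` and `@id`. The `path` accumulator puts the recursive call in tail position, and the membership test stops on cycles like the ones Huib warns about. Note that CPython does not eliminate tail calls, so the tail-call shape only pays off in engines that do.

```python
def flow_chain(water, streams_to, path=()):
    """Return the waters reached from `water`, in order.

    Stops at a water with no outflow, or when a water reappears (a cycle).
    `path` accumulates the route so that the recursive call is a tail call.
    """
    if water in path:                 # cycle: this water was already visited
        return list(path)
    path = path + (water,)
    nxt = streams_to.get(water)
    if nxt is None:                   # no outflow: end of the chain
        return list(path)
    return flow_chain(nxt, streams_to, path)


streams_to = {
    "Bahr el-Djebel": "White Nile",
    "White Nile": "Nile",
    "Nile": "Mediterranean Sea",
}
print(" -> ".join(flow_chain("Bahr el-Djebel", streams_to)))
# Bahr el-Djebel -> White Nile -> Nile -> Mediterranean Sea
```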

Regards,

Huib Verweij.

Fabrice Etanchaud

5:35 a.m.

Dear Paul,

Is it a big collection? Could the difference be in opening the collection? Did you try to run the slow query in the GUI, for example, with the collection already opened?

Best regards, Fabrice

Paul Swennenhuis

5:40 a.m.

Hi Fabrice,

Thanks for your contribution. The collection is the Facts database (factbook.xml) found in the distribution of BaseX. It's the same collection as used in Query 2. And yes, I did try to run the slow query in the GUI. In fact, that is the only place where I ran it, with the Facts database opened (it will yield an error if the database is not opened since it does not specify a context).

Paul

Christian Grün

5:41 a.m.

Hi Paul,

...

//(sea|river|lake)

Due to the (somewhat peculiar) semantics of XPath, this path is identical to...

/descendant-or-self::node()/ (child::sea | child::river | child::lake)

...and it creates a massive amount of intermediate results. You could try to rewrite it to...

/descendant::sea | /descendant::river | /descendant::lake

...or...

/descendant::*[local-name() = ('sea', 'river', 'lake')]

...and I will try to tweak our optimizer to automatically do this for you in future (it already works for single steps).
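To see where the "massive amount of intermediate results" comes from, here is a rough Python model. It uses an element-only tree; real documents also contain text nodes, which make the intermediate sequence even larger. The `Node` class and both functions are hypothetical. The first function follows the expansion of `//(sea|river|lake)` given above: every node becomes a context node, and each branch of the union runs its own child step. The second models `/descendant::*[local-name() = ('sea', 'river', 'lake')]` as a single filtered walk.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    tag: str
    children: list = field(default_factory=list)


def descendant_or_self(node):
    yield node
    for child in node.children:
        yield from descendant_or_self(child)


def slash_slash_union(root, names):
    """/descendant-or-self::node()/(child::a | child::b | child::c)."""
    contexts = list(descendant_or_self(root))   # intermediate sequence: every node
    tests, hits = 0, []
    for ctx in contexts:
        for name in names:                       # one child:: step per union branch
            for child in ctx.children:
                tests += 1
                if child.tag == name:
                    hits.append(child)
    return hits, len(contexts), tests


def descendant_by_name(root, names):
    """/descendant::*[local-name() = names]: one walk, one name test per node."""
    tests, hits = 0, []
    for node in descendant_or_self(root):
        if node is root:
            continue
        tests += 1
        if node.tag in names:
            hits.append(node)
    return hits, tests


root = Node("factbook", [
    Node("country", [Node("river"), Node("lake"), Node("province", [Node("river")])]),
    Node("sea"),
])
names = ("sea", "river", "lake")
hits1, contexts, tests1 = slash_slash_union(root, names)
hits2, tests2 = descendant_by_name(root, names)
print(len(hits1), contexts, tests1)  # 4 7 18
print(len(hits2), tests2)            # 4 6
```

Both variants find the same four elements. The first one, however, builds a context sequence containing every node and performs three times as many name tests. In an XPath engine the results would additionally be sorted into document order, which this model skips.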

Christian

Paul Swennenhuis

5:57 a.m.

Hi Christian,

Sorry, that doesn't improve performance either. I even tried copying the optimized selection as shown in the Query Info pane:

    (: list waters and where they stream to (if any) :)
    for $source in ((db:open-pre("facts",0)/descendant::*:sea
                     union db:open-pre("facts",0)/descendant::*:river
                     union db:open-pre("facts",0)/descendant::*:lake))
    let $toId := $source/to/@water
    let $to := //*[@id=$toId][1]
    let $name := if (empty($to)) then "none" else $to/local-name()
    return element water {
      element {$source/local-name()} {data($source/@name)},
      if (not($name = "none")) then
        element streamsTo {
          attribute {$name} {data($to/@name)}
        }
      else ()
    }

No improvement. The problem seems to be in the line that assigns the $to variable: if I reuse the main node selection there, the query executes fast, like this:

    (: list waters and where they stream to (if any) :)
    let $sources := /descendant::sea | /descendant::river | /descendant::lake
    for $source in $sources
    let $toId := $source/to/@water
    let $to := $sources[@id=$toId][1]
    let $name := if (empty($to)) then "none" else $to/local-name()
    return element water {
      element {$source/local-name()} {data($source/@name)},
      if (not($name = "none")) then
        element streamsTo {
          attribute {$name} {data($to/@name)}
        }
      else ()
    }

The original line, let $to := //*[@id=$toId][1], apparently is very expensive. I could do some testing with the profiling tools to see if I'm right.

Paul

Christian Grün

6:04 a.m.

Hi Paul,

thanks for your feedback. Are you working with 7.9? If it's not too much of a hassle for you, I would be interested to hear whether you get better performance with the latest 8.0 snapshot [1].

Christian

[1] http://files.basex.org/releases/latest/

Paul Swennenhuis

6:40 a.m.

Hi Christian,

I will try that. But first: I can confirm my suspicion. The offending line (let $to := //*[@id=$toId][1]) takes about 20 msecs per hit, and since there are 249 hits, that means 249 * 20 = 4980 msecs in total, almost 5 seconds!

On a side note: I discovered that BaseX's query optimization works too well :-) I wanted to profile the execution time of that offending line, so I assigned the current time to a variable $start before that line, and to a variable $end after it:

    let $start := (current-dateTime() - xs:dateTime('1970-01-01T00:00:00-00:00'))
                  div xs:dayTimeDuration('PT0.001S')
    let $to := //*[@id=$toId][1]
    let $end := (current-dateTime() - xs:dateTime('1970-01-01T00:00:00-00:00'))
                div xs:dayTimeDuration('PT0.001S')

Then I assigned the difference, $end - $start, to an attribute in the result fragment. But it appeared that BaseX pre-evaluated the $start and $end variables and turned them into constants, so I got the same $start and $end values in every result, and the difference was always 0.

The only way I saw to prevent that from happening was to use xquery:eval, which makes it impossible for BaseX to pre-evaluate the expression:

    let $start := xquery:eval("(current-dateTime() - xs:dateTime('1970-01-01T00:00:00-00:00')) div xs:dayTimeDuration('PT0.001S')")
    let $to := //*[@id=$toId][1]
    let $end := xquery:eval("(current-dateTime() - xs:dateTime('1970-01-01T00:00:00-00:00')) div xs:dayTimeDuration('PT0.001S')")

The complete profiling query:

    (: list waters and where they stream to (if any) :)
    for $source in /descendant::sea | /descendant::river | /descendant::lake
    let $toId := $source/to/@water
    let $start := xquery:eval("(current-dateTime() - xs:dateTime('1970-01-01T00:00:00-00:00')) div xs:dayTimeDuration('PT0.001S')")
    let $to := //*[@id=$toId][1]
    let $end := xquery:eval("(current-dateTime() - xs:dateTime('1970-01-01T00:00:00-00:00')) div xs:dayTimeDuration('PT0.001S')")
    let $name := if (empty($to)) then "none" else $to/local-name()
    return element water {
      attribute took {$end - $start},
      element {$source/local-name()} {data($source/@name)},
      if (not($name = "none")) then
        element streamsTo {
          attribute {$name} {data($to/@name)}
        }
      else ()
    }

For me the lesson is: use predefined selections as much as possible, particularly in "for" clauses.

Paul

Christian Grün

7:08 a.m.

Hi Paul,

thanks for trying 8.0. I have just uploaded yet another snapshot that optimizes descendant-or-self axes and union & list expressions; e.g.:

//(sea, river) -> (/descendant::sea | /descendant::river )

There are various other query optimizations that will be available in 8.0. But as you already observed, this is probably not the bottleneck in your query.

...

The offending line ( let $to := //*[@id=$toId][1] ) takes about 20 msecs per hit

It seems as if the index structures are not utilized here (you can open the InfoView in the GUI in order to see what's going on). You will probably get much better performance by using parentheses around the path expression:

(//*[@id=$toId])[1]

Please note that the two expressions are not equivalent: The second one will only give you 1 result whereas the first one may give you more than one result, because it's equivalent to:

/descendant-or-self::node()/child::*[@id=$toId][1]

The reason is that the two predicates belong to the child step and not to the full path expression.
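Below is a small Python model of the difference. It assumes a hypothetical `Node` class with an `ident` field standing in for `@id`, and a root node playing the role of the document node. With unique ids and a single `$toId` value the two forms agree. They diverge when `$toId` holds several ids (a water with more than one `to` element), because the per-step `[1]` keeps the first match under every parent. In this model, the parenthesized form can also stop at the first hit, which hints at why it can be cheaper. How BaseX actually evaluates it is best checked in the Query Info.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    tag: str
    ident: str = ""                      # stands in for @id
    children: list = field(default_factory=list)


def all_nodes(node):
    """descendant-or-self::node() in document order."""
    yield node
    for child in node.children:
        yield from all_nodes(child)


def per_step_first(root, wanted):
    """//*[@id=$toId][1]: [1] belongs to the child step, so it is applied
    to the matching children of each parent separately."""
    order = {id(n): i for i, n in enumerate(all_nodes(root))}
    result = []
    for parent in all_nodes(root):
        matches = [c for c in parent.children if c.ident in wanted]
        result.extend(matches[:1])       # first match per parent
    return sorted(result, key=lambda n: order[id(n)])   # document order


def whole_path_first(root, wanted):
    """(//*[@id=$toId])[1]: [1] applies to the whole result; stop at the first hit."""
    for node in all_nodes(root):
        if node is not root and node.ident in wanted:
            return [node]
    return []


# Hypothetical document: a sea, plus a country containing a river and a lake.
root = Node("#document", children=[
    Node("factbook", children=[
        Node("sea", "s1"),
        Node("country", "c1", [Node("river", "r1"), Node("lake", "l1")]),
    ]),
])
print([n.ident for n in per_step_first(root, {"s1", "l1"})])    # ['s1', 'l1']
print([n.ident for n in whole_path_first(root, {"s1", "l1"})])  # ['s1']
```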

...

let $start := (current-dateTime() - xs:dateTime('1970-01-01T00:00:00-00:00')) div xs:dayTimeDuration('PT0.001S')

Due to the functional nature of XQuery, all calls of current-dateTime() will give you the same result during the execution of a query. But there is (at least) one way out: You can try prof:current-ns() instead [1].
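The following Python analogy illustrates the two behaviours. It is not BaseX code, and `QueryContext` and `timed` are hypothetical. The query's "current time" is captured once when evaluation starts, as the XQuery specification requires for `current-dateTime()`, so any difference between two readings is zero. Elapsed time is instead measured with a clock that actually advances: here `time.perf_counter_ns()`, which plays the role that `prof:current-ns()` plays in BaseX.

```python
import time


class QueryContext:
    """Hypothetical model of one query evaluation: the 'current dateTime'
    is taken once at the start and then stays the same."""

    def __init__(self):
        self._now_ns = time.time_ns()

    def current_datetime_ns(self):
        return self._now_ns              # identical for every call in this query


def timed(fn, *args):
    """Run fn(*args) and return (result, elapsed nanoseconds) from a running clock."""
    start = time.perf_counter_ns()
    result = fn(*args)
    return result, time.perf_counter_ns() - start


ctx = QueryContext()
start = ctx.current_datetime_ns()
total, elapsed = timed(sum, range(100_000))
end = ctx.current_datetime_ns()
print(end - start)   # 0, like $end - $start in the profiling query
print(total)         # 4999950000
```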

By the way, here is one more variant of your query, which explicitly accesses the index structures (however, this version is BaseX-specific and not that nice to read anymore):

    for $source in collection("Facts")/(descendant::sea | descendant::river | descendant::lake)
    return element water {
      element {$source/local-name()} {data($source/@name)},
      for $to in (db:text('Facts', $source/to/@water)/(parent::sea | parent::river | parent::lake))[1]
      return element streamsTo {
        attribute {$to/local-name()} {data($to/@name)}
      }
    }

Hope this helps; feel free to ask for more details.

Christian

[1] http://docs.basex.org/wiki/Profiling_Module

Paul Swennenhuis

7:25 a.m.

Hi Christian,

Thanks again for your reply. I'm learning a lot by discussing these kinds of problems.

I tested your variant. It does indeed perform a lot better. My Query 2 still beats it though, so I'll stick to that one for now.

Paul

Paul Swennenhuis

5 Aug

7:41 a.m.

Hi Christian,

I saw that I still had my BaseX session open and wanted to close all open editor windows, but then decided to run the slow query again, modified with your "parentheses around the path expression" solution. And guess what: it now runs as fast as Query 2!

With parentheses: 80 msecs
Without parentheses: 5300 msecs

I was convinced I had tried it yesterday, but apparently I had not. Anyway, it works. Thanks.

Paul

Paul Swennenhuis

4 Aug

6:52 a.m.

Hi Christian,

Just tried the query in BaseX 8. Sorry, no improvement in performance.

Paul

basex-talk@mailman.uni-konstanz.de

participants (4)

Christian Grün
Fabrice Etanchaud
H. Verweij
Paul Swennenhuis