Hi,
First of all, thank you for the excellent software you produce and maintain! Keep up the good work.
I've been using BaseX for some academic experiments on XQuery processing, and I ran into a situation that you can probably explain.
Here is some context:
- I am using version 8.2.3.
- Database 'expdb' was created with the default options of that version, using the 'auction.xml' document generated by the XMark benchmark [1].
- BaseX is running with default options, too.
- What the query does is irrelevant.
When I execute this query:
for $pe in doc('expdb/auction.xml')/site/people/person
for $cat in doc('expdb/auction.xml')/site/categories/category[position() >= 1 and position() < 101]
where count($pe/profile/interest) > 3
  and $pe/profile/interest/@category = $cat/@id
return <match>
  <person>{$pe/name}</person>
  <category>{$cat/name}</category>
</match>
the resulting optimized query (shown in the 'Query Info' view of the GUI) is:
for $pe_0 in db:open-pre("expdb",0)/*:site/*:people/*:person[3.0 < count(*:profile/*:interest)]
for $cat_1 in db:open-pre("expdb",0)/*:site/*:categories/*:category[position() = 1 to 100][(@*:id = $pe_0/*:profile/*:interest/@*:category)]
return element match {
  (element person { ($pe_0/*:name) }, element category { ($cat_1/*:name) })
}
If I change the original query as follows (note that I am only swapping the two 'for' clauses):
for $cat in doc('expdb/auction.xml')/site/categories/category[position() >= 1 and position() < 101]
for $pe in doc('expdb/auction.xml')/site/people/person
where count($pe/profile/interest) > 3
  and $pe/profile/interest/@category = $cat/@id
return <match>
  <person>{$pe/name}</person>
  <category>{$cat/name}</category>
</match>
the optimized query changes to:
for $cat_0 in db:open-pre("expdb",0)/*:site/*:categories/*:category[position() = 1 to 100]
for $pe_1 in db:attribute("expdb", $cat_0/@*:id)/self::*:category/parent::*:interest/parent::*:profile/parent::*:person[3.0 < count(*:profile/*:interest)]
return element match {
  (element person { ($pe_1/*:name) }, element category { ($cat_0/*:name) })
}
As you can see, BaseX was able to use the attribute index to optimize the second query, which reduced its execution time by a lot!
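To make the difference concrete, here is a simplified Python model of the two strategies (my own sketch, not how BaseX implements anything): the scan version compares every category with every person, while the index version first builds a map from @category values to persons, a rough stand-in for the attribute index plus the parent:: steps in the rewritten query. The Person/Category classes and the sample data are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Person:
    name: str
    interests: tuple  # hypothetical: @category values of the person's interest elements


@dataclass(frozen=True)
class Category:
    id: str
    name: str


def scan_join(categories, people):
    # Without an index: for every category, scan all persons.
    return [(p.name, c.name)
            for c in categories
            for p in people
            if len(p.interests) > 3 and c.id in p.interests]


def build_attribute_index(people):
    # Map each @category value to the persons referencing it, in document
    # order (rough stand-in for the attribute index plus the parent:: steps).
    index = defaultdict(list)
    for p in people:
        for cat_id in dict.fromkeys(p.interests):  # each person once per value
            index[cat_id].append(p)
    return index


def index_join(categories, people):
    index = build_attribute_index(people)
    # With an index: only visit persons whose interests reference the category id.
    return [(p.name, c.name)
            for c in categories
            for p in index.get(c.id, [])
            if len(p.interests) > 3]


categories = [Category("c0", "Books"), Category("c1", "Music"), Category("c2", "Art")]
people = [
    Person("Ann", ("c0", "c1", "c2", "c0")),
    Person("Bob", ("c1",)),
    Person("Cid", ("c2", "c1", "c0", "c2", "c1")),
]

print(index_join(categories, people))
# [('Ann', 'Books'), ('Cid', 'Books'), ('Ann', 'Music'), ('Cid', 'Music'), ('Ann', 'Art'), ('Cid', 'Art')]
```

Both functions return the same pairs; the index version only touches the persons that actually reference the current category, which is where the speed-up comes from on a large XMark document.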
My question is: what is the explanation for this behavior?
Thank you in advance!
Gabriel Tessarolli
---
[1] http://www.xml-benchmark.org/
Hi Gabriel,
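In your first query, the nodes bound to the variable in the second for clause are limited to the first 100 categories. This is the reason why this path expression cannot be rewritten for index access: the positional predicate [position() = 1 to 100] refers to positions in the complete sequence of categories, so the @id comparison cannot be evaluated first via the index without changing the result. Things look different in your second query: there, the second for clause iterates over person elements without any positional limitation, so it can be rewritten for index access.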
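A simplified Python model of why the order of the two filters matters (an illustration only, not BaseX code): the list comprehension in lookup_then_positional merely stands in for an index lookup, and the category ids and limit are hypothetical.

```python
def positional_then_join(category_ids, wanted_ids, limit):
    # category[position() = 1 to limit][@id = ...]: the positional filter
    # applies to the full sequence, the join comparison comes afterwards.
    return [c for c in category_ids[:limit] if c in wanted_ids]


def lookup_then_positional(category_ids, wanted_ids, limit):
    # What an index rewrite would amount to: fetch the matching categories
    # first (this comprehension stands in for the index lookup), then apply
    # the positional filter to the much shorter lookup result.
    matches = [c for c in category_ids if c in wanted_ids]
    return matches[:limit]


category_ids = [f"category{i}" for i in range(1, 201)]  # 200 categories
wanted_ids = {"category5", "category150"}  # ids referenced by one person

print(positional_then_join(category_ids, wanted_ids, 100))    # ['category5']
print(lookup_then_positional(category_ids, wanted_ids, 100))  # ['category5', 'category150']
```

Once the positional limit no longer cuts anything off (as in the second for clause of your second query, which has none), both orders yield the same result, which is what makes the index rewrite safe there.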
Hope this helps, Christian
It does help!
Thank you, Gabriel
On Tue, Mar 29, 2016 at 2:04 PM, Christian Grün christian.gruen@gmail.com wrote:
basex-talk@mailman.uni-konstanz.de