Hi,
First of all, thank you for the excellent software you produce and maintain! Keep up the good work.
I've been using BaseX for some academic experiments on XQuery processing, and I ran into a situation that you can probably explain.
Here is some context:
- I am using version 8.2.3.
- Database 'expdb' was created with the default options of that version, using the 'auction.xml' document generated by the XMark benchmark [1].
- BaseX is running with default options too.
- What the query does is irrelevant.
When I execute this query:
for $pe in doc('expdb/auction.xml')/site/people/person
for $cat in doc('expdb/auction.xml')/site/categories/category[position() >= 1 and position() < 101]
where count($pe/profile/interest) > 3
  and $pe/profile/interest/@category = $cat/@id
return
  <match>
    <person>{$pe/name}</person>
    <category>{$cat/name}</category>
  </match>
the resulting optimized query (shown in the 'Query Info' window of the GUI) is this:
for $pe_0 in db:open-pre("expdb",0)/*:site/*:people/*:person[3.0 < count(*:profile/*:interest)]
for $cat_1 in db:open-pre("expdb",0)/*:site/*:categories/*:category[position() = 1 to 100][(@*:id = $pe_0/*:profile/*:interest/@*:category)]
return element match { (element person { ($pe_0/*:name) }, element category { ($cat_1/*:name) }) }
If I change the original query to the following (note that I am only swapping the order of the 'for' clauses):
for $cat in doc('expdb/auction.xml')/site/categories/category[position() >= 1 and position() < 101]
for $pe in doc('expdb/auction.xml')/site/people/person
where count($pe/profile/interest) > 3
  and $pe/profile/interest/@category = $cat/@id
return
  <match>
    <person>{$pe/name}</person>
    <category>{$cat/name}</category>
  </match>
the optimized query changes to:
for $cat_0 in db:open-pre("expdb",0)/*:site/*:categories/*:category[position() = 1 to 100]
for $pe_1 in db:attribute("expdb", $cat_0/@*:id)/self::*:category/parent::*:interest/parent::*:profile/parent::*:person[3.0 < count(*:profile/*:interest)]
return element match { (element person { ($pe_1/*:name) }, element category { ($cat_0/*:name) }) }
You can see that BaseX was able to use the attribute index to optimize the second query, reducing its execution time (by a lot!).
My question is: why does the order of the 'for' clauses determine whether the attribute index is used?
Thank you in advance!
Gabriel Tessarolli
---
[1] http://www.xml-benchmark.org/