Re: [basex-talk] basex xquery optimization not using attribute index on "joins"?

6 Apr 2010


Sandra,
I'm glad to tell you that we have put some additional work into our
query optimizer. Your query, which uses variables as index terms, should
now be recognized by the compiler and evaluated via the index. You are
welcome to check out the latest sources from our repository (note that
the current code is still at a beta stage, so any feedback is more
than welcome).
Hope this helps,
Christian
___________________________
Christian Gruen
Universitaet Konstanz
Department of Computer & Information Science
D-78457 Konstanz, Germany
Tel: +49 (0)7531/88-4449, Fax: +49 (0)7531/88-3577
http://www.inf.uni-konstanz.de/~gruen
On Sun, Mar 28, 2010 at 5:30 AM, Christian Grün
christian.gruen@gmail.com wrote:
...
Sandra,
yes, it feels necessary to put some additional work into the optimizer
to support queries like the one you detailed. In a nutshell, we'll
think about a generic way to rewrite the index methods so that they also
support arguments other than strings and atomic values (such as variables
or item sequences). While some optimizations look quite obvious on
paper, it has turned out in the past that they mean a lot of work in
the final compilation steps, as XQuery is much more flexible than, e.g.,
SQL or strictly typed languages. Still, rest assured that your concerns
are not in vain.
Feedback is always welcome,
Christian
On Sat, Mar 27, 2010 at 11:43 PM, Sandra Maria Silcot
ssilcot@unimelb.edu.au wrote:
...
Christian,
Your suggestion to add "let $person := //person" does speed things up
considerably, from about a minute down to about 10 seconds. I understand the
reasons for your optimization logic. For example, when I tried to halve
the number of //person elements scanned by limiting it to males:
let $person := //person[@sex="M"]
the overhead of that check slightly increased the time needed!
However, is a large XML database, where documents are "joined" on
what are in effect unique keys, really such a "special case"? Could the
optimiser fairly simply make a different decision based on database size
(say, when the target indexed attribute or element occurs more than
10,000 times)?
I would also suggest that a useful (necessary?) optimisation enhancement
would be to always use the indexes when xml:id / xml:idref attributes
are involved, because these must always represent unique identifiers in
well-formed XML. I modified my query slightly to try to target elements
by their xml:id:
let $person:= //sources/*
 for $rs in //rs[@corresp="om22451"]/../..//rs
 let $keyval := data($rs/@corresp)
 return $person[@xml:id=$keyval]
And got the same time, about 10 seconds. FYI, here is the query info result:
Result: let $person := root()/descendant::*:sources/* for $rs in
IndexAccess(ATV,"om22451")/self::*:corresp/parent::*:rs/../../descendant::*:rs
let $keyval := data($rs/@*:corresp) return $person[@xml:id = $keyval]
I concede that in 4-6 years Moore's law will get this query down to 2-3
seconds, but in that time the database will have grown similarly!
By the way, I am running BaseX6.jar on a 1.8 GHz Core 2 Duo with 2.5 GB RAM,
assigning the JVM 1024 MB: just an average kind of machine, but not too far
short of our server's CPU speed.
Thanks for your reply. I have one other semi-related question regarding how
to address the separate documents in the db, but I'll post it separately.
Many thanks again.
Sandra.
...
Sandra,
thanks for your comprehensive analysis. It's true: the BaseX query
compiler optimises only static equality comparisons. If a dynamic
variable is embedded in a predicate, we would have to decide at
runtime whether to apply the index or not. The main reason why we
don't use runtime optimisations here is that there are many cases in which
sequential execution turns out to be faster (e.g. if the path to a
predicate is cheap), and it's difficult to decide which variant will yield
faster results. In your particular case, however, it seems quite obvious
that the index would be preferable.
Apart from that, you may try to speed up your query by binding
//person to a variable:
let $person := //person
 for $rs in //rs[@corresp="c31a31061000"]/../..//rs
 let $keyval := data($rs/@corresp)
 return $person[@key=$keyval]
By the way, the use of the eval() method is an interesting
(implementation-specific) option which hadn't occurred to me before.
Feel free to ask for more,
Christian
On Sat, Mar 27, 2010 at 2:28 AM, Sandra Maria Silcot
ssilcot@unimelb.edu.au wrote:
...
Hi all,
First, thanks to the developers for a great piece of software.
I am having difficulty getting an XQuery on a large database to use
indexed attributes when a "join" idiom is used. I have a large BaseX
database with multiple documents. One of those contains <rs> elements as
shown below, where the @corresp attribute contains values which are
identical to the @key attribute on <person> elements, which live in
multiple separate files:
<personGrp type="match:policeNum+ship" size="3"><persName>
<rs corresp="c23a2866">Corper, Jno (Pn:1000C)...</rs>
<rs corresp="dlm18192024">Corper, John (Pn:1000C)...</rs>
<rs corresp="c31a31061000" >Corper, Jno (Pn:1000CC)...</rs>
</persName></personGrp>
I know the indexes have been built, as XPaths like these are lightning
quick:
//person[@key="c23a2866"]
or
//rs[@corresp="dlm18192024"]
But when I do this, it is glacial (nearly 1 minute):
EG(A)
for $rs in //rs[@corresp="c31a31061000"]/../..//rs
let $keyval := data($rs/@corresp)
return //person[@key=$keyval]
I am using BaseX6.jar on XP. When I look at the query plan, the ONLY
place the attribute index is used is the //rs[@corresp="c31a31061000"] part.
I can get the query to run fast and return the 3 matched <person> elements
via the attribute index by using basex:eval, like this:
EG(B)
for $rs in data(//rs[@corresp="c31a31061000"]/../..//rs/@corresp)
let $s := concat("//person[@key='", $rs, "']")
return basex:eval($s)
So rather than executing this query as a "join" (an idiom which seems
widespread in the XQuery world), I must manually build and execute each
//person[@key='string'] request to get BaseX to use its attribute index.
While this works, it seems a rather strange idiom to have to employ, and
it locks my queries into BaseX.
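A portable alternative to the eval() workaround, sketched here as an assumption rather than something proposed in the thread: collect all @corresp values of the group into one sequence once, then select the <person> elements with a single general comparison (@key = $keys), which is true when @key equals at least one item of the sequence. The variable $data, its inline sample documents, and the function local:matching-persons are hypothetical; the sample puts the <rs> group and the <person> elements into one inline document so the query runs in any XQuery processor without the original database. Against the real database, the two expressions in the function body would simply use // paths from the database root, as in EG(A). Whether a given BaseX version rewrites @key = $keys into an index access is not guaranteed; checking the query info, as done with EG(A), is the way to confirm it.

```xquery
(: Hypothetical sample data, not from the thread: it places the <rs>
   group and the <person> elements in one inline document so that the
   query runs without the original database. :)
declare variable $data := document {
  <sample>
    <personGrp type="match:policeNum+ship" size="3">
      <persName>
        <rs corresp="c23a2866">Corper, Jno (Pn:1000C)</rs>
        <rs corresp="dlm18192024">Corper, John (Pn:1000C)</rs>
        <rs corresp="c31a31061000">Corper, Jno (Pn:1000CC)</rs>
      </persName>
    </personGrp>
    <person key="c23a2866"/>
    <person key="dlm18192024"/>
    <person key="c31a31061000"/>
    <person key="unrelated"/>
  </sample>
};

(: Hypothetical helper: returns every <person> whose @key equals the
   @corresp of any <rs> in the same group as the <rs> with $corresp.
   $root may be one node or several (e.g. multiple document nodes). :)
declare function local:matching-persons(
  $root as node()*,
  $corresp as xs:string
) as element(person)*
{
  (: Evaluate the group lookup once and atomize all keys up front. :)
  let $keys := $root//rs[@corresp = $corresp]/../..//rs/@corresp/string()
  (: One general comparison against the whole key sequence: true if
     @key equals at least one item of $keys. :)
  return $root//person[@key = $keys]
};

local:matching-persons($data, "c31a31061000")
```

Note that the result differs slightly from EG(A): EG(A) returns one <person> per <rs> in iteration order (repeating a person if two <rs> elements share a key), whereas the general comparison returns each matching <person> once, in document order.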
Is the behaviour of EG(A) by design, or is it a bug that the query
optimizer is failing to recognise it can use the attribute index on the
//person[@key=$keyval] part?
Any guidance much appreciated.
Best wishes to all,
Sandra.
