Sandra,
thanks for your comprehensive analysis. It's true: the BaseX query compiler optimises only static equality comparisons. If a dynamic variable is embedded in a predicate, we would have to decide at runtime whether to apply the index or not. The main reason we don't use runtime optimisations here is that there are many cases in which sequential execution turns out to be faster (e.g. if the path to a predicate is cheap), and it's difficult to decide in advance which variant will yield faster results. In your particular case, however, the index would clearly be preferable.
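To make the static/dynamic distinction concrete, here is a toy Python model (not BaseX code; the data and all function names are hypothetical). An attribute index is modelled as a dict from attribute value to nodes. With a literal comparison value the compiler can commit to the index lookup up front; when the value only becomes known inside the loop, as with $keyval, the choice between a sequential scan and an index lookup has to be made at runtime.

```python
from collections import defaultdict

# Hypothetical stand-ins for <person key="..."> elements.
PERSONS = [
    {"key": "c23a2866", "name": "A"},
    {"key": "dlm18192024", "name": "B"},
    {"key": "c31a31061000", "name": "C"},
    {"key": "x1", "name": "D"},
]

def build_attribute_index(nodes, attr):
    """Map each attribute value to the nodes carrying it (a toy attribute index)."""
    index = defaultdict(list)
    for node in nodes:
        index[node[attr]].append(node)
    return index

def scan(nodes, attr, value):
    """Sequential evaluation of nodes[@attr = value]: every node is touched."""
    return [n for n in nodes if n[attr] == value]

def lookup(index, value):
    """Index evaluation of the same predicate: a single hash lookup."""
    return list(index.get(value, []))

KEY_INDEX = build_attribute_index(PERSONS, "key")

# Static comparison: the value is a literal, so the index can be chosen at compile time.
static_hit = lookup(KEY_INDEX, "c23a2866")

def join(keys, use_index):
    """Dynamic comparison: each key is only known while the loop runs."""
    result = []
    for k in keys:
        result.extend(lookup(KEY_INDEX, k) if use_index else scan(PERSONS, "key", k))
    return result
```

Both strategies return the same nodes; they differ only in how much of the data each key has to touch, which is exactly the cost trade-off a runtime decision would have to estimate.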
Apart from that, you could try to speed up your query by binding //person to a variable:

let $person := //person
for $rs in //rs[@corresp="c31a31061000"]/../..//rs
let $keyval := data($rs/@corresp)
return $person[@key=$keyval]
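Why binding //person once can help is sketched below as another toy Python model (hypothetical names, not BaseX internals). It assumes, as the suggestion implies, that in the original query //person is re-evaluated for every $rs, i.e. one full document traversal per iteration. The rewrite traverses once and then filters the stored sequence per key. Note that this still compares every person per key; it saves the repeated traversals, not the index lookup.

```python
# Hypothetical document: a flat list standing in for the <person> elements.
DOC = [{"tag": "person", "key": k}
       for k in ("c23a2866", "dlm18192024", "c31a31061000", "x1")]

# Counts how often the whole document is traversed.
calls = {"traversals": 0}

def descendants_person():
    """Stands in for //person: a full traversal of DOC on every call."""
    calls["traversals"] += 1
    return [n for n in DOC if n["tag"] == "person"]

def eg_a(keyvals):
    """Original query: //person is evaluated again inside the loop, once per key."""
    return [p for kv in keyvals for p in descendants_person() if p["key"] == kv]

def hoisted(keyvals):
    """Rewritten query: let $person := //person is evaluated once, then filtered per key."""
    person = descendants_person()
    return [p for kv in keyvals for p in person if p["key"] == kv]
```

With three keys, eg_a traverses the document three times while hoisted traverses it once; both return the same persons in the same order.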
By the way, using the eval() method is an interesting (implementation-specific) option that hadn't occurred to me before.

Feel free to ask for more,
Christian
On Sat, Mar 27, 2010 at 2:28 AM, Sandra Maria Silcot ssilcot@unimelb.edu.au wrote:
Hi all,
First, thanks to the developers for a great piece of software.
I am having difficulty getting an XQuery on a large database to use the attribute index when a "join" idiom is used. I have a large BaseX database with multiple documents. One of them contains <rs> elements as shown below, whose @corresp attribute holds values identical to the @key attribute of <person> elements, which live in several separate files:

<personGrp type="match:policeNum+ship" size="3">
  <persName>
    <rs corresp="c23a2866">Corper, Jno (Pn:1000C)...</rs>
    <rs corresp="dlm18192024">Corper, John (Pn:1000C)...</rs>
    <rs corresp="c31a31061000">Corper, Jno (Pn:1000CC)...</rs>
  </persName>
</personGrp>
I know the indexes have been built, as XPaths like these are lightning-quick:
//person[@key="c23a2866"] or //rs[@corresp="dlm18192024"]
But when I do this, it is glacial (nearly 1 minute):
EG(A)
for $rs in //rs[@corresp="c31a31061000"]/../..//rs
let $keyval := data($rs/@corresp)
return //person[@key=$keyval]

I am using BaseX6.jar on XP. When I look at the query plan, the ONLY place the attribute index is used is the //rs[@corresp="c31a31061000"] part.
I can get the query to run fast, returning the 3 matched <person> elements via the attribute index, by using basex:eval like this:
EG(B)
for $rs in data(//rs[@corresp="c31a31061000"]/../..//rs/@corresp)
let $s := concat("//person[@key='", $rs, "']")
return basex:eval($s)

So rather than execute this query as a "join" (an idiom that seems widespread in the XQuery world), I have to build and execute each //person[@key='string'] request manually to get BaseX to use its attribute index. Whilst this works, it seems a rather strange idiom to have to employ, and it locks my queries into BaseX.
Is the behaviour of EG(A) by design, or is it a bug that the query optimizer fails to recognise that it can use the attribute index for the //person[@key=$keyval] part?
Any guidance much appreciated.
Best wishes to all,
Sandra.
___________________________
Christian Gruen Universitaet Konstanz Department of Computer & Information Science D-78457 Konstanz, Germany Tel: +49 (0)7531/88-4449, Fax: +49 (0)7531/88-3577 http://www.inf.uni-konstanz.de/~gruen