Hi Gioele,
It's usually a difficult task for the query compiler to rewrite nested predicates. The following query may be evaluated faster (as I don't have access to your data, I couldn't test it):
declare namespace tei='http://www.tei-c.org/ns/1.0';
/descendant::tei:orth[text() = "arci"]
  [ancestor-or-self::*[@xml:lang][1][starts-with(@xml:lang, "san")]]
  /parent::tei:form
  /(parent::tei:entry | parent::tei:re)
  [parent::tei:body/parent::tei:text/parent::tei:TEI/parent::document-node()]
Depending on the structure of your data, it may be possible to simplify some of the predicates. As Fabrice suggested, you should check the query info output in order to see if the text index is utilized.
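If the optimizer does not pick the index on its own, one option might be to request it explicitly with `db:text`. The following is only a sketch: the database name "dictionary" is hypothetical, and it assumes that the database was created with a text index.

```xquery
declare namespace tei = 'http://www.tei-c.org/ns/1.0';

(: "dictionary" is a hypothetical database name; replace it with your own. :)
(: db:text returns the text nodes whose value equals "arci" from the text index. :)
db:text("dictionary", "arci")/parent::tei:orth
  [ancestor-or-self::*[@xml:lang][1][starts-with(@xml:lang, "san")]]
  /parent::tei:form
  /(parent::tei:entry | parent::tei:re)
```

Written this way, the lookup no longer depends on whether the compiler manages to rewrite the nested predicates.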
Christian
declare namespace tei='http://www.tei-c.org/ns/1.0';
/tei:TEI/tei:text/tei:body//
  *[self::tei:entry or self::tei:re]
   [./tei:form/tei:orth[. = "arci"]
     [ancestor-or-self::*[@xml:lang][1][(starts-with(@xml:lang, "san"))]]
   ]
In human terms it should return all the `tei:entry` or `tei:re` elements for which
- the `tei:form/tei:orth` element is the word "arci", and
- the nearest `xml:lang` attribute of that element starts with 'san'.
I made some tests and it turned out that the main culprit is the use of `//` in the first line. (_Main_ culprit, not the only one...)
I use the `//` axis because I do not know the structure of the underlying TEI file. I expected BaseX to keep track of all the `tei:entry` and `tei:re` elements and their parents, so selecting the correct ones should be quite fast anyway. But the measurements disagree with my assumptions...
What could I do to improve the performance of this query?
Now, some remarks based on some small tests I have done:
- Removing the
      [ancestor-or-self::*[....]]
  predicate slashes the run time in half, but the query is still way too slow.
- Changing
      ./tei:form/tei:orth[. = "arci"]
  to
      ./tei:form[1]/tei:orth[1][. = "arci"]
  makes the query even slower.
- Changing `starts-with(@xml:lang, "san")` to `@xml:lang = 'san-xxx'` has a negligible effect.
- Dropping the `[1]` from
      [@xml:lang][1]
  makes the whole query twice as fast.
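A caveat on the last remark: as far as I can tell, dropping the `[1]` also changes the meaning of the query, because `ancestor-or-self` is a reverse axis and `[1]` picks the nearest element that carries an `xml:lang`. A minimal sketch on a made-up inline document (the content and the language tags are invented for illustration, not taken from my data):

```xquery
declare namespace tei = 'http://www.tei-c.org/ns/1.0';

(: Invented sample: a Sanskrit entry that contains an English sub-entry. :)
let $doc := document {
  <TEI xmlns="http://www.tei-c.org/ns/1.0" xml:lang="en">
    <text><body>
      <entry xml:lang="san-Latn">
        <form><orth>arci</orth></form>
        <re xml:lang="en">
          <form><orth>arci</orth></form>
        </re>
      </entry>
    </body></text>
  </TEI>
}
(: With [1]: only the nearest xml:lang is tested. :)
let $nearest := $doc//tei:orth[. = "arci"]
  [ancestor-or-self::*[@xml:lang][1][starts-with(@xml:lang, "san")]]
(: Without [1]: any ancestor-or-self with a matching xml:lang is enough. :)
let $any := $doc//tei:orth[. = "arci"]
  [ancestor-or-self::*[@xml:lang][starts-with(@xml:lang, "san")]]
return (count($nearest), count($any))
```

On this sample the first count is 1 and the second is 2: the `tei:orth` inside the English `tei:re` is accepted only by the variant without `[1]`. So the faster variant is usable only if no element overrides the language of an enclosing 'san' element.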
Regards,
-- Gioele Barabucci gioele@svario.it