Hi Gioele,
I just want to add a quick clarification, as I feel some things got mixed up here (Christian may correct me if his version relies on some compiler voodoo I’m overlooking, since he definitely knows more about the matter).
In the end, performance depends strongly on your document structure. The ‘following’ version is expensive if the node(s) returned by '//*[@xml:id = "lemma-aMSa"]' have many following nodes (siblings included). If there are many preceding nodes, the ‘preceding’ version could be the more expensive one.
Point being, evaluation of the following axis is not exactly expensive in BaseX, at least compared to the preceding axis (also note that we have a reference to the first following node of a node, but not to its preceding node). No conclusion about axis evaluation performance can be drawn from the two given queries, as they are not equivalent.
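To make the structural argument concrete, here is a minimal Python sketch over a simplified pre/size node table, loosely in the style of the flat, document-ordered encoding BaseX is built on. It is a toy model, not BaseX code: the helper names (pre_size_table, axis_following, axis_preceding, first_k) and the sample document are hypothetical, and the TEI namespace is ignored. It shows that the following axis can start directly at pre + size, that the preceding axis needs a scan that skips ancestors, that the amount of work depends on where the start node sits, and that the two positional predicates pick different nodes because position() counts backwards on the reverse preceding axis.

```python
import xml.etree.ElementTree as ET

# Hypothetical toy document; the real data is TEI in its own namespace,
# which this sketch ignores.
DOC = """<body>
  <entry/><re/>
  <entry xml:id="lemma-aMSa"><form/></entry>
  <re/><entry/><entry/><re/>
</body>"""
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"


def pre_size_table(root):
    """Flatten elements into document (pre) order and record subtree sizes."""
    nodes = list(root.iter())  # depth-first iteration = document order
    pre = {el: i for i, el in enumerate(nodes)}
    size = [sum(1 for _ in el.iter()) for el in nodes]  # subtree size incl. el
    parent = [None] * len(nodes)
    for el in nodes:
        for child in el:
            parent[pre[child]] = pre[el]
    return nodes, pre, size, parent


def axis_following(size, p):
    # The first following node is reachable directly at p + size[p];
    # everything from there to the end of the document is on the axis.
    return list(range(p + size[p], len(size)))


def axis_preceding(parent, p):
    # Scan from the document start and stop at the context node,
    # skipping its ancestors (they are not on the preceding axis).
    anc = set()
    q = parent[p]
    while q is not None:
        anc.add(q)
        q = parent[q]
    return [q for q in range(p) if q not in anc]


def first_k(nodes, candidates, names, k, reverse=False):
    # position() counts in axis order: forward for following,
    # backwards (nearest first) for the reverse preceding axis.
    ordered = reversed(candidates) if reverse else candidates
    hits = [q for q in ordered if nodes[q].tag in names][:k]
    return sorted(hits)  # path results are returned in document order


root = ET.fromstring(DOC)
nodes, pre, size, parent = pre_size_table(root)
start = next(pre[el] for el in nodes if el.get(XML_ID) == "lemma-aMSa")
names = {"entry", "re"}

fol = axis_following(size, start)
prec = axis_preceding(parent, start)
print(len(fol), len(prec))  # nodes each axis has to cover: 4 2
print(first_k(nodes, fol, names, 3))  # [5, 6, 7]
print(first_k(nodes, prec, names, 3, reverse=True))  # [1, 2]
```

On this toy document the following axis covers 4 nodes and the preceding axis 2; moving the start node towards the beginning or the end of the document shifts that balance. Note also that the preceding query returns the nearest matches before the start node, not the first ones in the document, so the two queries answer different questions.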
Hope this doesn’t add to the confusion, though I think it does - Lukas
On 05 Feb 2015, at 14:23, Christian Grün christian.gruen@gmail.com wrote:
Hi Gioele,
I can confirm that the following axis is pretty expensive in BaseX, as we do not store explicit sibling references. The preceding axis is cheaper, as we can stop the search as soon as we reach the node we started from.
One way out is to first access the following nodes in your document and move the check for the preceding node into a predicate.
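One possible reading of this suggestion, written as an XPath rewrite (an interpretation, not checked against BaseX's optimizer):

(//*[self::tei:entry or self::tei:re][preceding::*[@xml:id = "lemma-aMSa"]])[position() <= 3]

The Python sketch below emulates that rewrite with xml.etree.ElementTree on a hypothetical toy document (TEI namespace omitted; the helpers is_ancestor and start_precedes are illustrative names) and checks that it selects the same nodes as the direct following step.

```python
import xml.etree.ElementTree as ET

XML_ID = "{http://www.w3.org/XML/1998/namespace}id"
# Hypothetical toy document (TEI namespace omitted for brevity).
DOC = """<body>
  <entry/><re/>
  <entry xml:id="lemma-aMSa"><form/></entry>
  <re/><entry/><entry/><re/>
</body>"""

root = ET.fromstring(DOC)
order = list(root.iter())  # all elements in document order
pre = {el: i for i, el in enumerate(order)}
parent = {child: el for el in order for child in el}
start = next(el for el in order if el.get(XML_ID) == "lemma-aMSa")
names = {"entry", "re"}


def is_ancestor(a, el):
    """True if a is a proper ancestor of el."""
    while el in parent:
        el = parent[el]
        if el is a:
            return True
    return False


def start_precedes(el):
    # Emulates the predicate [preceding::*[@xml:id = "lemma-aMSa"]]:
    # the start node comes earlier in document order and is not an ancestor of el.
    return pre[start] < pre[el] and not is_ancestor(start, el)


# Rewritten form: visit all candidates first, then filter with the predicate.
candidates = [el for el in order if el.tag in names]
rewritten = [el for el in candidates if start_precedes(el)][:3]

# Direct following step, for comparison: skip the start node's subtree.
subtree_size = sum(1 for _ in start.iter())
direct = [el for el in order[pre[start] + subtree_size:] if el.tag in names][:3]

print([pre[el] for el in rewritten])  # [5, 6, 7]
print(rewritten == direct)  # True
```

Both forms pick the same three elements here. In this naive emulation the rewrite actually does more work, since it checks every candidate; any speed-up in BaseX would have to come from how its optimizer evaluates the preceding check, which this sketch does not model, so it is worth timing on the real 19 MB file.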
> I also get a warning about «'following::*[(self::tei:entry or self::tei:re)][(fn:position() <= 3)]' will never yield results.», but that is obviously false, as it yields exactly the 3 results I expect.
That's surprising indeed. Yes, feel free to send me your XML document in private.
Hope this helps, Christian
On Thu, Feb 5, 2015 at 2:12 PM, Gioele Barabucci gioele@svario.it wrote:
Hello,
I have noticed that this query using the "following" axis
//*[@xml:id = "lemma-aMSa"] /following::*[self::tei:entry or self::tei:re] [position() <= 3]
is much slower than the same query with the "preceding" axis
//*[@xml:id = "lemma-aMSa"] /preceding::*[self::tei:entry or self::tei:re] [position() <= 3]
The query that uses "preceding" takes about 2.5 ms to execute, while the one using "following" takes about 250 ms, i.e. it is about 100 times slower.
Why is there such a discrepancy between these two queries?
I can provide the base XML file (19MB) on request.
I also get a warning about «'following::*[(self::tei:entry or self::tei:re)][(fn:position() <= 3)]' will never yield results.», but that is obviously false, as it yields exactly the 3 results I expect.
Regards,
-- Gioele Barabucci gioele@svario.it