Dear Constantine,
Am 20.10.2011 09:44, schrieb Constantine Hondros:
Of course, I am really interested in more useful queries such as:
//tns:poi[not(tns:id = preceding-sibling::tns:poi/tns:id)]
this query basically compiles to two nested loops, the outer one iterating over //tns:poi and the inner one over all preceding-siblings. So it has an O(n^2) runtime.
I guess what I am asking is ... while most XPath queries I run are quick to resolve, anything using preceding::, preceding-sibling:: or following::, following-sibling:: essentially runs too slowly to be workable - at least, on my sample dataset with about 100K sibling children of the root element.
Well, that's an extreme example, 100'000^2 is a huge number of nodes to traverse. I performed some tests with a) 10'000 sibling elements inside the root: <root> <item id="1"/> <item id="2"/> [...] <item id="10000"/> </root>
and the query //item[not(preceding-sibling::*)]
b) 10'000 nested elements: <root> <item id="1"> <item id="2"> [...] <item id="10000"/> [...] </item> </item> </root>
and the query //item[not(ancestor::*)]
The timings I get are almost identical, replacing "preceding" with "following" or "ancestor" with "descendant" doesn't change much.
So is this any sort of known issue? Should I be looking at any sort of configuration options? Is there any way to profile the execution of the Xpath?
My guess is that the number of nodes on the *-sibling-axes is usually much bigger than on the vertical axes, so an expensive algorithm gets unusable earlier.
To return to your example query, you could drastically reduce its runtime by only traversing the database once, comparing each node only to the nodes with unique tns:ids found before. I'll use a left fold [1][2]:
fold-left( function($seq, $x) { if($seq/tns:id = $x/tns:id) then $seq else ($seq, $x) }, (), //tns:poi )
If you're OK with using not-yet-standard XQuery extensions, a map [3] instead of a sequence can potentially be even faster:
let $map := fold-left( function($map, $x) { let $key := $x/tns:id return if(map:contains($map, $key)) then $map else map:new(($map, map:entry($key, $x))) }, map:new(), //tns:poi ) for $key in map:keys($map) return $map($key)
But this won't preserve the relative ordering of the results.
Hope that helped, Cheers Leo __________
[1] http://www.w3.org/TR/xpath-functions-30/#func-fold-left [2] http://en.wikipedia.org/wiki/Fold_%28higher-order_function%29 [3] http://docs.basex.org/wiki/Map_Functions