Hi Gioele,
> I wonder if the presence of the namespace somehow confuses the optimizer.
Exactly, that’s the reason. For some historical reason (but not such a wise one, as most quoted “historical reasons” are), we decided to index the node names without considering the namespace URI. As a result, the index:element-names function will yield…
<entry count="2">xml</entry>
…for the following document:
<xml> <xml xmlns='uri'/> </xml>
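Where the count of 2 comes from can be seen with the query below. It is a minimal sketch in standard XQuery 3.0, not BaseX's internal index code. It groups the elements of the sample document twice: once by local name only, which is how the name statistics are keyed, and once by expanded QName (namespace URI plus local name).

```xquery
(: The sample document from above, built in memory (no database needed) :)
let $doc := document { <xml> <xml xmlns='uri'/> </xml> }
let $elems := $doc//*
return (
  (: 1. Keyed by local name only: both elements collapse into one entry :)
  for $name in distinct-values($elems ! local-name(.))
  return <entry count="{ count($elems[local-name(.) = $name]) }">{ $name }</entry>,
  (: 2. Keyed by expanded QName: the two elements yield two separate entries :)
  for $qn in distinct-values($elems ! node-name(.))
  return <entry uri="{ namespace-uri-from-QName($qn) }"
                count="{ count($elems[node-name(.) = $qn]) }">{ local-name-from-QName($qn) }</entry>
)
```

The first group mirrors the statistics entry above. The second shows that, once the namespace URI is taken into account, the two `xml` elements have different names with one occurrence each.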
For the same reason, various optimizations that are based on the database statistics will only take effect if a document contains no namespace declaration, or at most one global one. In various cases, optimizations could still be made possible (e.g. if we know that the element/attribute names with and without namespace URIs are distinct), but that hasn’t been implemented so far.
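Whether a database actually has such clashes can be checked by hand. The query below is only a diagnostic sketch: it assumes a database named "monier" (as in the test case below), looks at element names only (not attributes), and BaseX 8.6 does not run any such check itself when optimizing. `db:open` is the BaseX 8.x function name; more recent releases call it `db:get`.

```xquery
(: Report every local name that occurs with more than one namespace URI.  :)
(: namespace-uri() returns "" for elements in no namespace, so a name used :)
(: both with and without a namespace is reported as well (as an empty uri). :)
for $e in db:open('monier')//*
group by $name := local-name($e)
let $uris := distinct-values($e ! namespace-uri(.))
where count($uris) > 1
return <clash name="{ $name }">{ $uris ! <uri>{ . }</uri> }</clash>
```

An empty result means that, for elements, local names and expanded names are in one-to-one correspondence in that database. Grouping with `group by` visits each element once instead of rescanning the document per name.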
Cheers, Christian
I was stressing the BaseX 8.6 planner/optimizer when I noticed that expressions like `count(//elem)` are not optimized at all, even though the element names are correctly indexed, as `index:element-names()` demonstrates.
The current database is a 300 MB TEI document. All its elements are in the `http://www.tei-c.org/ns/1.0` namespace.
The following test case will report the correct number, but it will take a couple of seconds to run, instead of a few milliseconds.
    declare namespace tei="http://www.tei-c.org/ns/1.0";

    let $n := index:element-names("monier")[. = 're']/@count
    let $c := count(//tei:re)
    return <res><in-index>{$n}</in-index><in-doc>{$c}</in-doc></res>
I wonder if the presence of the namespace somehow confuses the optimizer. The same problem can be observed when running the same test case with
    declare default element namespace "http://www.tei-c.org/ns/1.0";
    [...]
    let $c := count(//re)
Regards,
-- Gioele Barabucci gioele.barabucci@uni-koeln.de