For testing, I removed the namespace declaration from the root element to rule that issue out, and ran another test:
//siteinfo
I find it very disturbing that it still took 47 seconds to retrieve the single element, and that this:
/mediawiki/siteinfo
still took about 8 seconds.
Granted, most documents are *much* smaller than a full dump of the current English Wikipedia pages, but it's a perfectly valid use case, and I think BaseX should dominate the world of XML databases.
I have several thoughts, but I think this may be the key issue:
On Sun, Feb 27, 2011 at 5:42 PM, Christian Grün <christian.gruen@gmail.com> wrote:
> we're not saving direct references to the target nodes (as such an index would get very large for e.g. the Wikipedia page element),
I believe this premise should be challenged. Surely the performance problem above would be solved if there were indices pointing directly to particular elements.
I don't think it's unreasonable to at least allow such an index to be created as an option. After all, consider text indexing: in most documents, the text contains many more words than there are XML elements, yet a text index is considered common and perfectly valid. Why would 'page' as a word in the text be more "indexable" than the <page> element?
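To make the proposal concrete, here is a minimal sketch of an optional element-name index, written in Python purely for illustration (it is not BaseX's API or its actual index design). The function names `build_element_index` and `lookup` are hypothetical, and positions are counted over element nodes only, in document order, which is simpler than the node numbering a real XML database would use. The point is that a query like `//siteinfo` becomes a dictionary lookup instead of a scan over every node.

```python
import io
import xml.etree.ElementTree as ET
from collections import defaultdict


def build_element_index(source):
    """Hypothetical sketch: map each element's local name to the
    document-order positions (elements only) where it occurs."""
    index = defaultdict(list)
    pre = 0  # position of the next element in document order
    for event, elem in ET.iterparse(source, events=("start", "end")):
        if event == "start":
            local_name = elem.tag.rsplit("}", 1)[-1]  # drop any "{namespace}" prefix
            index[local_name].append(pre)
            pre += 1
        else:
            # The subtree is fully indexed; drop its children and text to save memory.
            elem.clear()
    return dict(index)


def lookup(index, name):
    """Answer a //name query from the index instead of scanning all nodes."""
    return index.get(name, [])


SAMPLE = b"""<mediawiki>
  <siteinfo><sitename>Wikipedia</sitename></siteinfo>
  <page><title>A</title></page>
  <page><title>B</title></page>
</mediawiki>"""

if __name__ == "__main__":
    idx = build_element_index(io.BytesIO(SAMPLE))
    print(lookup(idx, "siteinfo"))  # [1]
    print(lookup(idx, "page"))      # [3, 5]
```

The trade-off Christian mentions is visible here: the index stores one entry per element, so for a Wikipedia dump the `page` list alone holds one entry per article. That is the same kind of growth a text index accepts with one entry per word occurrence, which is why offering it as an opt-in index seems reasonable.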