Re: [basex-talk] Problem with Wikipedia database (or a more general namespace efficiency problem?)

28 Feb 2011


For testing, I edited out the namespace declaration in the root element to
remove that issue, and performed another test:
//siteinfo
I find it very disturbing that it still took 47 seconds to retrieve the
single element, and that this:
/mediawiki/siteinfo
still took about 8 seconds.
Granted, most documents are *much* smaller than the entirety of current
English Wikipedia pages, but it's a very valid use case and I think BaseX
should dominate the world of XML databases.
I have several thoughts, but I think this may be the key issue:
On Sun, Feb 27, 2011 at 5:42 PM, Christian Grün <christian.gruen@gmail.com> wrote:
> ...
> we're not saving direct references to the
> target nodes (as such an index would get very large for e.g. the
> Wikipedia page element),
I believe this premise should be challenged. Surely the performance issue
above would be solved if indeed there were more specific indices to
particular elements.
I don't think it's unreasonable to at least permit such an index to be
created optionally. After all, consider text indexing. For most documents,
the text contains many more words than there are XML elements, yet a text
index is considered common and valid. Why would 'page' as a
textual word be more "indexable" than the <page> element?
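
To make the idea concrete, here is a rough sketch in Python of the kind of optional index I have in mind: one pass over the document that maps each element name to direct references to its nodes. This is only an illustration, not how BaseX stores anything internally; the sample document and the helper name build_name_index are made up for the example.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Tiny stand-in for the Wikipedia dump (hypothetical sample data).
SAMPLE = """<mediawiki>
  <siteinfo><sitename>Wikipedia</sitename></siteinfo>
  <page><title>A</title></page>
  <page><title>B</title></page>
</mediawiki>"""


def build_name_index(root):
    """Map each element name to the elements carrying it, in document order."""
    index = defaultdict(list)
    for elem in root.iter():  # single pass over the whole tree
        index[elem.tag].append(elem)
    return index


root = ET.fromstring(SAMPLE)
name_index = build_name_index(root)

# Rough equivalent of //siteinfo: a dictionary lookup instead of a full scan.
siteinfo_hits = name_index["siteinfo"]

# Rough equivalent of //page/title, starting from the indexed <page> nodes.
page_titles = [page.findtext("title") for page in name_index["page"]]
```

The cost is exactly the one Christian points out: the entry for a frequent element such as page holds one reference per occurrence, so the index grows with the document. That is why I would suggest making it optional rather than the default.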
