Dear all at BaseX,
In an indexed database, Query Info shows that a request like count(/a/b/c) does not use the statistics data available via index:facets(). Could you please tell me whether this is really the case, and whether there is a way to tell BaseX to use it?
Best regards, and thank you for your great XNDB!
Fabrice
Here is the query:
Query:
declare namespace exch = 'http://www.epo.org/exchange';
count(/exch:exchange-documents/exch:exchange-document)

Query plan:
<QueryPlan>
  <FNAggr name="count(item)">
    <IterPath>
      <DBNodeSeq size="38">
        <DBNode name="DOCDB_RU" pre="0"/>
        <DBNode name="DOCDB_RU" pre="4198620"/>
        <DBNode name="DOCDB_RU" pre="7369614"/>
        <DBNode name="DOCDB_RU" pre="11083006"/>
        <DBNode name="DOCDB_RU" pre="15423603"/>
      </DBNodeSeq>
      <IterStep axis="child" test="exch:exchange-documents"/>
      <IterStep axis="child" test="exch:exchange-document"/>
    </IterPath>
  </FNAggr>
</QueryPlan>
Here is the db:info():
<database>
  <databaseproperties>
    <name>DOCDB_RU</name>
    <size>3391 MB</size>
    <nodes>135589551</nodes>
    <documents>38</documents>
    <binaries>0</binaries>
    <timestamp>2013-01-29-23-16-31</timestamp>
  </databaseproperties>
  <resourceproperties>
    <inputpath>C:/data/work/docdb/RU</inputpath>
    <timestamp>2012-12-28-06-53-45</timestamp>
    <encoding>UTF-8</encoding>
    <whitespacechopping>ON</whitespacechopping>
  </resourceproperties>
  <indexes>
    <uptodate>true</uptodate>
    <textindex>ON</textindex>
    <attributeindex>ON</attributeindex>
    <fulltextindex>OFF</fulltextindex>
    <updindex>ON</updindex>
    <maxcats>10000</maxcats>
    <maxlen>96</maxlen>
  </indexes>
</database>
Hi Fabrice,
this may be due to the existence of namespaces in your document. Various optimizations will only be triggered on documents with no namespaces, or a single default namespace.
Hope this helps, Christian
BaseX-Talk mailing list BaseX-Talk@mailman.uni-konstanz.de https://mailman.uni-konstanz.de/mailman/listinfo/basex-talk
Dear Christian,
On Tue, Feb 5, 2013 at 5:51 PM, Christian Grün christian.gruen@gmail.com wrote:
> Hi Fabrice,
> this may be due to the existence of namespaces in your document. Various optimizations will only be triggered on documents with no namespaces, or a single default namespace.
This is interesting and potentially troubling, since I'm working with data which (for no good reason I can see, but I didn't design it) declares a namespace with a prefix at the top level, places the document element in this namespace, and then proceeds with every other element in the document bound to no namespace. As in:
<blah:document xmlns:blah="blah.com">
  <here>
    <be>
      <contents> ...
Of course, all the elements in the document inherit the 'blah' namespace binding without ever using it.
I take it this will disable the optimizations to which you refer?
Can you be specific regarding what these optimizations are, or point me to documentation? Or more generally, can you offer any advice for how I should (a) detect related issues, and (b) deal with them?
Scrubbing the data on the way in may be an option. Should I be considering that?
Thanks, Wendell
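One way to see which elements of a document actually sit in a namespace is to list every element together with its namespace URI. The following is a minimal XQuery 3.0 sketch, not an official BaseX diagnostic. The inline document only mirrors the 'blah' example above, with the elided content closed off; for stored data, the let clause could be replaced by something like collection('mydb'), where mydb is a placeholder database name.

```xquery
(: Sketch: print each element name with the namespace URI it belongs to.
   The sample document is an assumption modelled on the 'blah' example. :)
let $doc :=
  <blah:document xmlns:blah="blah.com">
    <here><be><contents/></be></here>
  </blah:document>
for $e in $doc/descendant-or-self::*
return name($e) || ' -> {' || namespace-uri($e) || '}'
```

Only the document element reports a namespace URI here. The descendants report an empty one, even though the blah prefix remains in scope for them (in-scope-prefixes($e) would still list it).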
Hi Wendell,
> Can you be specific regarding what these optimizations are, or point me to documentation? Or more generally, can you offer any advice for how I should (a) detect related issues, and (b) deal with them?
I’m sorry we can’t offer more detailed information on this; as our query optimizer is subject to frequent changes, we lack the time to document all details.
> Scrubbing the data on the way in may be an option. Should I be considering that?
If there’s no need to preserve namespaces, I’m quite sure this will help. As you may have seen, namespaces can be stripped via BaseX by either setting STRIPNS to true or selecting the corresponding checkbox in the Parsing tab of the create dialog.
Best, Christian
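For reference, the STRIPNS option can also be passed when a database is created from XQuery. This is a sketch under assumptions: DOCDB_RU_NONS is a hypothetical database name, the input path is the one shown in the db:info() output earlier in the thread, and the map notation for options is that of recent BaseX releases (older versions used a different syntax), so it should be checked against the documentation of the installed version.

```xquery
(: Sketch: create a namespace-free copy of the data at import time.
   'DOCDB_RU_NONS' is a hypothetical name; 'stripns' mirrors the STRIPNS option. :)
db:create(
  'DOCDB_RU_NONS',
  'C:/data/work/docdb/RU',
  (),
  map { 'stripns': true() }
)
```

Assuming prefixes are dropped along with the namespace URIs, the query from the start of the thread would then be written without the exch prefix, and Query Info can be used to check whether the plan changes:

```xquery
count(collection('DOCDB_RU_NONS')/exchange-documents/exchange-document)
```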
Dear Christian,
On Wed, Feb 13, 2013 at 6:50 AM, Christian Grün christian.gruen@gmail.com wrote:
>> Can you be specific regarding what these optimizations are, or point me to documentation? Or more generally, can you offer any advice for how I should (a) detect related issues, and (b) deal with them?
> I’m sorry we can’t offer more detailed information on this; as our query optimizer is subject to frequent changes, we lack the time to document all details.
Understood. I'm sure this situation will improve as the product matures over time. (And in the meantime we can keep listening for hints. :-)
>> Scrubbing the data on the way in may be an option. Should I be considering that?
> If there’s no need to preserve namespaces, I’m quite sure this will help. As you may have seen, namespaces can be stripped via BaseX by either setting STRIPNS to true or selecting the corresponding checkbox in the Parsing tab of the create dialog.
Thanks, we are now doing this.
Cheers, Wendell
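Where namespaces cannot be stripped at import time, a similar scrub can be sketched in plain XQuery 3.0. Note that local:strip-ns is a hypothetical helper, not a BaseX function. It rebuilds elements and attributes from their local names only, so prefixed attributes such as xml:lang lose their prefix, and names that differ only by namespace would collide.

```xquery
(: Sketch: recursively rebuild a tree with all names moved to no namespace.
   local:strip-ns is a hypothetical helper written for this illustration. :)
declare function local:strip-ns($node as node()) as node() {
  typeswitch ($node)
    case document-node() return
      document { $node/node() ! local:strip-ns(.) }
    case element() return
      element { QName('', local-name($node)) } {
        (: attributes first, then the recursively cleaned children :)
        for $a in $node/@*
        return attribute { QName('', local-name($a)) } { string($a) },
        $node/node() ! local:strip-ns(.)
      }
    (: text, comments and processing instructions are copied as they are :)
    default return $node
};

local:strip-ns(
  <blah:document xmlns:blah="blah.com"><here a="1">text</here></blah:document>
)
```

For large databases this rebuilds every node, so stripping at import time via STRIPNS is likely the cheaper route when it is available.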