Hi Menashè,
With the latest snapshot [1], your original query should now be rewritten for index access as well. Looking forward to your tests,
Christian
PS: In terms of performance, it may still be worthwhile to move redundant paths to the for clause; but just try and see.
[1] http://files.basex.org/releases/latest/
On Fri, Jan 30, 2015 at 9:49 PM, Christian Grün christian.gruen@gmail.com wrote:
Hi Menashè,
> Should I expect to see the usage of an index for each of the where phrases?
Usually, only one predicate will be rewritten for index access, and the remaining conditions will be answered sequentially.
> Have a nice weekend!
Enjoy, Christian
> Menashè
On Fri, 30 Jan 2015 18:11:59 +0100, Christian Grün christian.gruen@gmail.com wrote:
Hi Menashè,
Thanks for the XML samples you sent me in private. I noticed that the index rewritings will only be triggered if you formulate your query as follows:
OLD: for $x in collection("ALL-CDIS") where $x/gmd:MD_Metadata/gmd:identificationInfo/... return ...
NEW: for $x in collection("ALL-CDIS")/gmd:MD_Metadata where $x/gmd:identificationInfo/... return ...
It's difficult to explain in a few sentences why the first variant (OLD) cannot be optimized that straightforwardly (basically, it's quite a different pattern to look for), but I'll check if we can extend our matcher to also support this kind of query.
So, if possible, I would recommend that for now (at least for testing) you move the root element test directly after the collection() function. I noticed that the first three child steps are the same in all of your conditions:
gmd:MD_Metadata/gmd:identificationInfo/sdn:SDN_DataIdentification
If that will always be the case, it surely makes sense to move all of them to the "for" clause.
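As a rough sketch of what moving the shared steps into the "for" clause could look like: the collection name and the element path are taken from this thread, while the sdn namespace URI and the compared value "TEMP" are assumptions made purely for illustration. The positional predicates are omitted here. Note that $x is then bound to the sdn:SDN_DataIdentification element rather than to the document node, so the return clause may need adjusting.

```xquery
declare namespace gmd = "http://www.isotc211.org/2005/gmd";
(: Assumption: namespace URI of the sdn prefix; adapt it to your documents. :)
declare namespace sdn = "http://www.seadatanet.org";

(: The three shared child steps are part of the "for" clause,
   so every condition starts below sdn:SDN_DataIdentification. :)
for $x in collection("ALL-CDIS")
  /gmd:MD_Metadata/gmd:identificationInfo/sdn:SDN_DataIdentification
(: "TEMP" is a hypothetical search value, not taken from the real data. :)
where $x/gmd:descriptiveKeywords/gmd:MD_Keywords/gmd:keyword
  /sdn:SDN_ParameterDiscoveryCode/@codeListValue = "TEMP"
(: Return the URI of the document that contains the match. :)
return base-uri($x)
```

If the index is used, the query info should contain a string such as "applying attribute index".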
Looking forward to your updated performance tests, Christian
On Fri, Jan 30, 2015 at 5:55 PM, Christian Grün christian.gruen@gmail.com wrote:
Could you possibly provide me with a small snapshot of your data sources (one, two documents might be sufficient)?
On Fri, Jan 30, 2015 at 5:52 PM, Menashè Eliezer meliezer@ogs.trieste.it wrote:
Almost the same speed with version 8.0. No indexing (no "applying" in the query info). As I've attached before, indexes are active for this DB.
With kind regards, Menashè
On 01/30/2015 05:31 PM, Christian Grün wrote:
It's indeed interesting that your query does not use any of the existing index structures (if it did, you would find strings like "applying text index" or "applying attribute index" in the query info). Maybe/hopefully things look different with Version 8.0.
On Fri, Jan 30, 2015 at 5:26 PM, Menashè Eliezer meliezer@ogs.trieste.it wrote:
>
> On 01/30/2015 05:18 PM, Christian Grün wrote:
>>
/gmd:MD_Metadata/gmd:identificationInfo/sdn:SDN_DataIdentification/gmd:descriptiveKeywords[1]/gmd:MD_Keywords/gmd:keyword[2]/sdn:SDN_ParameterDiscoveryCode/@codeListValue
>>>
>>> How can I remove *?
>>
>> Simply remove the predicate; a[*]/b is the same as a/b.
>
> Maybe I wasn't clear. The actual number appears in the xml file, e.g.,
> gmd:descriptiveKeywords[1]
> Anyway, I've removed all [*] and I get the same correct result, however
> the processing time is doubled...
>>
>>>> * In some cases, if you know that an element name is distinct, you can
>>>> get rid of all the explicit child steps and directly address the node
>>>> via the descendant axis.
>>>
>>> Thanks, but it's not relevant in my case.
>>
>> Is it because the element names are not distinct? Or is it because
>> your input form allows users to choose arbitrary paths for arbitrary
>> documents?
>
> The element names are not distinct.
>
>>> Sure, I'll also try BaseX 8.0 and compare. Should I recreate the db
>>> importing the xml files for testing the improved indexing?
>>
>> We have actually improved support for collections, but the database
>> format itself has not changed, so it shouldn't make a difference in
>> your case.
>>
>> Christian
>>
>>>> [1] http://files.basex.org/releases/latest
>>>>
>>>> On Fri, Jan 30, 2015 at 3:55 PM, Menashè Eliezer
>>>> meliezer@ogs.trieste.it wrote:
>>>>>
>>>>> Hello,
>>>>> I wonder if the attached query can be optimised. I'm attaching all
>>>>> relevant information.
>>>>> Basex 7.9, Debian, powerful server.
>>>>> This is just an example. The queries will be built based on a
>>>>> compilation of a search form.
>>>>> Any help would be appreciated.
>>>>> 40 seconds are not acceptable.
>>>>>
>>>>> --
>>>>> With kind regards,
>>>>> Menashè
>>>>>
>>> --
>>> With kind regards,
>>> Menashè
>>>
> With kind regards,
> Menashè
-- Menashè