Hi Christian,
Even when I keep only the first filter and run it standalone, it takes more than 8 seconds:
Result:
- Hit(s): 250000 Items
- Updated: 0 Items
- Printed: 2048 KB
- Read Locking: local [CDI]
- Write Locking: none
Timing:
- Parsing: 2.0 ms
- Compiling: 107.74 ms
- Evaluating: 8085.55 ms
- Printing: 106.4 ms
- Total Time: 8301.69 ms
With kind regards, Menashè
On 06/22/2015 07:57 PM, Christian Grün wrote:
Hi Menashè,
QUERY[0] xquery version "3.0"; declare namespace queryName ='GetIDS'; declare namespace gco = "http://www.isotc211.org/2005/gco"; declare [...]
It would be great if you could help us by simplifying the query, so that we can look at the core issue.
Is there an undocumented way to log the full XQuery in the BaseX server logs?
The maximum size of log entries can be adjusted via the option LOGMSGMAXLEN [1].
Cheers, Christian
[1] http://docs.basex.org/wiki/Options#LOGMSGMAXLEN
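Christian's answer can be applied with a single configuration entry. A minimal sketch, assuming the server reads its global options from the .basex configuration file in the BaseX home directory; the value 100000 is only an example and should exceed the length of the longest generated query:

```ini
# .basex: BaseX configuration file (global options section)
# Maximum length of a single log message. Example value only; choose a length
# larger than your longest query so that it is written to the log in full.
LOGMSGMAXLEN = 100000
```

Because this is a global option, the BaseX server presumably has to be restarted before the new limit is applied.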
I've seen the -V option, but I don't use the standalone version, and starting the server with java -cp /usr/share/java/basex.jar org.basex.BaseXServer -d doesn't give me any extra query info.
With kind regards, Menashè
On 02/03/2015 01:13 PM, Menashè Eliezer wrote:
Hi Christian,
Thank you! The query time is now down to 0.5 seconds!
The biggest improvement comes from the query rephrasing you suggested, and the latest snapshot also helps a lot. You may want to know that in the log of the latest snapshot I see applying attribute index for "7827", which is not clear to the user, whereas BaseX80-20150130.124009, which also used the index, logged applying attribute index for ("ALKY", "AYMD").
I'm attaching the first and the second run of the query in BaseXGUI. Rerunning the same query reduces the time from over 1 second to 0.5 seconds. Some data:
- BaseX80-20150130.124009: Total Time: 30676.02 ms
- After using for $x in collection("ALL-CDIS")/gmd:MD_Metadata/gmd:identificationInfo/sdn:SDN_DataIdentification: Total Time: 5456.74 ms, with applying attribute index for ("ALKY", "AYMD") in the log. Second run: 1333.71 ms
- Latest snapshot (BaseX80-20150202.121033): 1st run: Total Time: 1873.02 ms; 2nd run: Total Time: 548.62 ms
With kind regards, Menashè
On 02/02/2015 02:02 PM, Menashè Eliezer wrote:
Hi Christian,
Thank you very much! Unfortunately I won't be back at the office until tomorrow.
Menashè
On Sat, 31 Jan 2015 16:42:32 +0100, Christian Grün christian.gruen@gmail.com wrote:
Hi Menashè,
With the latest snapshot [1], your original query should now be rewritten for index access as well. Looking forward to your tests,
Christian
PS: In terms of performance, it may still be worthwhile to move redundant paths to the for clause; but just try and see.
[1] http://files.basex.org/releases/latest/
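Christian's suggestion (here and in his quoted message of 30 Jan, 18:11) can be sketched as follows: the three child steps shared by all conditions move into the for clause, and the where clause keeps only the codeListValue test for which the query info later reported applying attribute index for ("ALKY", "AYMD"). This is a minimal sketch under assumptions: the sdn namespace URI is a guess (take the real one from the xmlns:sdn declaration in the CDI files), and $docs holds two hypothetical inline documents so that the query runs without the database. On the real data, replace $docs with collection("ALL-CDIS"); index access is only possible against the indexed database.

```xquery
declare namespace gmd = "http://www.isotc211.org/2005/gmd";
(: Assumption: SeaDataNet namespace URI; copy the real one from xmlns:sdn in your CDI files. :)
declare namespace sdn = "http://www.seadatanet.org";

(: Hypothetical stand-in for collection("ALL-CDIS"): two minimal CDI-like documents,
   one with the parameter code ALKY and one with the hypothetical code TEMP. :)
declare variable $docs := (
  document {
    <gmd:MD_Metadata><gmd:identificationInfo><sdn:SDN_DataIdentification>
      <gmd:descriptiveKeywords><gmd:MD_Keywords><gmd:keyword>
        <sdn:SDN_ParameterDiscoveryCode codeListValue="ALKY"/>
      </gmd:keyword></gmd:MD_Keywords></gmd:descriptiveKeywords>
    </sdn:SDN_DataIdentification></gmd:identificationInfo></gmd:MD_Metadata>
  },
  document {
    <gmd:MD_Metadata><gmd:identificationInfo><sdn:SDN_DataIdentification>
      <gmd:descriptiveKeywords><gmd:MD_Keywords><gmd:keyword>
        <sdn:SDN_ParameterDiscoveryCode codeListValue="TEMP"/>
      </gmd:keyword></gmd:MD_Keywords></gmd:descriptiveKeywords>
    </sdn:SDN_DataIdentification></gmd:identificationInfo></gmd:MD_Metadata>
  }
);

(: The three child steps shared by every condition are bound once in the for clause,
   so each where condition only spells out the steps below sdn:SDN_DataIdentification. :)
for $x in $docs/gmd:MD_Metadata/gmd:identificationInfo/sdn:SDN_DataIdentification
where $x/gmd:descriptiveKeywords/gmd:MD_Keywords/gmd:keyword
        /sdn:SDN_ParameterDiscoveryCode/@codeListValue = ("ALKY", "AYMD")
return $x//sdn:SDN_ParameterDiscoveryCode/@codeListValue/string()
```

Against the two sample documents only the first one matches, so the query returns the string ALKY. Further conditions from the search form would be added to the same where clause, each starting at $x.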
On Fri, Jan 30, 2015 at 9:49 PM, Christian Grün christian.gruen@gmail.com wrote:
> Hi Menashè,
>
>> Should I expect to see the usage of an index for each of the where phrases?
>
> Usually, only one predicate will be rewritten for index access, and the remaining conditions will be answered sequentially.
>
>> Have a nice weekend!
>
> Enjoy,
> Christian
>
>> Menashè
>>
>> On Fri, 30 Jan 2015 18:11:59 +0100, Christian Grün christian.gruen@gmail.com wrote:
>>> Hi Menashè,
>>>
>>> Thanks for the XML samples you sent me in private. I noticed that the index rewritings will only be triggered if you formulate your query as follows:
>>>
>>> OLD:
>>> for $x in collection("ALL-CDIS")
>>> where $x/gmd:MD_Metadata/gmd:identificationInfo/...
>>> return ...
>>>
>>> NEW:
>>> for $x in collection("ALL-CDIS")/gmd:MD_Metadata
>>> where $x/gmd:identificationInfo/...
>>> return ...
>>>
>>> It's difficult to explain in short sentences why Variant 1 cannot be optimized that straightforwardly (basically, it's quite a different pattern to look for), but I'll check if we can extend our matcher to also support this kind of query.
>>>
>>> So, if possible, I would recommend for now (and at least for testing) that you move the root element test after the collection() function. I noticed that the first three child steps are the same in all of your conditions:
>>>
>>> gmd:MD_Metadata/gmd:identificationInfo/sdn:SDN_DataIdentification
>>>
>>> If that will always be the case, it surely makes sense to move all of them to the "for" clause.
>>>
>>> Looking forward to your updated performance tests,
>>> Christian
>>>
>>> On Fri, Jan 30, 2015 at 5:55 PM, Christian Grün christian.gruen@gmail.com wrote:
>>>> Could you possibly provide me with a small snapshot of your data sources (one, two documents might be sufficient)?
>>>>
>>>> On Fri, Jan 30, 2015 at 5:52 PM, Menashè Eliezer meliezer@ogs.trieste.it wrote:
>>>>> Almost the same speed with version 8.0.
>>>>> No indexing (no "applying" in the query info).
>>>>> As I've attached before, indexes are active for this DB.
>>>>>
>>>>> With kind regards,
>>>>> Menashè
>>>>>
>>>>> On 01/30/2015 05:31 PM, Christian Grün wrote:
>>>>>> It's indeed interesting that your query does not use any of the existing index structures (if they did, you would find strings like "applying text index" or "applying attribute index" in the query info). Maybe/hopefully things look different with Version 8.0.
>>>>>>
>>>>>> On Fri, Jan 30, 2015 at 5:26 PM, Menashè Eliezer meliezer@ogs.trieste.it wrote:
>>>>>>> On 01/30/2015 05:18 PM, Christian Grün wrote:
>>>>>>>>> /gmd:MD_Metadata/gmd:identificationInfo/sdn:SDN_DataIdentification/gmd:descriptiveKeywords[1]/gmd:MD_Keywords/gmd:keyword[2]/sdn:SDN_ParameterDiscoveryCode/@codeListValue
>>>>>>>>> How can I remove *?
>>>>>>>> Simply remove the predicate; a[*]/b is the same as a/b.
>>>>>>> Maybe I wasn't clear. The actual number appears in the XML file, e.g., gmd:descriptiveKeywords[1]
>>>>>>> Anyway, I've removed all [*] and I get the same correct result, however the processing time is doubled...
>>>>>>>>
>>>>>>>>>> * In some cases, if you know that an element name is distinct, you can get rid of all the explicit child steps and directly address the node via the descendant axis.
>>>>>>>>> Thanks, but it's not relevant in my case.
>>>>>>>> Is it because the element names are not distinct? Or is it because your input form allows users to choose arbitrary paths for arbitrary documents?
>>>>>>> The element names are not distinct.
>>>>>>>
>>>>>>>>> Sure, I'll also try BaseX 8.0 and compare. Should I recreate the DB, importing the XML files, for testing the improved indexing?
>>>>>>>> We have actually improved support for collections, but the database format itself has not changed, so it shouldn't make a difference in your case.
>>>>>>>>
>>>>>>>> Christian
>>>>>>>>
>>>>>>>>>> [1] http://files.basex.org/releases/latest
>>>>>>>>>>
>>>>>>>>>> On Fri, Jan 30, 2015 at 3:55 PM, Menashè Eliezer meliezer@ogs.trieste.it wrote:
>>>>>>>>>>> Hello,
>>>>>>>>>>> I wonder if the attached query can be optimised. I'm attaching all relevant information.
>>>>>>>>>>> BaseX 7.9, Debian, powerful server.
>>>>>>>>>>> This is just an example. The queries will be built based on a compilation of a search form.
>>>>>>>>>>> Any help would be appreciated.
>>>>>>>>>>> 40 seconds are not acceptable.
>>>>>>>>>>>
>>>>>>>>>>> --
>>>>>>>>>>> With kind regards,
>>>>>>>>>>> Menashè
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> With kind regards,
>>>>>>>>> Menashè
>>>>>>>
>>>>>>> With kind regards,
>>>>>>> Menashè
>>
>> --
>> Menashè
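Christian's two path tips in the quoted exchange (a[*]/b is the same as a/b; the descendant axis can replace explicit child steps when an element name is distinct) can be checked on a tiny in-memory tree. A self-contained sketch; the root, a, b and c elements are hypothetical stand-ins for the gmd/sdn elements:

```xquery
(: Hypothetical sample tree: one <a> with a <b> child, one empty <a>,
   and one <a> whose only child is <c>. :)
let $root := <root><a><b>1</b></a><a/><a><c/></a></root>
return (
  (: a[*] keeps the <a> elements that have at least one element child. Every <a>
     that has a <b> child passes that filter, so the predicate changes nothing: true :)
  deep-equal($root/a[*]/b, $root/a/b),
  (: A positional predicate is different: a[2] is the empty <a/>, so the left
     side is empty and the comparison yields false :)
  deep-equal($root/a[2]/b, $root/a/b),
  (: <b> occurs only under <a> in this tree, so the descendant axis reaches the
     same node without spelling out the child steps: true :)
  deep-equal($root//b, $root/a/b)
)
```

The second comparison shows why a positional predicate such as gmd:descriptiveKeywords[1] is not interchangeable with the unfiltered step in general, and the descendant shortcut only holds when the name is distinct, which Menashè reports is not the case for his data.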