Hi Christian,
Even when I keep only the first filter and test it standalone, it still takes more than 8 seconds:
Result:
- Hit(s): 250000 Items
- Updated: 0 Items
- Printed: 2048 KB
- Read Locking: local [CDI]
- Write Locking: none
Timing:
- Parsing: 2.0 ms
- Compiling: 107.74 ms
- Evaluating: 8085.55 ms
- Printing: 106.4 ms
- Total Time: 8301.69 ms
With kind regards,
Menashè
On 06/22/2015 07:57 PM, Christian Grün wrote:
Hi Menashè,

 QUERY[0] xquery version "3.0"; declare namespace queryName ='GetIDS';
declare namespace gco = "http://www.isotc211.org/2005/gco"; declare
[...]
It would be great if you could help us and simplify the query, such
that we can have a look at the core issue.

Is there an undocumented way to log the full XQuery in the BaseX server logs?
The maximum size of log entries can be adjusted via the option LOGMSGMAXLEN [1].
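
As a minimal sketch of the suggestion above: LOGMSGMAXLEN is a global option, so one way to raise it is an entry in the .basex configuration file, followed by a server restart. The value 100000 is only an illustrative assumption; choose one large enough for your longest query.

```text
# .basex configuration file (its location depends on the installation)
# Assumed value for illustration only
LOGMSGMAXLEN = 100000
```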

Cheers,
Christian

[1] http://docs.basex.org/wiki/Options#LOGMSGMAXLEN



I've seen the -V option, but I don't use the standalone version. I start the server with:
java -cp /usr/share/java/basex.jar org.basex.BaseXServer
The -d option doesn't give me any extra query info.


With kind regards,
Menashè

On 02/03/2015 01:13 PM, Menashè Eliezer wrote:
Hi Christian,

Thank you! The query time is now down to 0.5 seconds!

The biggest improvement comes from the query rephrasing you suggested.
The latest snapshot also helps a lot!
You may want to know that the log of the latest snapshot shows
applying attribute index for "7827"
which is not clear to the user, whereas BaseX80-20150130.124009, which
also used the index, showed:
applying attribute index for ("ALKY", "AYMD")

I'm attaching the first and the second launch of the query using
BaseXGUI. Relaunching the same query reduces the time from over 1 second to
0.5 second.
Some data:
BaseX80-20150130.124009
Total Time: 30676.02 ms
After using "for $x in
collection("ALL-CDIS")/gmd:MD_Metadata/gmd:identificationInfo/sdn:SDN_DataIdentification":
Total Time: 5456.74 ms
applying attribute index for ("ALKY", "AYMD") in log.
Second launch: 1333.71 ms
Latest snapshot (BaseX80-20150202.121033):
1st: Total Time: 1873.02 ms
2nd: Total Time: 548.62 ms

With kind regards,
Menashè

On 02/02/2015 02:02 PM, Menashè Eliezer wrote:
Hi Christian,

Thank you very much! Unfortunately I'll be at the office only tomorrow.

Menashè

On Sat, 31 Jan 2015 16:42:32 +0100, Christian Grün
<christian.gruen@gmail.com> wrote:
Hi Menashè,

With the latest snapshot [1], your original query should now be
rewritten for index access as well. Looking forward to your tests,

Christian

PS: In terms of performance, it may still be worthwhile to move
redundant paths to the for clause; but just try and see.

[1] http://files.basex.org/releases/latest/



On Fri, Jan 30, 2015 at 9:49 PM, Christian Grün
<christian.gruen@gmail.com> wrote:
Hi Menashè,

Should I expect to see the usage of an index for each of the where
phrases?
Usually, only one predicate will be rewritten for index access, and
the remaining conditions will be answered sequentially.

Have a nice weekend!
Enjoy,
Christian


Menashè

On Fri, 30 Jan 2015 18:11:59 +0100, Christian Grün
<christian.gruen@gmail.com> wrote:
Hi Menashè,

Thanks for the XML samples you sent me in private. I noticed that the
index rewritings will only be triggered if you formulate your query as
follows:

OLD:
   for $x in collection("ALL-CDIS")
   where $x/gmd:MD_Metadata/gmd:identificationInfo/...
   return ...

NEW:
   for $x in collection("ALL-CDIS")/gmd:MD_Metadata
   where $x/gmd:identificationInfo/...
   return ...

It's difficult to explain in a few sentences why the OLD variant cannot
be optimized as straightforwardly (basically, it's quite a different
pattern to look for), but I'll check whether we can extend our matcher
to also support this kind of query.

So, if possible, I would recommend for now (at least for testing) that
you move the root element test directly after the collection()
function. I noticed that the first three child steps are the same in
all of your conditions:

gmd:MD_Metadata/gmd:identificationInfo/sdn:SDN_DataIdentification

If that will always be the case, it surely makes sense to move all
of them to the "for" clause.
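
A minimal sketch of what moving all three shared steps into the "for" clause could look like. The database name and element paths are taken from this thread; the sdn namespace URI, the single where condition, and the base-uri() return value are assumptions for illustration and do not reproduce the original query.

```xquery
xquery version "3.0";
declare namespace gmd = "http://www.isotc211.org/2005/gmd";
(: Assumption: namespace URI for the sdn prefix; use the one from your documents. :)
declare namespace sdn = "http://www.seadatanet.org";

(: The three shared child steps are part of the for clause. :)
for $x in collection("ALL-CDIS")
          /gmd:MD_Metadata/gmd:identificationInfo/sdn:SDN_DataIdentification
(: Illustrative condition only; the remaining conditions would follow the same shape. :)
where $x/gmd:descriptiveKeywords/gmd:MD_Keywords/gmd:keyword
        /sdn:SDN_ParameterDiscoveryCode/@codeListValue = ("ALKY", "AYMD")
(: Assumption: return the document URI of each hit. :)
return base-uri($x)
```

If the rewriting is applied, the query info should contain a line starting with "applying attribute index".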

Looking forward to your updated performance tests,
Christian
_______________________________

On Fri, Jan 30, 2015 at 5:55 PM, Christian Grün
<christian.gruen@gmail.com> wrote:
Could you possibly provide me with a small snapshot of your data
sources (one or two documents might be sufficient)?


On Fri, Jan 30, 2015 at 5:52 PM, Menashè Eliezer
<meliezer@ogs.trieste.it> wrote:
Almost the same speed with version 8.0.
No indexing (no "applying" in the query info).
As I've attached before, indexes are active for this DB.

With kind regards,
Menashè


On 01/30/2015 05:31 PM, Christian Grün wrote:
It's indeed interesting that your query does not use any of the
existing index structures (if it did, you would find strings like
"applying text index" or "applying attribute index" in the query
info). Maybe/hopefully things look different with Version 8.0.


On Fri, Jan 30, 2015 at 5:26 PM, Menashè Eliezer
<meliezer@ogs.trieste.it> wrote:
On 01/30/2015 05:18 PM, Christian Grün wrote:



/gmd:MD_Metadata/gmd:identificationInfo/sdn:SDN_DataIdentification/gmd:descriptiveKeywords[1]/gmd:MD_Keywords/gmd:keyword[2]/sdn:SDN_ParameterDiscoveryCode/@codeListValue
How can I remove *?
Simply remove the predicate; a[*]/b is the same as a/b.
Maybe I wasn't clear. The actual number appears in the xml file, e.g.,
gmd:descriptiveKeywords[1]
Anyway, I've removed all [*] and I get the same correct result, however
the processing time is doubled...
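
The equivalence mentioned above (a[*]/b selects the same nodes as a/b) can be checked with a small self-contained sketch. The inline document and its element names are made up for illustration only.

```xquery
(: Hypothetical sample data: one <a> without child elements, two with a <b> child. :)
let $doc :=
  <root>
    <a><b>1</b></a>
    <a>text only</a>
    <a><c/><b>2</b></a>
  </root>
(: a[*] keeps only <a> elements with at least one child element;
   any <a> that has a <b> child passes that test, so both paths
   select the same <b> nodes. :)
return deep-equal($doc/a[*]/b, $doc/a/b)
```

The expression should evaluate to true. This only shows that the results match; it says nothing about the relative evaluation time of the two forms.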

* In some cases, if you know that an element name is distinct, you can
get rid of all the explicit child steps and directly address the node
via the descendant axis.
Thanks, but it's not relevant in my case.
Is it because the element names are not distinct? Or is it because
your input form allows users to choose arbitrary paths for arbitrary
documents?
The element names are not distinct.

Sure, I'll also try BaseX 8.0 and compare. Should I recreate the db by
importing the xml files again to test the improved indexing?
We have actually improved support for collections, but the database
format itself has not changed, so it shouldn't make a difference in
your case.

Christian


[1] http://files.basex.org/releases/latest



On Fri, Jan 30, 2015 at 3:55 PM, Menashè Eliezer
<meliezer@ogs.trieste.it> wrote:
Hello,
I wonder if the attached query can be optimised. I'm attaching all
relevant information.
BaseX 7.9, Debian, powerful server.
This is just an example. The queries will be built from the values
entered in a search form.
Any help would be appreciated.
40 seconds are not acceptable.

--
With kind regards,
Menashè
