Hi Leo,
Thank you. It seems to be the sorting.
Here is a snippet from the basex.log:
18:30:33.617 [127.0.0.1:43408] QUERY(0) for $item in doc('skk-gemaelde')/items/item/kuenstler order by $item/@order return $item/text() OK
18:30:33.617 [127.0.0.1:43408] QUERY(0) OK
18:30:33.666 [127.0.0.1:43408] INIT(0) OK
18:31:17.468 [127.0.0.1:43408] CLOSE(0) OK
Between INIT and CLOSE it takes 44 seconds? The query is simple now! Obviously we have to tune our BaseX installation. I have to wait until my administrator returns from holiday...
Thanks, Christof
-----Original Message-----
From: Leo Wörteler [mailto:lw@basex.org]
Sent: Wednesday, 24 August 2011 15:56
To: Mainberger, Christof
Cc: Winter, Carina; basex-talk
Subject: Re: [basex-talk] XQuery / BaseX: distinct after sort
Dear Christof,
On 24.08.2011 14:46, Mainberger, Christof wrote:
distinct-values(for $item in doc('gemaelde')/items/item order by $item/kuenstler/@order return $item/kuenstler).
One first remark:
The XQuery Spec [1] says about fn:distinct-values():
"The order in which the sequence of values is returned is implementation dependent."
In BaseX this means that the results come out in arbitrary order, so the sorting has no effect at all.
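To illustrate the point: any ordering has to be applied to the result of distinct-values(), not to its argument. A minimal sketch, assuming it is acceptable to sort by the artist name itself rather than by the @order attribute (which is lost once the values are atomized):

```xquery
(: De-duplicate first, then sort the remaining atomic values.
   Assumption: sorting by the string value is good enough here;
   the @order attribute is no longer available after atomization. :)
for $name in distinct-values(doc('gemaelde')/items/item/kuenstler)
order by $name
return $name
```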
It takes about 26 s for 1,500 items.
This really sounds unacceptable... Could you try executing the query

(a) without the sorting, just filtering unique kuenstler items
(b) without distinct-values()

and report the timings? This would help pin down the cause of the slowdown. For (b) the number of results would also be of interest, because it shows how much work distinct-values() has to do.
Is there any possibility to formulate the query in other ways to make this faster? Or indexes/materialized views to support this kind of query?
BaseX doesn't have built-in on-the-fly indices (yet), but you could definitely roll your own using the XQuery Maps extension [2]. I'd be happy to assist you if you don't feel comfortable with that.
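A rough sketch of what such a hand-rolled index could look like. It is written against the standard XQuery 3.1 map functions (map:merge, map:entry, map:keys), which is an assumption: the Map Functions module linked in [2] may use different names in your BaseX version. The idea is to key each kuenstler element by its string value and keep only the first node per key:

```xquery
(: Build a map from artist name to the first kuenstler element
   carrying that name. Assumes XQuery 3.1 map support. :)
let $index := map:merge(
  (for $k in doc('gemaelde')/items/item/kuenstler
   return map:entry(string($k), $k)),
  map { 'duplicates': 'use-first' }
)
(: Only the unique entries are sorted, and the nodes still
   carry their @order attribute. :)
for $name in map:keys($index)
let $k := $index($name)
order by $k/@order
return $k
```

If @order holds numbers, the sort key would need to be number($k/@order), since untyped attribute values are otherwise compared as strings.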
If distinct-values() has to remove many duplicates, this should happen before the list is sorted, which would drastically reduce the run time. But then you can't use distinct-values(), because it atomizes its argument items, thus removing the attribute to sort by. I'll try writing a non-atomizing variant in XQuery.
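One possible shape for such a non-atomizing variant, as a sketch only: it assumes the processor supports the XQuery 3.0 "group by" clause with an inline binding, and it treats two kuenstler elements as duplicates when their string values are equal.

```xquery
(: Group the kuenstler elements by their string value, keep the
   first node of each group, and sort only those representatives.
   After "group by", $k is bound to all nodes of the current group. :)
for $k in doc('gemaelde')/items/item/kuenstler
group by $name := string($k)
let $first := $k[1]
order by $first/@order
return $first
```

Because the grouping happens before the sort, only one node per distinct name takes part in the ordering, and the @order attribute is still available as the sort key.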
This problem occurs only on our Debian (virtual) server, not on my PC. Are there any configuration parameters for basex-server (ram etc.), which might help?
You could try starting the JVM with the flag -Xmx500m, thus giving it 500 MiB of heap (or any larger amount that is available). But as the query completes instead of dying with an OutOfMemoryError, I'm not sure this is the solution. The slowdown could then only be caused by excessive garbage collection to free up memory.
Hope that helps,
Cheers, Leo
__________
[1] http://www.w3.org/TR/xpath-functions/#func-distinct-values
[2] http://docs.basex.org/wiki/Map_Functions