Hi Rupert,
nice to hear that! Btw, I've added a few small optimizations for mixed location paths (such as the one with the name() function after the location steps), which should perform the discussed rewritings on the fly:
http://files.basex.org/releases/latest/
It's recommended to switch to the latest snapshot anyway, as it fixes some important index rewritings on collections that had been temporarily removed in Version 6.6.
More feedback is welcome, Christian ____________________________
Hello Christian!
You're my personal hero! The query "distinct-values(/descendant::*[randnr]/name())" worked perfectly! 2 GB of data analysed in 3 seconds, WOW!
Greetings and thanks again! Rupert Jung
-- Rupert Jung <pagina> GmbH Gesamtherstellung wissenschaftlicher Werke Herrenberger Str. 51 D-72070 Tübingen Handelsregister Stuttgart HRB 380249 Geschäftsführer: Tobias Ott
E-Mail: rupert.jung@pagina-tuebingen.de Phone: (0 70 71) 98 76-37 Fax: (0 70 71) 98 76-22
-----Original Message----- From: Christian Grün [mailto:christian.gruen@gmail.com] Sent: Tuesday, 29 March 2011 18:51 To: rupert.jung@pagina-tuebingen.de Cc: Andreas Weiler; bjoern.duenckel@pagina-tuebingen.de; basex-talk@mailman.uni-konstanz.de Subject: Re: [basex-talk] Out of memory
Rupert,
thanks for your observation. My assumption is that the (hidden) descendant-or-self step in your query causes a huge number of intermediary nodes, which are then reduced to a small result set. In other words, your query..
distinct-values(//*[randnr]/name())
..equals the following query:
distinct-values(/descendant-or-self::node()/child::*[child::randnr]/name())
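To make the point about intermediary nodes concrete, here is a minimal sketch in Python (standard library only) that evaluates the expanded path naively on a tiny document. The sample document and the helper names are made up for illustration; only randnr comes from the thread. It says nothing about how BaseX evaluates the query internally.

```python
from xml.dom import minidom

# Hypothetical sample document; only <randnr> is taken from the thread.
doc = minidom.parseString(
    "<doc><p><randnr>1</randnr>text</p><p>plain</p>"
    "<note><randnr>2</randnr></note></doc>"
)

def descendants_or_self(node):
    """Yield the node and all its descendants (elements and text nodes)."""
    yield node
    for child in node.childNodes:
        yield from descendants_or_self(child)

def has_randnr_child(node):
    """True if the node has a direct child element named randnr."""
    return any(
        c.nodeType == c.ELEMENT_NODE and c.tagName == "randnr"
        for c in node.childNodes
    )

# Step 1: /descendant-or-self::node() materializes every node in the document.
intermediate = list(descendants_or_self(doc))

# Step 2: child::*[child::randnr] keeps only a small part of them.
matches = [
    c
    for n in intermediate
    for c in n.childNodes
    if c.nodeType == c.ELEMENT_NODE and has_randnr_child(c)
]

names = {m.tagName for m in matches}
print(len(intermediate), len(matches), sorted(names))
```

Even on this toy input, the first step touches every node, while only two elements survive the predicate; on a 2 GB database the gap is correspondingly larger.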
There are several ways to speed up your query; please try, for example, to:
1. explicitly use the descendant step:
distinct-values(/descendant::*[randnr]/name())
2. wrap the name function around the location path:
distinct-values( name( //*[randnr] ))
3. directly address the randnr nodes and use a parent step:
distinct-values( name( /descendant::randnr/.. ))
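That these formulations select the same element names can be checked on a small document. The following is only a sketch in Python: xml.etree.ElementTree supports a limited XPath subset, so the paths are analogues of the queries above rather than the queries themselves, and the sample document is hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document; only <randnr> is taken from the thread.
SAMPLE = """
<doc>
  <p><randnr>1</randnr>text</p>
  <p>no marginal number</p>
  <note><randnr>2</randnr></note>
  <section><p><randnr>3</randnr></p></section>
</doc>
"""

root = ET.fromstring(SAMPLE)

# Analogue of //*[randnr]: abbreviated path with a child predicate.
names_abbrev = {e.tag for e in root.findall(".//*[randnr]")}

# Analogue of /descendant::*[randnr]: one pass over all elements, then filter.
names_descendant = {e.tag for e in root.iter() if e.find("randnr") is not None}

# Analogue of /descendant::randnr/..: address randnr directly, step to the parent.
names_parent = {e.tag for e in root.findall(".//randnr/..")}

print(sorted(names_abbrev))
print(names_abbrev == names_descendant == names_parent)
```

The third variant starts from the randnr elements, so the number of nodes it has to look at depends on how often randnr occurs rather than on the size of the whole document.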
We might add some optimizations to BaseX to automate some of the proposed steps.
If this doesn't help, feel free to give us more feedback.
Best, Christian ___________________________
On Tue, Mar 29, 2011 at 5:31 PM, Rupert jung rupert.jung@pagina-tuebingen.de wrote:
Hi Andreas and thanks for your answer,
unfortunately that didn't work (java.exe consumed 1.2 GB, then stopped). The expected result should not be longer than about 10 element names or so...
Maybe this could be a bug in basex itself…?
Rupert Jung
<pagina> GmbH Gesamtherstellung wissenschaftlicher Werke Herrenberger Str. 51 D-72070 Tübingen
Handelsregister Stuttgart HRB 380249 Geschäftsführer: Tobias Ott
Phone: (0 70 71) 98 76-37 Fax: (0 70 71) 98 76-22 E-Mail: rupert.jung@pagina-tuebingen.de http://www.pagina-online.de
From: Andreas Weiler [mailto:andreas.weiler@uni-konstanz.de] Sent: Tuesday, 29 March 2011 16:25 To: rupert.jung@pagina-tuebingen.de Cc: basex-talk@mailman.uni-konstanz.de; bjoern.duenckel@pagina-tuebingen.de Subject: Re: [basex-talk] Out of memory
Hi,
as a first hint, you could start BaseX with Java's -Xmx flag to raise the maximum heap size:
java -cp BaseX.jar -Xmx1G org.basex.BaseXGUI
Probably that will solve this issue.
Kind regards,
Andreas
On 29.03.2011 at 16:14, Rupert jung wrote:
Hi there,
I’m currently doing some tests with BaseX and a mid-sized database (around 2 GB).
I wonder why I’m not able to process this XQuery statement:
distinct-values(//*[randnr]/name())
(“Give me a list of all elements which have a child element <randnr> and remove all duplicate entries”)
After about 10 seconds I got an “out of main memory” error. What’s really strange about this: processing the nodes themselves with //*[randnr]/ works like a charm (but gives me a HUGE amount of text and is not really useful for me at all).
My system: win7-x64, 4 GB RAM, Java 1.6.0_21
Thank you in advance,
Rupert Jung
BaseX-Talk mailing list BaseX-Talk@mailman.uni-konstanz.de https://mailman.uni-konstanz.de/mailman/listinfo/basex-talk
Hello Christian,
I'm deeply impressed... When do you do all this stuff? :)