Rupert,
thanks for your observation. My assumption is that the (hidden) descendant-or-self step in your query causes a huge number of intermediary nodes, which are then reduced to a small result set. In other words, your query..
distinct-values(//*[randnr]/name())
..equals the following query:
distinct-values(/descendant-or-self::node()/child::*[child::randnr]/name())
There are several ways to speed up your query; please try, for example, to:
1. explicitly use the descendant step: distinct-values(/descendant::*[randnr]/name())
2. wrap the name function around the location path: distinct-values( name( //*[randnr] ))
3. directly address the randnr nodes and use a parent step: distinct-values( name( /descendant::randnr/.. ))
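To illustrate why the first and third rewrites should yield the same names, here is a minimal sketch in Python's standard xml.etree.ElementTree (not BaseX, and not XQuery). The sample document and the helper names names_via_predicate and names_via_parent are made up for illustration; only the element name randnr comes from the thread. The sketch shows the equivalence of the results on a tiny input and says nothing about how BaseX evaluates either form.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data; only the element name "randnr" is from the thread.
SAMPLE = """
<book>
  <chapter><randnr>1</randnr><p>text</p></chapter>
  <chapter><section><randnr>2</randnr></section></chapter>
  <appendix><p><randnr>3</randnr></p></appendix>
</book>
"""


def names_via_predicate(root):
    """Roughly /descendant::*[randnr]: test every element for a randnr child."""
    return {elem.tag for elem in root.iter() if elem.find("randnr") is not None}


def names_via_parent(root):
    """Roughly /descendant::randnr/..: start at randnr elements, step to the parent."""
    # ElementTree has no parent pointers, so build a child -> parent map first.
    parent_of = {child: parent for parent in root.iter() for child in parent}
    return {parent_of[r].tag for r in root.iter("randnr") if r in parent_of}


root = ET.fromstring(SAMPLE)
by_predicate = names_via_predicate(root)
by_parent = names_via_parent(root)

# Using sets plays the role of distinct-values(): each name appears once.
print(sorted(by_predicate))
print(sorted(by_parent))
```

Both functions collect the distinct names of all elements that have a randnr child; the second one reaches them from the randnr elements instead of testing every element.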
We might add some optimizations to BaseX to automate some of the proposed steps.
If this doesn't help, feel free to give us more feedback.
Best, Christian ___________________________
On Tue, Mar 29, 2011 at 5:31 PM, Rupert Jung rupert.jung@pagina-tuebingen.de wrote:
Hi Andreas and thanks for your answer,
unfortunately that didn't work (java.exe consumed 1.2 GB, then stopped). The expected result should be no longer than about 10 element names or so...
Maybe this could be a bug in BaseX itself…?
Rupert Jung
<pagina> GmbH Gesamtherstellung wissenschaftlicher Werke Herrenberger Str. 51 D-72070 Tübingen
Handelsregister Stuttgart HRB 380249 Geschäftsführer: Tobias Ott
Phone: (0 70 71) 98 76-37 Fax: (0 70 71) 98 76-22 E-Mail: rupert.jung@pagina-tuebingen.de http://www.pagina-online.de
From: Andreas Weiler [mailto:andreas.weiler@uni-konstanz.de]
Sent: Tuesday, March 29, 2011 16:25
To: rupert.jung@pagina-tuebingen.de
Cc: basex-talk@mailman.uni-konstanz.de; bjoern.duenckel@pagina-tuebingen.de
Subject: Re: [basex-talk] Out of memory
Hi,
As a first hint, you could start BaseX with Java's -Xmx flag, which raises the maximum heap size:
java -cp BaseX.jar -Xmx1G org.basex.BaseXGUI
Probably that will solve this issue.
Kind regards,
Andreas
On 29.03.2011 at 16:14, Rupert Jung wrote:
Hi there,
I’m currently doing some tests with BaseX and a mid-sized database (around 2 GB).
I wonder why I'm not able to process this XQuery statement:
distinct-values(//*[randnr]/name())
("Give me a list of all elements which have a child element <randnr> and remove all duplicate entries")
After about 10 seconds I got an "out of main memory" error. What's really strange about this: processing the nodes themselves with //*[randnr]/
works like a charm (but gives me a HUGE amount of text and is not really useful for me at all).
My system: win7-x64, 4 GB RAM, Java 1.6.0_21
Thank you in advance,
Rupert Jung
BaseX-Talk mailing list BaseX-Talk@mailman.uni-konstanz.de https://mailman.uni-konstanz.de/mailman/listinfo/basex-talk
Hello Christian!
You're my personal hero! The query "distinct-values(/descendant::*[randnr]/name())" worked perfectly! 2 GB of data analysed in 3 seconds, WOW!
Greetings and thanks again! Rupert Jung