Hi Christian
Thank you for directing me to the Profiling Module; I think that is just what I need.
Cheers
Peter
---- Original Message ----
From: christian.gruen@gmail.com
To: pw@themail.co.uk
Subject: Re: [basex-talk] best way to partition large data sets among collections
Date: Mon, 14 Jan 2013 19:13:12 +0100
Hi Peter,
Do you have any information to guide me here; what sorts of XQuery expressions should I match with large numbers of collections, and which with small numbers of collections?
Hmm, there is no answer that comes to mind that could give you general guidance here, as XQuery provides just too many possibilities for writing slow and fast queries. It's similar to the question of how to write efficient Java, Perl or whatever. To get more information on why your queries are not as fast as they ideally should be, you could e.g.:
- check in the Info View / query info whether the relevant index structures are used
- use functions of the Profiling Module [1] to track down bottlenecks (a small sketch follows below)
- use JVM options such as -Xmx, or do low-level profiling
Next, you may pass on snippets of your code that, in your opinion, could be optimized.
Christian
[1] http://docs.basex.org/wiki/Profiling_Module
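For example, a small sketch of what such a profiling call could look like (the database name, element names and values are placeholders, not taken from this thread):

  (: Hypothetical example: wrap a possibly slow expression in prof:time()
     to print its evaluation time, and dump an intermediate count for
     inspection. 'statsdb' and 'observation' are made-up names. :)
  let $hits := prof:time(
    db:open('statsdb')//observation[@year = '2012']
  )
  return (
    prof:dump(count($hits), 'number of hits: '),
    $hits[position() <= 10]
  )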
If I am using data-rich XML, there is a high ratio of nodes to content. What are the rules of thumb for this type of content?
Are there any recommendations specifically for GML?
Many thanks
Peter
---- Original Message ----
From: christian.gruen@gmail.com
To: pw@themail.co.uk
Subject: Re: [basex-talk] best way to partition large data sets among collections
Date: Mon, 14 Jan 2013 13:05:00 +0100
Hi Peter,
thanks for the link. There's no general answer to your question, as an application may run flawlessly with either a single database or hundreds of them, depending on what your XQuery expressions look like. If you do regular updates, I suggest splitting your data into fixed instances that will never change and use all indexes, and updating instances that may eventually be merged with the fixed instances once no more changes are expected.
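For example, a rough sketch of that setup (database and element names are placeholders): queries address the fixed, fully indexed database and the small updating database together, and a separate updating query later folds the updating database into the fixed one:

  (: Query both partitions in one expression: 'archive' is the fixed,
     fully indexed database, 'inbox' the small one that receives updates. :)
  (db:open('archive'), db:open('inbox'))//record[@id = '4711']

  (: Separate updating query: move the documents from 'inbox' into
     'archive'; afterwards run OPTIMIZE on 'archive' to rebuild its
     indexes, and drop or empty 'inbox'. :)
  for $doc in db:open('inbox')
  return db:add('archive', $doc, db:path($doc))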
Christian
On Sun, Jan 13, 2013 at 1:09 AM, pw@themail.co.uk wrote:
Hello List
I am experimenting with statistical data ( http://www.semantechs.co.uk/ ) and found that organising 2.5 GB of XML data into 12 unevenly sized collections ranging from 40 to 400 MB performs much more slowly than organising it into 36 collections of approximately 75 MB each.
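For illustration, querying the 36 evenly sized databases looks roughly like this (database and element names are just placeholders):

  (: Hypothetical sketch: open all 36 databases, assuming they are
     named stats1 ... stats36, and query each of them. :)
  for $i in 1 to 36
  for $obs in db:open('stats' || $i)//observation[@region = 'UK']
  return $obs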
What rules of thumb are there to guide me in designing the most performant database?
Many thanks
Peter
BaseX-Talk mailing list
BaseX-Talk@mailman.uni-konstanz.de
https://mailman.uni-konstanz.de/mailman/listinfo/basex-talk