Hi Christian
Do you have any information to guide me here? What sorts of XQuery expressions are best suited to a large number of collections, and which to a small number?
If I am using data-rich XML, there is a high ratio of nodes to content. What are the rules of thumb for this type of content?
Are there any recommendations specifically for GML?
Many thanks
Peter
---- Original Message ----
From: christian.gruen@gmail.com
To: pw@themail.co.uk
Subject: Re: [basex-talk] best way to partition large data sets among collections
Date: Mon, 14 Jan 2013 13:05:00 +0100
Hi Peter,
Thanks for the link. There's no general answer to your question, as an application may run flawlessly with a single database or with hundreds of databases, depending on what your XQuery expressions look like. If you do regular updates, I suggest splitting your data into fixed instances, which will never change and can use all indexes, and updating instances, which may eventually be merged with the fixed instances once no more changes are expected.
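Christian's suggestion can be pictured as a routing rule: all writes go to a small updating instance, while fixed instances are never modified and can therefore keep every index. The Python sketch below models that rule in memory only. The class names `Instance` and `PartitionedStore`, the `indexed` flag and the `freeze` step are hypothetical illustrations, not BaseX commands; in a real setup, freezing would presumably mean copying the updating database's documents into a fixed database and building its indexes there.

```python
from dataclasses import dataclass, field


@dataclass
class Instance:
    """Hypothetical in-memory stand-in for one database instance."""
    name: str
    docs: dict = field(default_factory=dict)  # document path -> XML string
    indexed: bool = False  # just a flag: True marks a "fully indexed" instance


class PartitionedStore:
    """Sketch of the fixed/updating split: writes only ever touch the
    updating instance, so fixed instances stay unchanged."""

    def __init__(self):
        self.fixed = []                       # read-only, fully indexed instances
        self.updating = Instance("updating")  # receives every write

    def write(self, path, content):
        # Assumption: documents already frozen into a fixed instance may not
        # be changed, which is what keeps the fixed instances' indexes valid.
        for inst in self.fixed:
            if path in inst.docs:
                raise ValueError(f"{path} is frozen in {inst.name}")
        self.updating.docs[path] = content

    def read(self, path):
        # Look in the updating instance first, then in the newest fixed ones.
        for inst in [self.updating, *reversed(self.fixed)]:
            if path in inst.docs:
                return inst.docs[path]
        raise KeyError(path)

    def freeze(self):
        """Once no more changes are expected, fold the updating instance into
        the fixed set (marked as indexed) and start a fresh updating one."""
        if not self.updating.docs:
            return None
        frozen = Instance(f"fixed-{len(self.fixed) + 1}",
                          dict(self.updating.docs), indexed=True)
        self.fixed.append(frozen)
        self.updating = Instance("updating")
        return frozen
```

The design choice here is that reads have to consult several instances, while writes never invalidate the indexes of the large fixed instances; how often to call `freeze` is left open.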
Christian
___________________________
On Sun, Jan 13, 2013 at 1:09 AM, pw@themail.co.uk wrote:
Hello List
I am experimenting with statistical data (http://www.semantechs.co.uk/) and found that organising 2.5 GB of XML data into 12 unevenly sized collections, ranging from 40 to 400 MB, performs much more slowly than organising it into 36 collections of approximately 75 MB each.
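An even split like the 36 × ~75 MB layout can be produced with a simple greedy balancing step before loading. The sketch below is a minimal Python illustration, assuming document sizes are known up front; the function name `partition_evenly` and the sample document names are hypothetical, and nothing here is part of BaseX. It uses the longest-processing-time heuristic (largest remaining document into the currently smallest collection), which keeps collection sizes close to each other but does not guarantee an optimal split.

```python
import heapq


def partition_evenly(doc_sizes_mb, n_collections):
    """Assign documents to n_collections so that total sizes stay roughly equal.

    doc_sizes_mb: dict mapping a (hypothetical) document name to its size in MB.
    Returns a list of (total_mb, [document names]), one entry per collection.
    """
    # Min-heap of (current total size, collection index): the smallest
    # collection is always at the top.
    heap = [(0.0, i) for i in range(n_collections)]
    heapq.heapify(heap)
    members = [[] for _ in range(n_collections)]
    totals = [0.0] * n_collections

    # Place the largest documents first; each goes to the smallest collection.
    for name, size in sorted(doc_sizes_mb.items(), key=lambda kv: kv[1], reverse=True):
        total, idx = heapq.heappop(heap)
        members[idx].append(name)
        totals[idx] = total + size
        heapq.heappush(heap, (totals[idx], idx))

    return list(zip(totals, members))
```

For example, `partition_evenly({"a": 40, "b": 30, "c": 20, "d": 10}, 2)` yields two collections of 50 MB each. Whether equal sizes are actually what made the 36-collection layout faster is not established by this thread; the sketch only shows how to produce such a layout.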
What rules of thumb are there to guide me in designing the most performant database?
Many thanks
Peter
BaseX-Talk mailing list
BaseX-Talk@mailman.uni-konstanz.de
https://mailman.uni-konstanz.de/mailman/listinfo/basex-talk
Hi Peter,
Do you have any information to guide me here? What sorts of XQuery expressions are best suited to a large number of collections, and which to a small number?
Hmm, there is no answer I can think of that would give you general guidance here, as XQuery offers just too many ways of writing slow and fast queries. It's similar to the question of how to write efficient Java, Perl or whatever. To get more information on why your queries are not as fast as they ideally should be, you could, e.g.:
– check in the InfoView/query info whether the relevant index structures are used
– use functions of the Profiling Module [1] to track down bottlenecks
– use -Xmx to do low-level profiling
Next, you could send us snippets of your code that, in your opinion, could be optimized.
Christian
[1] http://docs.basex.org/wiki/Profiling_Module