Hi Christian
Thank you for directing me to the Profiling Module; I think that is just what I need.
Cheers
Peter
---- Original Message ----
From: christian.gruen@gmail.com
To: pw@themail.co.uk
Subject: Re: [basex-talk] best way to partition large data sets among collections
Date: Mon, 14 Jan 2013 19:13:12 +0100
Hi Peter,
Do you have any information to guide me here; what sorts of XQuery expressions should I match with large numbers of collections, and which with small numbers of collections?
Hmm, there is no answer that comes to mind that could give you general guidance here, as XQuery provides just too many possibilities for writing slow and fast queries. It's similar to the question of how to write efficient Java, Perl or whatever. To get more information on why your queries are not as fast as they ideally should be, you could e.g.:
- check in the Info View / query info whether the relevant index structures are used
- use functions of the Profiling Module [1] to track down bottlenecks (a small sketch follows below)
- use JVM options such as -Xmx, or do low-level profiling
Next, you may pass on snippets of your code that, in your opinion, could be optimized.
Christian
[1] http://docs.basex.org/wiki/Profiling_Module
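For example, a small sketch of what such a profiling call could look like (the database name, element names and values are placeholders, not taken from this thread):

  (: Hypothetical example: wrap a possibly slow expression in prof:time()
     to print its evaluation time, and dump an intermediate count for
     inspection. 'statsdb' and 'observation' are made-up names. :)
  let $hits := prof:time(
    db:open('statsdb')//observation[@year = '2012']
  )
  return (
    prof:dump(count($hits), 'number of hits: '),
    $hits[position() <= 10]
  )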
If I am using data-rich XML, there is a high ratio of nodes to content. What are the rules of thumb for this type of content?
Are there any recommendations specifically for GML?
Many thanks
Peter
---- Original Message ----
From: christian.gruen@gmail.com
To: pw@themail.co.uk
Subject: Re: [basex-talk] best way to partition large data sets among collections
Date: Mon, 14 Jan 2013 13:05:00 +0100
Hi Peter,
thanks for the link. There's no general answer to your question, as an application may run flawlessly with either a single database or hundreds of them, depending on what your XQuery expressions look like. If you do regular updates, I suggest splitting your data into fixed instances that will never change and use all indexes, and updating instances that may eventually be merged with the fixed instances once no more changes are expected.
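For example, a rough sketch of that setup (database and element names are placeholders): queries address the fixed, fully indexed database and the small updating database together, and a separate updating query later folds the updating database into the fixed one:

  (: Query both partitions in one expression: 'archive' is the fixed,
     fully indexed database, 'inbox' the small one that receives updates. :)
  (db:open('archive'), db:open('inbox'))//record[@id = '4711']

  (: Separate updating query: move the documents from 'inbox' into
     'archive'; afterwards run OPTIMIZE on 'archive' to rebuild its
     indexes, and drop or empty 'inbox'. :)
  for $doc in db:open('inbox')
  return db:add('archive', $doc, db:path($doc))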
Christian
On Sun, Jan 13, 2013 at 1:09 AM, pw@themail.co.uk wrote:
Hello List
I am experimenting with statistical data ( http://www.semantechs.co.uk/ ) and found that organising 2.5 GB of XML data into 12 unevenly sized collections ranging from 40 to 400 MB performs much more slowly than organising it into 36 collections of approximately 75 MB each.
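For illustration, querying the 36 evenly sized databases looks roughly like this (database and element names are just placeholders):

  (: Hypothetical sketch: open all 36 databases, assuming they are
     named stats1 ... stats36, and query each of them. :)
  for $i in 1 to 36
  for $obs in db:open('stats' || $i)//observation[@region = 'UK']
  return $obs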
What rules of thumb are there to guide me in designing the most performant database?
Many thanks
Peter
BaseX-Talk mailing list
BaseX-Talk@mailman.uni-konstanz.de
https://mailman.uni-konstanz.de/mailman/listinfo/basex-talk