Hi Fabrice and list,
I am dealing with data-centric XML rather than documents, so there is a fairly high node-to-content ratio. I have about 250 million nodes in total, and about 15 million nodes per database seems to work well, but this is just a guesstimate. I am really looking for performance profiles or heuristics so that I can cap the number of nodes in each database before performance degrades.
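As a rough way of checking where each database stands, I use a count along these lines (the database names 'part1'..'part3' are placeholders; note that descendant-or-self::node() omits attribute nodes, so treat the figure as approximate):

    for $db in ('part1', 'part2', 'part3')
    return <db name="{$db}"
               nodes="{ count(db:open($db)/descendant-or-self::node()) }"/>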
Cheers
Peter
---- Original Message ---- From: fetanchaud@questel.com To: pw@themail.co.uk, fetanchaud@questel.com, BaseX-Talk@mailman.uni-konstanz.de Subject: RE: [basex-talk] handling large files: is there a streaming solution? Date: Tue, 12 Feb 2013 09:07:40 +0000
Dear Peter,
I'm just a BaseX user, and Christian's team will correct me, but in my experience document size does not matter, at least for querying.
Why do you talk about distributing data? Did you reach the 2 billion node limit?
As BaseX indexes all nodes, and depending on the distribution of values, creating a separate collection containing hand-made indices can speed up your queries.
For example, for append-only collections, I usually create an index collection like this:
    <index>
      <item value='value to be indexed'>the 'pre' value of the indexed element</item>
      <item>...</item>
    </index>
And access that 'index' with something like this (note that db:open-pre expects an integer, so the item's content is cast first):

    for $i in //item[@value='searched value']
    return db:open-pre('mydb', xs:integer($i))
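To make this concrete, here is a minimal sketch of how such an index collection could be built; the database name 'mydb', the element name 'record', its @key attribute, and the index database name 'mydb-index' are all placeholders:

    let $idx :=
      <index>{
        for $e in db:open('mydb')//record
        return <item value="{$e/@key}">{ db:node-pre($e) }</item>
      }</index>
    return db:create('mydb-index', $idx, 'index.xml')

(db:node-pre returns the 'pre' value of a node, and db:create writes the generated document into a new database; after appends, the index has to be rebuilt or extended accordingly.)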
Also, a large number of documents may slow down the display of the properties window in the GUI, because of the document tree view.
Question to the BaseX team: would 'user-defined' indices be an interesting feature?
Regards
-----Original Message----- From: pw@themail.co.uk [mailto:pw@themail.co.uk] Sent: Monday, 11 February 2013 17:13 To: Fabrice Etanchaud; pw@themail.co.uk; BaseX-Talk@mailman.uni-konstanz.de Subject: RE: [basex-talk] handling large files: is there a streaming solution?
Thanks Fabrice, I am making good progress following your advice. Do you have any heuristics for the best way to distribute data for performant searches and subsetting? Am I better off having lots of small files or a few large files in a collection?
---- Original Message ---- From: fetanchaud@questel.com To: pw@themail.co.uk, BaseX-Talk@mailman.uni-konstanz.de Subject: RE: [basex-talk] handling large files: is there a streaming solution? Date: Mon, 11 Feb 2013 14:38:54 +0000
Dear Peter,
Did you try creating a collection with the files (CREATE command)?
You should start that way; I don't see the point in using the file: module for import.
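For example (database name and path are placeholders), from the console:

    CREATE DB mydb /path/to/xml/files

or, I believe, the equivalent from XQuery:

    db:create('mydb', '/path/to/xml/files')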
I think that once the data is in the database, file size does not matter, until you reach millions of files in the collection and do a lot of document-related operations (list, etc.).
-----Original Message----- From: basex-talk-bounces@mailman.uni-konstanz.de [mailto:basex-talk-bounces@mailman.uni-konstanz.de] On Behalf Of pw@themail.co.uk Sent: Monday, 11 February 2013 15:33 To: BaseX-Talk@mailman.uni-konstanz.de Subject: [basex-talk] handling large files: is there a streaming solution?
Hello List,
I want to do a join across some large (300-400 MB) XML files and would appreciate guidance on the optimal strategy.
At present these files are on the filesystem and not in a database.
Is there any equivalent to the Zorba streaming xml:parse()?
Would loading the files into a database directly be the approach, or is it better to split them into smaller files?
Is the file: module a suitable route through which to import the files?
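For context, the join I have in mind has roughly this shape once both files are in databases (the database, element, and attribute names here are invented):

    for $a in db:open('db-a')//order
    for $b in db:open('db-b')//customer[@id = $a/@custref]
    return <match order="{$a/@id}" customer="{$b/@id}"/>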
Thanks for your help
Peter