Hi Fabrice and list,
I am dealing with data-centric XML rather than documents, so there is a fairly high node-to-content ratio. I have about 250 million nodes in total, and about 15 million nodes per database seems to work well, but this is just a guesstimate. I am really looking for performance profiles or heuristics so that I can cap the number of nodes in each database before performance degrades.
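As a rough way of checking where each database stands, I use a count along these lines (the database names 'part1'..'part3' are placeholders; note that descendant-or-self::node() omits attribute nodes, so treat the figure as approximate):

    for $db in ('part1', 'part2', 'part3')
    return <db name="{$db}"
               nodes="{ count(db:open($db)/descendant-or-self::node()) }"/>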
Cheers
Peter
---- Original Message ---- From: fetanchaud@questel.com To: pw@themail.co.uk, fetanchaud@questel.com, BaseX-Talk@mailman.uni-konstanz.de Subject: RE: [basex-talk] handling large files: is there a streaming solution? Date: Tue, 12 Feb 2013 09:07:40 +0000
Dear Peter,
I'm just a BaseX user, and Christian's team will correct me, but in my experience document size does not matter, at least for querying.
Why do you talk about distributing data? Did you reach the 2 billion node limit?
As BaseX indexes all nodes, and depending on the distribution of values, creating a separate collection containing hand-made indices can speed up your queries.
For example, for append-only collections, I usually create an index collection like this:
    <index>
      <item value='value to be indexed'>the 'pre' value of the indexed element</item>
      <item>...</item>
    </index>
And access that 'index' with something like this (note that db:open-pre expects an integer, so the item's content is cast first):

    for $i in //item[@value='searched value']
    return db:open-pre('mydb', xs:integer($i))
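To make this concrete, here is a minimal sketch of how such an index collection could be built; the database name 'mydb', the element name 'record', its @key attribute, and the index database name 'mydb-index' are all placeholders:

    let $idx :=
      <index>{
        for $e in db:open('mydb')//record
        return <item value="{$e/@key}">{ db:node-pre($e) }</item>
      }</index>
    return db:create('mydb-index', $idx, 'index.xml')

(db:node-pre returns the 'pre' value of a node, and db:create writes the generated document into a new database; after appends, the index has to be rebuilt or extended accordingly.)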
Also, a large number of documents may slow down the display of the properties window in the GUI, because of the document tree view.
Question to the BaseX team: would 'user-defined' indices be an interesting feature?
Regards
-----Original Message----- From: pw@themail.co.uk [mailto:pw@themail.co.uk] Sent: Monday, 11 February 2013 17:13 To: Fabrice Etanchaud; pw@themail.co.uk; BaseX-Talk@mailman.uni-konstanz.de Subject: RE: [basex-talk] handling large files: is there a streaming solution?
Thanks Fabrice, I am making good progress following your advice. Do you have any heuristics for the best way to distribute data for performant searches and subsetting? Am I better off having lots of small files or a few large files in a collection?
---- Original Message ---- From: fetanchaud@questel.com To: pw@themail.co.uk, BaseX-Talk@mailman.uni-konstanz.de Subject: RE: [basex-talk] handling large files: is there a streaming solution? Date: Mon, 11 Feb 2013 14:38:54 +0000
Dear Peter,
Did you try creating a collection with the files (CREATE command)?
You should start that way; I don't see the point in using the file: module for import.
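For example (database name and path are placeholders), from the console:

    CREATE DB mydb /path/to/xml/files

or, I believe, the equivalent from XQuery:

    db:create('mydb', '/path/to/xml/files')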
I think that once the data is in the database, file size does not matter, until you reach millions of files in the collection and do a lot of document-related operations (list, etc.).
-----Original Message----- From: basex-talk-bounces@mailman.uni-konstanz.de [mailto:basex-talk-bounces@mailman.uni-konstanz.de] On Behalf Of pw@themail.co.uk Sent: Monday, 11 February 2013 15:33 To: BaseX-Talk@mailman.uni-konstanz.de Subject: [basex-talk] handling large files: is there a streaming solution?
Hello List,
I want to do a join across some large (300-400 MB) XML files and would appreciate guidance on the optimal strategy.
At present these files are on the filesystem and not in a database.
Is there any equivalent to the Zorba streaming xml:parse()?
Would loading the files into a database directly be the approach, or is it better to split them into smaller files?
Is the file: module a suitable route through which to import the files?
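For context, the join I have in mind has roughly this shape once both files are in databases (the database, element, and attribute names here are invented):

    for $a in db:open('db-a')//order
    for $b in db:open('db-b')//customer[@id = $a/@custref]
    return <match order="{$a/@id}" customer="{$b/@id}"/>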
Thanks for your help
Peter