Hi kgfhjjgrn,
I believe that Fabrice already mentioned all details that should help
you to build larger databases. The ADDCACHE option [1] (included in
the latest stable snapshot [2]) may already be sufficient to add your
documents via the GUI: simply run the "set addcache true" command via
the input bar of the main window before opening the Properties dialog.
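As a rough sketch of the same idea outside the GUI, a batch could also be
loaded with a BaseX command script that combines ADDCACHE with the AUTOFLUSH,
FLUSH and OPTIMIZE hints from Fabrice's mail. The database name is taken from
your test; the input path is a placeholder, and nothing here has been tuned
for your data:

```
# Assumes "testdb" already exists (CREATE DB testdb would create it,
# replacing any existing database of that name)
OPEN testdb
# Cache the added documents on disk before they are written to the database
SET ADDCACHE true
# Do not flush the database buffers after every single update
SET AUTOFLUSH false
# Placeholder path: a directory containing one batch of XML files
ADD /path/to/xml/batch
# Write the buffered updates to disk once, at the end of the batch
FLUSH
# Refresh the database statistics and index structures
OPTIMIZE
CLOSE
```

Saved under a name such as load.bxs (the file name is an assumption), a script
like this should be runnable with the standalone client, e.g. via its -c
option: basex -c load.bxs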
Note that you can access multiple databases with a single XQuery call,
so if you know that you’ll exceed the limits of a single database at
some point (see [3]), simply create a new database at regular intervals.
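To illustrate the multi-database approach, here is a minimal sketch that
assumes one database per month. The names docs_2013_04 and docs_2013_05 are
made up for this example; the query simply counts the documents in each
database:

```
# Hypothetical monthly databases, queried in a single XQuery call
XQUERY for $name in ('docs_2013_04', 'docs_2013_05') return concat($name, ': ', count(collection($name)))
```

The same loop can return matching nodes instead of counts, e.g.
collection($name)//some-element, where some-element stands for whatever your
documents contain.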
Hope this helps,
Christian
[1] http://docs.basex.org/wiki/Options#ADDCACHE
[2] http://files.basex.org/releases/latest/
[3] http://docs.basex.org/wiki/Statistics
_________________________________________
> The size of your test should not cause any problem for BaseX (18,000 files
> of 1 to 5 KB each).
>
> 1. Did you try setting the ADDCACHE option?
>
> 2. You should OPTIMIZE your collection after each batch of ADD
> commands, even if no index is set.
>
> 3. Did you try unsetting the AUTOFLUSH option and explicitly FLUSHing the
> updates at the end of each batch?
>
> 4. The GUI may not be the best place to run updates. Did you try the
> basex command-line tools?
>
> Opening a collection containing a huge number of documents may take a long
> time, in my experience.
>
> It seems to be related to the kind of in-memory data structure used to store
> the document names.
>
> A workaround could be to insert your documents under a common root XML
> element with XQuery Update.
>
> Best,
>
> Fabrice Etanchaud
>
> Questel-Orbit
>
> From: basex-talk-bounces@mailman.uni-konstanz.de
> [mailto:basex-talk-bounces@mailman.uni-konstanz.de] On behalf of freesoft
> Sent: Monday, 15 April 2013 10:19
> To: basex-talk@mailman.uni-konstanz.de
> Subject: [basex-talk] Adding millions of XML files
>
>
>
> Hi, I'm new to BaseX and to XQuery. I already knew XPath. I'm evaluating
> BaseX to store our XML files and run queries on them. We have to store
> about 1 million XML files per month. The XML files are small (~1 KB to 5
> KB). So our case is: a high number of files, each of small size.
>
> I've read that BaseX is scalable and has high performance, so it is probably
> a good tool for us. But, in the tests I'm doing, I'm getting an "Out of Main
> Memory" error when loading a large number of XML files.
>
> For example, if I create a new database ("testdb") and add 3 XML files, no
> problem occurs. The files are stored correctly, and I can run queries on them.
> Then, if I try to add 18000 XML files to the same database ("testdb") (by
> using GUI > Database > Properties > Add Resources), I see the
> coloured memory bar grow and grow... until an error appears:
>
> Out of Main Memory.
> You can try to:
> - increase Java's heap size with the flag -Xmx<size>
> - deactivate the text and attribute indexes.
>
> The text and attribute indexes are disabled, so they are not the cause. I also
> increased the Java heap size with the flag -Xmx<size> (by editing the
> basexgui.bat script), and the same error happens.
>
> Probably BaseX loads all files into main memory first, and then dumps them to
> the database files. It shouldn't be done that way. Each XML file
> should be loaded into main memory, then processed, and then dumped to the
> db files, independently of the rest.
>
> So I have two questions:
> 1. Do I have to use a special way to add a large number of XML files?
> 2. Is BaseX sufficiently stable to store and manage our data (about 1
> million files will be added per month)?
>
> Thank you for your help and for your great software, and excuse me if I am
> asking recurring questions.
>
>
> _______________________________________________
> BaseX-Talk mailing list
> BaseX-Talk@mailman.uni-konstanz.de
> https://mailman.uni-konstanz.de/mailman/listinfo/basex-talk