Dear Tomaso, dear Michael,
Today I took a closer look at the BaseX routines that are responsible for adding new documents to the database, and I tweaked the update code of our document index to avoid linear costs for adding single documents. You are invited to check out the latest stable snapshot [1] and give us your valuable feedback.
There are still some bottlenecks:

- The default XML parser takes some time to initialize, which is particularly noticeable for small documents. You'll get a performance boost by switching to the internal parser (command: set intparse; see [2] for details).
- As each BaseX database command is atomic, the data is flushed to disk after each update to avoid data loss. You may either specify a directory on disk to add multiple files at once, or choose to insert nodes instead of documents, which will give you better performance.
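As a rough sketch of the first two suggestions, a command script along the following lines should do. The database name "docs" and the directory path are placeholders I made up, and the exact command syntax may differ between BaseX versions, so please check it against the command documentation of your release. The lines starting with # are only there for reading and can be dropped:

```
# Assumption: "docs" is a hypothetical database name
CREATE DB docs
# Switch to the internal parser to avoid the start-up cost of the default parser
SET INTPARSE true
# Assumption: placeholder path; one ADD on a directory instead of one ADD per file
ADD /path/to/xml-directory
```

With a single ADD on a directory, the data is flushed once for the whole command instead of once per document.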
Hope this helps, Christian
[1] http://files.basex.org/releases/latest/basex-6.7.1-SNAPSHOT.jar
[2] http://docs.basex.org/wiki/Parsers
___________________________
On Mon, Jul 4, 2011 at 8:22 PM, Christian Grün christian.gruen@gmail.com wrote:
Hi Tomaso,
> From your answer I suppose there is something slower when Add() is called many times, and faster if we use CreateDb. Can you explain why the times increase? Is it because Add updates the index each time?
Exactly; as ADD is an atomic operation, there is currently no way to define a batch operation (other than going to a lower level and looking at the Add command [1]). It might be that some update operations could be delayed, though; I've added this as an issue (feel free to include more details) [2].
Christian
[1] https://github.com/BaseXdb/basex/blob/master/src/main/java/org/basex/core/cm...
[2] https://github.com/BaseXdb/basex/issues/137
basex-talk@mailman.uni-konstanz.de