Hello dear BaseX Team,
I would like to load a large number of small documents into BaseX. To do this, I use a small Java program:
Context ctx = new Context();
new CreateDB("basex1").execute(ctx);
new CreateIndex("fulltext").execute(ctx);
System.err.println("\n* Show database information:");
System.err.print(new InfoDB().execute(ctx));

for( ... ) {
  new Add(somedata, miname, collection).execute(ctx);
}
I don't know whether this is correct or efficient. Do you have a notion of transactions, like begin/commit? (I did not see anything like that.)
My concern is that it does not seem to scale well as the number of documents grows (each document is about 2500 bytes):
10000 documents => 54 s (5.4 ms per document)
20000 documents => 202 s (10 ms per document)
50000 documents => 862 s (17.2 ms per document)
And in fact I want to store more than 1 million!
Is there actually a limit on size? Is 1 million x 2.5 KB OK?
Thank you very much,
Tomaso
Tomaso,
thanks for your mail. Your transaction will be much faster if you specify the input within the CreateDB command (provided that your input documents are located on disk):
new CreateDB("basex1", "/path/to/docs").execute(ctx);
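If the documents are produced in memory rather than already stored as files, one possible workaround is to write them into a directory first and then pass that directory to CreateDB. The following is only a sketch using the plain JDK; the class name DumpDocs, the dump helper, and the doc-0000000.xml naming scheme are assumptions made for illustration and are not part of BaseX.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.List;

public class DumpDocs {

    /** Writes each XML string to its own file in dir; returns the number of files written. */
    static int dump(Path dir, List<String> docs) throws IOException {
        Files.createDirectories(dir);
        int n = 0;
        for (String xml : docs) {
            // Hypothetical naming scheme: doc-0000000.xml, doc-0000001.xml, ...
            Path file = dir.resolve(String.format("doc-%07d.xml", n));
            Files.write(file, xml.getBytes(StandardCharsets.UTF_8));
            n++;
        }
        return n;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("basex-docs");
        int written = dump(dir, Arrays.asList("<doc id='1'/>", "<doc id='2'/>"));
        System.out.println(written + " files written to " + dir);
        // Next step, as in the command above (requires the BaseX library on the classpath):
        // new CreateDB("basex1", dir.toString()).execute(ctx);
    }
}
```

The database is then built in a single pass over the directory instead of one Add command per document.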
There is no actual limit on the number of documents, but the document size and the total number of XML nodes might be limiting factors. Please have a look at http://docs.basex.org/wiki/Statistics for some databases that we've parsed with BaseX.
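For a rough sense of scale, the figures from your mail can be checked with a few lines of Java. This is a back-of-the-envelope sketch only: the class and method names are made up for illustration, and the result is the raw input size (1 million x 2500 bytes = 2.5 GB), not the size of the resulting database or its node count.

```java
public class LoadEstimate {

    /** Average time per document in milliseconds. */
    static double perDocMillis(double totalSeconds, long docs) {
        return totalSeconds * 1000.0 / docs;
    }

    /** Raw input size in bytes, ignoring any database overhead. */
    static long rawBytes(long docs, long bytesPerDoc) {
        return docs * bytesPerDoc;
    }

    public static void main(String[] args) {
        // Measurements quoted in the original mail.
        long[] docs = {10000, 20000, 50000};
        double[] seconds = {54, 202, 862};
        for (int i = 0; i < docs.length; i++) {
            System.out.printf("%d documents: %.2f ms per document%n",
                    docs[i], perDocMillis(seconds[i], docs[i]));
        }
        // Planned volume: 1 million documents of about 2500 bytes each.
        long total = rawBytes(1_000_000L, 2500L);
        System.out.printf("raw input: %d bytes (about %.2f GiB)%n",
                total, total / (1024.0 * 1024.0 * 1024.0));
    }
}
```

The per-document cost rises with the number of documents already added, which is why the total time grows faster than linearly in these measurements.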
Christian
On Mon, Jul 4, 2011 at 6:16 PM, Tomaso Musitelli tomasomusi@gmail.com wrote:
BaseX-Talk mailing list BaseX-Talk@mailman.uni-konstanz.de https://mailman.uni-konstanz.de/mailman/listinfo/basex-talk