Hi,
I am evaluating BaseX for processing large XML files (in the GB range). I initially tried adding a single 1.1 GB XML file to BaseX; it took 101 seconds.

But when I split it into 10,000 small documents and added them with "ADD source_dir/" on the command line, it took around 2.5 hours to add those files to the database.

System configuration: a Linux machine with 8 cores and 64 GB RAM.

I have no clue why there is such a big difference between the two scenarios; any hints would be very helpful.
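Roughly, the two runs look like this (a sketch assuming the standalone command-line client; database and file names are placeholders):

    # Scenario 1: one large document (took 101 seconds)
    basex -c "CREATE DB single big_file.xml"

    # Scenario 2: ~10,000 small documents (took ~2.5 hours)
    basex -c "CREATE DB split; ADD source_dir/"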
Thanks & Regards,
Kunal
Hi Kunal,
finally some feedback.
In general, it is faster in BaseX to process single large documents than many small ones. I am wondering about the big difference, though: do your XML snippets include DTD references? And are all files (approx. 10,000, I guess) located in the same directory, or have you put them into multiple sub-directories?
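If DTDs turn out to be the culprit, you could try BaseX's internal parser, which does not resolve DTD references unless told to (a sketch; INTPARSE and DTD are regular database options, and the database name is a placeholder):

    SET INTPARSE true   # use the internal XML parser
    SET DTD false       # do not parse referenced DTDs (the default)
    CREATE DB split
    ADD source_dir/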
You could do some profiling by adding -Xrunhprof:cpu=samples as a JVM argument (with fewer files, in order to save time) and send me the resulting profiling file.
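A complete invocation could look like this (a sketch, assuming the standalone distribution with org.basex.BaseX as the command-line entry point; adjust the jar name to your version):

    java -Xrunhprof:cpu=samples -cp BaseX.jar org.basex.BaseX \
         -c "CREATE DB split; ADD source_dir/"
    # HPROF writes its output to java.hprof.txt in the working directory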
Best, Christian