Hi,
I've been doing some profiling recently and noticed one area in particular that appears to be using a lot of cycles. In calls to DiskData.write() the final DataOutput.close() call accounts for 20% - 40% of overall execution time in a variety of cases. The following screen capture shows the aggregated call stack (though it might look a little funny since this is the cross-compiled version used for Nxdb). Admittedly, this is probably an issue that just impacts me. Based on the trace, I suspect the issue is related to the releasing of handles rather than the actual disk writes (though I could be wrong - I'm not entirely sure what the cross-compiler does behind the scenes to support Java file descriptors).
In any case, my question boils down to wondering if there is a way to force the DiskData class to use a persistent DataOutput instance instead of creating and closing a new one for every write? Instead, could a DataOutput be kept open until the database is "unpinned" and simply flushed on each write? Would this cause any problems with the current GUI, server, etc. uses?
Thanks,
Dave
Hi Dave,
First, thanks for the analysis!
On Wednesday, 14 March 2012, 19:09:52, Dave Glick wrote:
In any case, my question boils down to wondering if there is a way to force the DiskData class to use a persistent DataOutput instance instead of creating and closing a new one for every write? Instead, could a DataOutput be kept open until the database is "unpinned" and simply flushed on each write? Would this cause any problems with the current GUI, server, etc. uses?
I think you are right: it would of course be much more efficient not to open and close the DataOutput on each call of write(). I don't see any problem with keeping the meta data file open during the whole life cycle of a DiskData instance (I may be wrong, however, so Christian should give his opinion). The problem I see is how to implement it, because the class org.basex.io.out.DataOutput does not have a method to reset the file offset to the beginning of the file. Therefore, either such a method should be added, or the meta data should be written using a RandomAccessFile (e.g. org.basex.io.random.DiskAccess).
Regards, Dimitar
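A minimal sketch of the second option Dimitar mentions, assuming the plain java.io.RandomAccessFile from the JDK rather than org.basex.io.random.DiskAccess (whose API is not shown in this thread). The class PersistentMetaWriter and its method names are hypothetical and are not part of BaseX; the sketch only illustrates keeping one handle open and resetting the offset before each rewrite.

```java
import java.io.Closeable;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical helper (not BaseX code): keeps one file handle open and rewrites the file in place. */
final class PersistentMetaWriter implements Closeable {
  private final RandomAccessFile file;

  PersistentMetaWriter(Path path) throws IOException {
    // "rw" opens the file for reading and writing and creates it if it does not exist.
    file = new RandomAccessFile(path.toFile(), "rw");
  }

  /** Overwrites the file contents with the given bytes, starting at offset 0. */
  void write(byte[] meta) throws IOException {
    file.seek(0);                // reset the offset to the beginning of the file
    file.write(meta);            // no user-space buffer: bytes are handed to the OS directly
    file.setLength(meta.length); // cut off leftovers of an earlier, longer version
    // file.getFD().sync() could be added here to force the bytes to the storage device.
  }

  @Override
  public void close() throws IOException {
    file.close();
  }

  public static void main(String[] args) throws IOException {
    Path path = Files.createTempFile("meta", ".bin");
    try (PersistentMetaWriter writer = new PersistentMetaWriter(path)) {
      writer.write("first, longer version".getBytes(StandardCharsets.UTF_8));
      writer.write("second".getBytes(StandardCharsets.UTF_8));
    }
    // prints "second"
    System.out.println(new String(Files.readAllBytes(path), StandardCharsets.UTF_8));
    Files.delete(path);
  }
}
```

The handle is opened once and released only in close(), so the per-write cost is a seek plus a write. Whether this removes the overhead seen in the cross-compiled build would still need to be measured.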
Hi Dave,
thanks for your analysis - profound as usual. You may get much better results by setting the AUTOFLUSH option to false [1]. Please let me know if you've already done that.
Christian
[1] http://docs.basex.org/wiki/Options#AUTOFLUSH
BaseX-Talk mailing list BaseX-Talk@mailman.uni-konstanz.de https://mailman.uni-konstanz.de/mailman/listinfo/basex-talk
Christian,
I had played around with the AUTOFLUSH option and did notice some improvement, which is great for my own uses when appropriate. If that option is set to false, how often does the database actually get flushed? Only on close? Does it flush before index rebuilds from Optimize?
With the OutputStream question, I was more interested in systematic sources of delay that could potentially be mitigated (thinking about other users of Nxdb). This seemed like an instance where there might be room for improvement - however, I'm not sure the problem even exists outside the cross-compiled case.
Thanks,
Dave
-----Original Message-----
From: Christian Grün [mailto:christian.gruen@gmail.com]
Sent: Wednesday, March 14, 2012 5:03 PM
To: Dave Glick
Cc: BaseX
Subject: Re: [basex-talk] Any way to persist OutputStream in DiskData?
Hi Dave,
thanks for your analysis - profound as usual. You may get much better results by setting the AUTOFLUSH option to false [1]. Please let me know if you've already done that.
Christian
[1] http://docs.basex.org/wiki/Options#AUTOFLUSH
Hi Dave,
I had played around with the AUTOFLUSH option and did notice some improvement, which is great for my own uses when appropriate. If that option is set to false, how often does the database actually get flushed? Only on close?
Yes. The number of flushes will be reduced as much as possible. If you want to force flushing, you'll either need to execute the FLUSH command or close the database.
With the OutputStream question, I was more interested in systematic sources of delay that could potentially be mitigated (thinking about other users of Nxdb). This seemed like an instance where there might be room for improvement - however, I'm not sure the problem even exists outside the cross-compiled case.
To be honest, I'm not sure what this would need to look like in the code. Do you believe that the existing "autoflush" option will help in that case, or would you like to have a persistent stream as a potential replacement?
Best, Christian
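To illustrate the behaviour Christian describes (with auto-flushing off, pending changes are written only on an explicit flush or on close), here is a minimal, self-contained sketch. It is not BaseX code and does not use the BaseX API; the class DeferredMetaStore, its autoFlush flag, and the write counter are hypothetical and exist only to make the difference in the number of physical writes visible.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical sketch (not BaseX code): writes meta data eagerly, or defers it until flush()/close(). */
final class DeferredMetaStore implements AutoCloseable {
  private final Path path;
  private final boolean autoFlush;
  private String meta = "";
  private boolean dirty;
  private int diskWrites; // counts physical writes, for illustration only

  DeferredMetaStore(Path path, boolean autoFlush) {
    this.path = path;
    this.autoFlush = autoFlush;
  }

  /** Records an update; writes it out immediately only if autoFlush is enabled. */
  void update(String newMeta) throws IOException {
    meta = newMeta;
    dirty = true;
    if (autoFlush) flush();
  }

  /** Writes pending changes to disk, if there are any. */
  void flush() throws IOException {
    if (!dirty) return;
    Files.write(path, meta.getBytes(StandardCharsets.UTF_8));
    diskWrites++;
    dirty = false;
  }

  int diskWrites() {
    return diskWrites;
  }

  /** Closing always flushes, so no update is lost when autoFlush is off. */
  @Override
  public void close() throws IOException {
    flush();
  }

  public static void main(String[] args) throws IOException {
    Path path = Files.createTempFile("meta", ".txt");

    DeferredMetaStore eager = new DeferredMetaStore(path, true);
    for (int i = 0; i < 1000; i++) eager.update("size=" + i);
    eager.close();
    System.out.println("autoFlush=true:  " + eager.diskWrites() + " writes"); // 1000 writes

    DeferredMetaStore lazy = new DeferredMetaStore(path, false);
    for (int i = 0; i < 1000; i++) lazy.update("size=" + i);
    lazy.close();
    System.out.println("autoFlush=false: " + lazy.diskWrites() + " writes");  // 1 write

    Files.delete(path);
  }
}
```

The trade-off is the usual one for deferred writes: fewer open/write/close cycles, but changes made since the last flush are lost if the process dies before flush() or close() runs.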