Hi Thomas,
some years ago, we ran experiments with nio, and the results didn’t differ much from conventional I/O, but we may have overlooked issues, so your input is welcome. Note, however, that a single memory mapping of an nio file channel is limited to 2GB (see e.g. [1]). As a consequence, some additional mappings will be needed if larger databases are to be opened and processed.
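To illustrate what such additional mappings could look like, here is a minimal sketch. It assumes read-only access and splits the file into regions of at most Integer.MAX_VALUE bytes, the upper bound for a single call to FileChannel.map. The class name ChunkedMapping and its methods are hypothetical and not part of BaseX.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;

/** Hypothetical helper: maps a file as a sequence of regions of at most chunkSize bytes. */
final class ChunkedMapping {
  private final MappedByteBuffer[] chunks;
  private final long chunkSize;
  private final long size;

  ChunkedMapping(FileChannel channel, long chunkSize) throws IOException {
    // FileChannel.map rejects sizes above Integer.MAX_VALUE, so enforce that here.
    if (chunkSize <= 0 || chunkSize > Integer.MAX_VALUE)
      throw new IllegalArgumentException("chunkSize must be in 1..Integer.MAX_VALUE");
    this.chunkSize = chunkSize;
    this.size = channel.size();
    int count = (int) ((size + chunkSize - 1) / chunkSize);
    chunks = new MappedByteBuffer[count];
    for (int i = 0; i < count; i++) {
      long start = i * chunkSize;
      long length = Math.min(chunkSize, size - start); // last region may be shorter
      chunks[i] = channel.map(FileChannel.MapMode.READ_ONLY, start, length);
    }
  }

  /** Reads the byte at an absolute file offset by picking the region that contains it. */
  byte get(long pos) {
    if (pos < 0 || pos >= size) throw new IndexOutOfBoundsException("pos: " + pos);
    return chunks[(int) (pos / chunkSize)].get((int) (pos % chunkSize));
  }

  long size() {
    return size;
  }

  public static void main(String[] args) throws IOException {
    // Demo on a small temporary file; a tiny chunk size forces several mappings.
    File tmp = File.createTempFile("chunked", ".bin");
    tmp.deleteOnExit();
    RandomAccessFile raf = new RandomAccessFile(tmp, "rw");
    try {
      raf.write(new byte[] { 10, 20, 30, 40, 50 });
      ChunkedMapping mapping = new ChunkedMapping(raf.getChannel(), 2);
      System.out.println(mapping.get(4));
    } finally {
      raf.close();
    }
  }
}
```

For a real database file, chunkSize would be set to a large value such as Integer.MAX_VALUE. Values that span two regions would need extra handling that this sketch leaves out.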
Christian
[1] http://stackoverflow.com/questions/8076472/filechannel-map-integer-max-value...
On Thu, Dec 13, 2012 at 6:37 PM, Thomas Kaltofen thomas.kaltofen@risc.uni-linz.ac.at wrote:
Hi Christian,
I performed some command-level profiling based on your suggestion and found that the time during the long delay is spent in "java.io.RandomAccessFile.readBytes()". Searching the Internet, I found several sources saying that java.io.RandomAccessFile performs poorly on Windows with disks that use the NTFS file system (which is exactly what I have on all computers I used for testing). The solution people suggest is to use FileChannels from java.nio (several new classes were added to this package in Java 7, which offer more possibilities for efficient file access).
So I rewrote the two BaseX classes TableDiskAccess and DataAccess in a Java 6 compatible way to use FileChannels for reading and writing, and the first tests look promising. I will continue my tests tomorrow, and if these changes solve my problem, I can send you the modified classes for testing in your environment. Thank you for your help so far!
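To give an idea of the approach, here is a minimal sketch of a positioned block read through a FileChannel. It is not the modified TableDiskAccess or DataAccess code. The class name ChannelBlockReader and its methods are hypothetical, and only calls available in Java 6 are used.

```java
import java.io.EOFException;
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;

/** Hypothetical block reader; not the actual BaseX implementation. */
final class ChannelBlockReader {
  private final RandomAccessFile file;
  private final FileChannel channel;

  ChannelBlockReader(File path) throws IOException {
    file = new RandomAccessFile(path, "r");
    channel = file.getChannel(); // available since Java 1.4
  }

  /** Returns blockSize bytes starting at the absolute file offset pos. */
  byte[] readBlock(long pos, int blockSize) throws IOException {
    ByteBuffer buffer = ByteBuffer.allocate(blockSize);
    long offset = pos;
    // A single read() may deliver fewer bytes than requested, so loop until full.
    while (buffer.hasRemaining()) {
      int n = channel.read(buffer, offset);
      if (n < 0) throw new EOFException("end of file at offset " + offset);
      offset += n;
    }
    return buffer.array();
  }

  void close() throws IOException {
    file.close(); // closing the file also closes its channel
  }

  public static void main(String[] args) throws IOException {
    // Demo on a small temporary file with known contents.
    File tmp = File.createTempFile("blocks", ".bin");
    tmp.deleteOnExit();
    FileOutputStream out = new FileOutputStream(tmp);
    try {
      out.write(new byte[] { 1, 2, 3, 4, 5, 6, 7, 8 });
    } finally {
      out.close();
    }
    ChannelBlockReader reader = new ChannelBlockReader(tmp);
    try {
      byte[] block = reader.readBlock(4, 4);
      System.out.println(block[0] + " " + block[3]);
    } finally {
      reader.close();
    }
  }
}
```

The positioned read(ByteBuffer, long) leaves the channel's own position untouched, so no separate seek call is needed before each read.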
Best regards, Thomas
-----Original Message----- From: Christian Grün [mailto:christian.gruen@gmail.com] Sent: Thursday, December 13, 2012 2:57 AM To: Thomas Kaltofen Cc: basex-talk@mailman.uni-konstanz.de Subject: Re: [basex-talk] Performance Question
Hi Thomas,
as Andreas indicated, it looks as if the hard disks need to re-adjust to your query patterns after longer breaks; I doubt that this is something that could be "fixed" within BaseX. Instead, it may help to have a second look at your queries; maybe they can be optimized to reduce I/O?
Do you have any recommendations for tools to profile the database on Windows?
I usually avoid visual tools and use command-level profiling instead, i.e. via the flag -Xrunhprof:cpu=samples.
Hope this helps, Christian
-----Original Message----- From: Christian Grün [mailto:christian.gruen@gmail.com] Sent: Tuesday, December 11, 2012 7:58 PM To: Thomas Kaltofen Cc: basex-talk@mailman.uni-konstanz.de Subject: Re: [basex-talk] Performance Question
Hi Thomas,
P.S. When (approx.) do you plan to release the next version of BaseX?
Only a few days are left! As a little hint, I can already disclose that the release will be nicknamed »BaseXMas Edition«.
I have a question regarding performance, because my database shows a somehow strange behavior. [...] I leave the server running for several hours without touching the database at all (e.g. over the night). If I now execute the same query again (the database is still running but was idle for several hours), the execution takes very long (several minutes) [...]
Usually, I would have guessed that some other processes have been occupying your main memory while BaseX was not used, but I was surprised to hear that you also encountered the behavior on another computer. Did you already do some profiling to see where all the time is spent (I/O, CPU, idle)?
- Execute a different query -> same behavior
As there are a lot of queries with a lot of different execution times, could you give us a guess what type of queries cause the behavior? I guess that a simple main-memory query (e.g. " (1 to 10000000)[. = 0] ") won't show the same effect, will it?
Best, Christian