Hello,
I'm trying to read a 4 GB text file with 5 million lines and parse its contents. I'm using the file:read-text-lines function (http://docs.basex.org/wiki/File_Module#file:read-text-lines) to do that. With fork-join I managed to use 16 CPU threads to read the whole file, reading 10000 lines in each iteration, but parsing and analyzing the data still takes 500 seconds. According to a profiler, most of that time is spent reading individual lines, in the readline method of NewlineInput (https://github.com/BaseXdb/basex/blob/0ef57de84659263c565ec41fff666ba5fa4f07dd/basex-core/src/main/java/org/basex/io/in/NewlineInput.java). I plan to make some changes to the code tonight and see if I can find a way to read it faster, but I thought I should also post here in case you have any tips. I'm also very inexperienced with profilers, so I hope I read the output correctly :)
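For comparison, a single sequential pass that hands out batches of lines can be sketched in plain Java as below. This is only a minimal sketch of a possible baseline, not BaseX code: the class name BatchedLineReader, the method readInBatches, and the assumption that the file is valid UTF-8 are all hypothetical and not taken from the BaseX sources.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical helper for illustration; not part of BaseX.
public class BatchedLineReader {

    /**
     * Reads the file once, front to back, and passes the lines to the
     * handler in batches of at most batchSize lines.
     * Assumes valid UTF-8 input (invalid bytes cause an IOException).
     * Returns the total number of lines read.
     */
    public static long readInBatches(Path file, int batchSize,
                                     Consumer<List<String>> handler) throws IOException {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        long total = 0;
        try (BufferedReader reader = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
            List<String> batch = new ArrayList<>(batchSize);
            String line;
            while ((line = reader.readLine()) != null) {
                batch.add(line);
                total++;
                if (batch.size() == batchSize) {
                    handler.accept(batch);
                    // Start a fresh list so the handler may keep the old one.
                    batch = new ArrayList<>(batchSize);
                }
            }
            // Hand over the final, possibly shorter, batch.
            if (!batch.isEmpty()) {
                handler.accept(batch);
            }
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 1) {
            System.err.println("usage: java BatchedLineReader <file>");
            return;
        }
        long start = System.nanoTime();
        // The handler is a no-op here, so only the raw reading cost is measured.
        long lines = readInBatches(Paths.get(args[0]), 10000, batch -> { });
        long millis = (System.nanoTime() - start) / 1_000_000;
        System.out.println(lines + " lines read in " + millis + " ms");
    }
}
```

Timing this on the same file would give a rough lower bound for the raw line-reading cost, which could help when interpreting the profiler output.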
Regards,
George