[basex-talk] file:read-text-lines performance

15 Jan 2019


Hello,
I'm trying to read a 4 GB text file with 5 million lines and parse its
contents. I'm using the file:read-text-lines function
(http://docs.basex.org/wiki/File_Module#file:read-text-lines) to do
that. I managed to use fork-join with 16 CPU threads to read the
whole file, 10000 lines per iteration, but parsing and analyzing the
data still takes 500 seconds. Using a profiler, I can see that most of
the time is spent reading each line, in the readline method of
https://github.com/BaseXdb/basex/blob/0ef57de84659263c565ec41fff666ba5fa4f07dd/basex-core/src/main/java/org/basex/io/in/NewlineInput.java.
I plan to make some changes to the code tonight and see if I can find a
way to read it faster, but I thought I should also post here in case
you have any tips. I'm also very inexperienced with profilers, so I
hope I read the output correctly :)
Regards,
George
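
For readers following along, here is a minimal Java sketch of what reading
"10000 lines per iteration" by line offset involves. It is not the BaseX
implementation; the class LineWindow, the method readWindow and its
offset/length parameters are hypothetical names used only for illustration.
It assumes a 1-based offset and shows that, without an index of line
positions, the lines before a window still have to be scanned before the
window itself can be returned. Whether this matches the profile described
above is an assumption.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

public class LineWindow {
  /**
   * Reads up to `length` lines starting at the 1-based line `offset`.
   * Hypothetical helper for illustration; not part of BaseX.
   */
  public static List<String> readWindow(BufferedReader reader, long offset, long length)
      throws IOException {
    List<String> lines = new ArrayList<>();
    // Lines before the window must still be scanned to locate their line breaks.
    for (long i = 1; i < offset; i++) {
      if (reader.readLine() == null) return lines; // input ended before the window
    }
    // Collect the requested window, stopping early at end of input.
    for (long i = 0; i < length; i++) {
      String line = reader.readLine();
      if (line == null) break;
      lines.add(line);
    }
    return lines;
  }

  public static void main(String[] args) throws IOException {
    // Small in-memory stand-in for the large file.
    String data = "a\nb\nc\nd\ne\n";
    try (BufferedReader reader = new BufferedReader(new StringReader(data))) {
      System.out.println(readWindow(reader, 3, 2)); // prints [c, d]
    }
  }
}
```

In this sketch, a window that starts at line 4990001 first steps over the
4990000 lines in front of it, so splitting a file into windows by line number
can repeat much of the scanning work for every window.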
