There also seems to be a difference depending on how file:read-text-lines is used. The following two similar queries produce different query plans and have very different memory usage. This is probably why I can't benefit from the update in more complex queries.
1) count(file:read-text-lines($file, "UTF-8", false()))
Memory usage: about 20 MB
Query plan:
<QueryPlan compiled="true" updating="false">
  <FnCount name="count(items)" type="xs:integer" size="1">
    <FileReadTextLines name="read-text-lines(path[,encoding[,fallback[,offset[,length]]]])" type="xs:string*">
      <Str type="xs:string">/home/lumiel/eworx/betmechs/bme/webservice/samples/betfair/September-2015/output.json</Str>
      <Str type="xs:string">UTF-8</Str>
      <Bln type="xs:boolean">false</Bln>
    </FileReadTextLines>
  </FnCount>
</QueryPlan>
2) let $data := file:read-text-lines($file, "UTF-8", false()) return count($data)
Memory usage: 4.5 GB
Query plan:
<QueryPlan compiled="true" updating="false">
  <GFLWOR type="xs:integer" size="1">
    <Let type="xs:string*">
      <Var name="$data" id="1" type="xs:string*"/>
      <FileReadTextLines name="read-text-lines(path[,encoding[,fallback[,offset[,length]]]])" type="xs:string*">
        <Str type="xs:string">/full/path/file.txt</Str>
        <Str type="xs:string">UTF-8</Str>
        <Bln type="xs:boolean">false</Bln>
      </FileReadTextLines>
    </Let>
    <FnCount name="count(items)" type="xs:integer" size="1">
      <VarRef type="xs:string*">
        <Var name="$data" id="1" type="xs:string*"/>
      </VarRef>
    </FnCount>
  </GFLWOR>
</QueryPlan>
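
In case it helps to reproduce this, here is a minimal self-contained sketch of the two queries above. The external variable declaration is an assumption about how $file gets bound (it is not part of the snippets above); I would expect it to be bindable from the command line with BaseX's -b option, but any other way of setting $file should behave the same.

```xquery
(: Sketch only: $file is assumed to be bound externally. :)
declare variable $file as xs:string external;

(: Query 1: the result of read-text-lines is passed straight to count(). :)
count(file:read-text-lines($file, "UTF-8", false()))

(: Query 2: to reproduce the high memory usage, replace the expression
   above with this FLWOR, which binds the lines to a variable first:

   let $data := file:read-text-lines($file, "UTF-8", false())
   return count($data)
:)
```

The only difference between the two is the intermediate $data binding, which matches the extra Let and VarRef nodes in the second query plan.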
On 1/15/19 1:48 PM, Christian Grün wrote:
Hi George,
I’m glad to announce that files are now processed in an iterative manner [1,2]. That’s something I wanted to try a while ago, and your mail was another motivation to get it done.
It works pretty well: I reduced the JVM memory to a tiny maximum of 4 MB, and I managed to count the lines of a file with several gigabytes:
count(file:read-text-lines('huge.txt'))
I’d be interested to hear if your code runs faster with the latest snapshot.
Christian
[1] http://files.basex.org/releases/latest/
[2] https://github.com/BaseXdb/basex/commit/cfb7a7965de85139ec9595a6e79a45d873da...