Wrong mail account...
-------- Forwarded Message --------
Subject: Re: [basex-talk] How to process very long node sequences
Date: Wed, 18 Mar 2015 19:26:00 +0100
From: Leo Wörteler <lw@basex.org>
To: Hans-Juergen Rennau <hrennau@yahoo.de>
CC: basex-talk@mailman.uni-konstanz.de
Dear Hans-Jürgen,
On 18.03.2015 at 15:23, Hans-Juergen Rennau wrote:
> I want to persuade the processor to visit, process and forget the nodes one after the other, rather than to attempt loading all nodes into memory before proceeding to process them.
BaseX already uses iterative processing by default. It only falls back to caching if it has to, e.g. for sorting, reversing or duplicate elimination. Calls to user-defined functions are currently also blocking, but we are investigating whether that can be changed.
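To make the distinction concrete, here is a minimal sketch of the two query shapes. The inline sample document and the `id` attribute are made up for illustration and merely stand in for a huge document; the sketch shows the shape of the queries, not the internal evaluation strategy.

```xquery
(: Hypothetical sample data standing in for a huge document. :)
let $doc := document { <a><b><c id="2"/><c id="1"/></b></a> }
return (
  (: Only child steps and a plain `for`: each node can be
     requested, processed and dropped before the next one. :)
  for $c in $doc/a/b/c
  return string($c/@id),

  (: `order by` cannot emit its first result before it has seen
     every input tuple, so the input has to be held somewhere. :)
  for $c in $doc/a/b/c
  order by $c/@id
  return string($c/@id)
)
```

The second FLWOR expression is blocking by nature: no result can be returned until the last `c` element has been read.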
> Schematically:
>
> declare function f:processNode($node as node()) as empty-sequence() {...};
> for $node in doc('huge-doc')/a/b/c return f:processNode($node)
The example you gave will always be evaluated without caching the node sequence. The `for` clause requests the result of its argument iteratively, and the XPath expression you used only contains child steps, which can be evaluated in document order without duplicates.
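For contrast, here is a small sketch of a path where duplicates can arise. The nested `x`/`y` sample data is made up for illustration. Whether a given processor actually caches for such a path is an implementation detail; the sketch only shows why duplicate elimination can be needed once descendant steps are involved.

```xquery
(: Hypothetical sample data: nested x elements share one y descendant. :)
let $doc := document { <x><x><y/></x></x> }
return (
  (: A path expression must return each node once, in document
     order, although y is reachable from both x elements. :)
  count($doc//x//y),                     (: 1 :)

  (: A `for` loop simply concatenates the results per x,
     so the same y node shows up twice. :)
  count(for $x in $doc//x return $x//y)  (: 2 :)
)
```

With child steps only, as in `/a/b/c`, every node is reached along exactly one route, so no such bookkeeping is required.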
> Well, in some cases it works, in others it doesn't. Is there any safe way to enforce sequential visit-process-forget processing?
When the XPath expression becomes more complex, it is not as easy to predict whether it uses caching internally. BaseX tries quite hard to detect paths that do not need it; the algorithm can be seen in [1]. If you see a `CachedPath` in the Info View of the GUI, you can try to reformulate the query.
> I call this problem general because it is a critical aspect of dealing with huge documents.
XQuery does not (yet) have a dedicated *streaming mode* like that of XSLT 3.0 [2], but it would definitely be possible to check whether some part of a query (e.g. marked by a pragma) is evaluated without caching. It would, however, be quite some work.
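Purely as an illustration of what such a marker could look like: the namespace URI and the pragma name `ex:no-caching` below are hypothetical and not implemented in BaseX. The sample document is made up as well.

```xquery
(: Hypothetical namespace and pragma name, not an existing BaseX feature. :)
declare namespace ex = "http://example.org/hypothetical";

let $doc := document { <a><b><c/><c/></b></a> }
return (# ex:no-caching #) {
  (: A processor that knew this pragma could reject the query
     if the enclosed expression required caching. :)
  for $c in $doc/a/b/c
  return name($c)
}
```

A processor that does not recognize a pragma ignores it and evaluates the enclosed expression as usual, so a query marked this way would stay portable.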
Hope that helps,
Leo

[1] https://github.com/BaseXdb/basex/blob/0b828a8/basex-core/src/main/java/org/b...
[2] http://www.w3.org/TR/xslt-30/#dt-guaranteed-streamable