... and thanks for the background on XQuery semantics! 

The tweak was quite simple: the collection() function is emulated in the script by the following function, with the db: prefix bound to a private namespace:

declare function db:collection($folder) {
  for $file in file:list($folder)
  return fetch:xml($folder || $file)
};

Memory usage now stays below a couple of GB, fluctuating with folder size.
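A slightly more defensive sketch of the same idea follows. It assumes the BaseX File and Fetch modules used in this thread. The namespace URI and the folder path in the query body are hypothetical placeholders. The `*.xml` filter assumes that only XML files should be parsed, so adjust the pattern if the folders hold other files.

```xquery
(: Hypothetical private namespace URI; any URI not used by a built-in module works :)
declare namespace db = "http://example.org/private";

(: Emulates fn:collection() with fetch:xml. Unlike fn:doc/fn:collection,
   fetch:xml is not required to return the same node on repeated calls,
   so the parsed documents need not be kept until the query ends. :)
declare function db:collection($folder as xs:string) as document-node()* {
  (: append a separator if the folder path does not already end with one :)
  let $dir := if (matches($folder, '[/\\]$')) then $folder else $folder || '/'
  (: list only *.xml files in this folder, non-recursively :)
  for $file in file:list($dir, false(), '*.xml')
  return fetch:xml($dir || $file)
};

(: Example query body; the path is a hypothetical placeholder :)
count(db:collection('/path/to/folder'))
```

In the original loop, the only change needed is to call `db:collection($digibooks || $collections)` instead of `collection($digibooks || $collections)`.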

Cheers,
Lars

2015-03-27 10:58 GMT+01:00 Lars Johnsen <yoonsen@gmail.com>:
Hi Christian, and thanks a lot for the pointer to fetch:xml; it seems to do the trick! With a little recoding, it should work.

Best, 
Lars

2015-03-27 10:48 GMT+01:00 Christian Grün <christian.gruen@gmail.com>:
Hi Lars,

Here is some background information on the reported behavior (sorry
in advance if this is already known to you): The functional semantics
of XQuery require that repeated calls to fn:doc and fn:collection
return the same documents. This can be shown, e.g., by the following
query, which returns true:

  doc('x.xml') is doc('x.xml')

As it's difficult to predict which of the opened documents will be
requested again in the same query, they are all kept in main memory
until query evaluation has completed.

Things are different with functions like fetch:xml [1], which are not
bound by this rule. You may need to tweak your query a bit, though,
because the function always returns a single XML document rather than
a collection.

Does this help?
Christian

[1] http://docs.basex.org/wiki/Fetch_Module#fetch:xml


On Fri, Mar 27, 2015 at 10:41 AM, Lars Johnsen <yoonsen@gmail.com> wrote:
> Hi all
>
> Here is code that gradually eats up memory, whether run in the GUI or
> from the command line. All it does is create temporary collections
> from folders and write them to files.
>
> Is there a simple way to keep this code from eating up memory? It runs
> out of memory (set to 12 GB on the command line, 18 GB in the GUI) after
> 300 or so folders, and it has to process 20,000 of them.
>
> Best
> Lars G Johnsen
> Norwegian National Library
>
> Here is the actual code:
>
> (: process list of folders :)
>
> for $collections in file:list($digibooks)
>   let $html := $htmlfiles || substring-before($collections, "_ocr") || ".html"
>   return
>     (: code is rerun, so check if files exist :)
>     if (not(file:exists($html))) then
>       try {
>         (: create a temporary collection of the files and write result to disk :)
>         file:write(
>           $html,
>           db:digibok-to-html(
>             collection($digibooks || $collections))
>         )
>       } catch * {
>         $err:code
>       }
>     else
>       ()