I'm currently at work and my setup is at home. In about 7 hours I'll get home and I will send the stack trace.

Meanwhile, is there any way to write a FLWOR, a loop, in a batched style?

For example, in my case the approach I described migrates data from BaseX to PostgreSQL: it uses BaseX as an XQuery processor and moves the full-text indexing over to PostgreSQL. That is what I'm trying to do.

However, to avoid the OOM, I am thinking of batching the transfer into chunks and potentially restarting the BaseX server between the migration of each chunk.
That's why I am asking how I could do that in BaseX. My hope is that the OOM could be avoided this way, because not all of the data would pass through main memory at once, and the JVM garbage collector would have less data to deal with.
Restarting the BaseX server between chunks would also help ensure that whatever memory was used is actually released.

So I wonder if something like
(<insert-big-FLWOR-here>)[position() = <start> to <end>]
would work here. Of course, a count would have to be done beforehand to know how many batches there will be. Or, even without knowing how many batches there will be, a while-type loop could be written in Bash, with the stop condition being a check of whether the current chunk is empty (see the sketch below).
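To make this concrete, here is a minimal sketch of what one batched run could look like. Everything in it is a placeholder assumption, not my real setup: the database name "mydb", the <record/> elements, the target table, and the local:to-sql() helper.

  (: One batch per invocation; $start and $size are bound from outside. :)
  declare variable $start as xs:integer external := 1;
  declare variable $size  as xs:integer external := 10000;

  (: Turn one record into a single INSERT statement (simplified escaping). :)
  declare function local:to-sql($r as element(record)) as xs:string {
    "INSERT INTO docs (id, body) VALUES (" || $r/@id || ", '" ||
    replace(string($r), "'", "''") || "');"
  };

  (: Slice the input before the heavy work, so only one chunk is processed per run. :)
  for $r in subsequence(db:open('mydb')//record, $start, $size)
  return local:to-sql($r)

A Bash while loop could then run this script per chunk (e.g. basex -b start=... -b size=... batch.xq), advance start by size, and stop once the output is empty; running each chunk as a separate standalone basex call would also start a fresh JVM every time, which is effectively the "restart between chunks" I had in mind.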

Would an approach like this work to mitigate the OOM? Are there alternatives or workarounds for this kind of OOM?

Thanks

On Mon, Oct 7, 2019, 1:13 AM Christian Grün <christian.gruen@gmail.com> wrote:
Some complementary notes (others may be able to tell you more about their experiences with large data sets):

> a GiST index would have to be built there, to allow full-text searches; PostgreSQL is picked

You could as well have a look at Elasticsearch or its predecessors.

> there might be a leak in the BaseX implementation of XQuery.

I assume you are referring to the SQL Module? Feel free to attach the OOM stack trace, it might give us more insight.

I would recommend writing SQL commands or an SQL dump to disk (see the BaseX File Module for more information) and running/importing this file in a second step; this is probably faster than sending hundreds of thousands of single SQL commands via JDBC, no matter whether you are using XQuery or Java.
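
As an illustration of that suggestion, a minimal sketch: it reuses the same hypothetical local:to-sql() helper and database name as in the batching example above, and file:write-text-lines is part of the BaseX File Module; the output path is a placeholder.

  declare function local:to-sql($r as element(record)) as xs:string {
    "INSERT INTO docs (id, body) VALUES (" || $r/@id || ", '" ||
    replace(string($r), "'", "''") || "');"
  };

  (: Collect all INSERT statements and write them to a dump file on disk. :)
  let $statements :=
    for $r in db:open('mydb')//record
    return local:to-sql($r)
  return file:write-text-lines('/tmp/dump.sql', $statements)

The resulting file could then be imported into PostgreSQL in one step, e.g. with psql -f /tmp/dump.sql.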