Hi Christian, Christophe and all.
Since 2013 we are developing a middleware to the parallel XQuery processing in huge XML data. Today, we are evaluating it with BaseX in a cluster. For example, in standalone mode we have queries that do not execute in a desktop platform (4Gb RAM and -Xmx 2Gb). These queries were executed with approximately 20 hours in only one cluster processing node (16Gb RAM and -Xmx 10Gb) - final result has ~2 GB.
In our preliminary experiments, the query processing time was reduced in almost 80% with our middleware (scenario with 8 nodes, -Xmx 2Gb). We used XMark benchmark database with 1.0 GB, but further we will try with real databases with 5GB or more. In all cases, we focus in ad-hoc high-cost queries (with joins, aggregate functions etc.) and we did not mind with the the JVM behavior.
Shortly, I think that you need adopt a partitioning strategy (we recommend virtually instead of physically) and distribute the processing overhead. Sure, if you have a distributed environment available and may to treat the JVM and DBMSX how a black-box.
Kind regards,