Some complementary notes (others may be able to tell you more about their experiences with large data sets):
a GiST index would have to be built there, to allow full-text searches;
PostgreSQL is picked
You could also have a look at Elasticsearch or its predecessors.
there might be a leak in the BaseX implementation of XQuery.
I assume you are referring to the SQL Module? Feel free to attach the OOM stack trace; it might give us more insight.
I would recommend writing the SQL commands or an SQL dump to disk (see the BaseX File Module for more information) and running/importing this file in a second step; this is probably faster than sending hundreds of thousands of single SQL commands via JDBC, no matter whether you are using XQuery or Java.
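To illustrate the two-step approach, here is a minimal sketch of the first step in Python (chosen only to keep the example short; in XQuery the File Module would do the writing instead). The function names, the table name and the column names are all hypothetical, and the quoting rule assumes standard SQL string literals as PostgreSQL accepts them by default.

```python
def sql_literal(value):
    """Render a value as a SQL literal: numbers as-is, None as NULL, the rest quoted."""
    if value is None:
        return "NULL"
    if isinstance(value, (int, float)):
        return str(value)
    # Standard SQL escapes a single quote by doubling it.
    return "'" + str(value).replace("'", "''") + "'"


def write_sql_dump(path, table, columns, rows):
    """Write one INSERT statement per row to `path`, wrapped in one transaction.

    `table` and `columns` are assumed to be trusted identifiers (hypothetical
    names in this sketch); only the row values are escaped.
    Returns the number of INSERT statements written.
    """
    column_list = ", ".join(columns)
    count = 0
    with open(path, "w", encoding="utf-8") as out:
        out.write("BEGIN;\n")
        for row in rows:
            values = ", ".join(sql_literal(v) for v in row)
            out.write(f"INSERT INTO {table} ({column_list}) VALUES ({values});\n")
            count += 1
        out.write("COMMIT;\n")
    return count
```

A call such as write_sql_dump("dump.sql", "docs", ["id", "title"], rows) produces a file that can then be imported in the second step, for example with psql -f dump.sql against the target database. Wrapping all statements in a single transaction is a design choice of this sketch, not something the File Module or the SQL Module requires.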