Hi Mathias,
As you suggested, I tried using the "new" command. I haven't been successful so far, because I ran into a number of other problems along the way. Since with these amounts of data the database creation takes several hours overall, it also took that long for some of the errors to surface (invalid filenames, or invalid contents in some files).
True; you need to ensure that all XML documents are well-formed. You could use xmllint or a similar tool to find and remove malformed files in advance.
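If xmllint is not at hand, a minimal sketch using only Python's standard library could serve the same purpose: it walks a folder and reports every file that fails to parse. It assumes the documents end in `.xml` (the pattern is an assumption, adjust it to your data), and it does not load external DTDs, so documents relying on entities declared in a DTD will likely be reported as errors too.

```python
import sys
import xml.etree.ElementTree as ET
from pathlib import Path


def find_malformed(folder, pattern="*.xml"):
    """Return (path, message) pairs for files under `folder` that are not well-formed XML.

    The default pattern "*.xml" is an assumption about how the files are named.
    """
    problems = []
    for path in sorted(Path(folder).rglob(pattern)):
        try:
            # Well-formedness check only; external DTDs are not fetched.
            ET.parse(path)
        except ET.ParseError as err:
            problems.append((path, str(err)))
        except OSError as err:
            # Unreadable files (e.g. odd filenames or permissions) are reported as well.
            problems.append((path, str(err)))
    return problems


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python find_malformed.py /path/to/folder
    for path, message in find_malformed(sys.argv[1]):
        print(f"{path}: {message}")
```

The script only reports; deleting or moving the listed files is left to you, so nothing is removed by accident.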
Nevertheless, today I got it running without OutOfMemoryExceptions or any other printed errors. Unfortunately, though, when I executed the "create db OAI [folder]" command in the BaseXClient (over ssh on my server), it apparently never finished.
That's quite unusual behavior; my guess is that too many URLs are resolved again and again, which might take a very long time. I'd advise setting the intparse flag to true (set intparse on; create db ..., or Database -> New -> Parsing -> Use Internal Parser), or deactivating DTD parsing. If you need DTD handling, e.g. to resolve entities, you could also specify a Catalog Resolver (http://docs.basex.org/wiki/Catalog_Resolver). Once more, I recommend using the latest snapshot; this will simplify tracking down the cause of the problem.
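To avoid retyping the options before every attempt, the command line could be assembled by a small helper like the one below. This is a hypothetical sketch: `build_create_commands` is an illustrative name, the `set intparse on; create db ...` form is taken from above, and `set dtd off` is an assumption about how DTD parsing is switched off, so please check the option names against the documentation of your BaseX version.

```python
def build_create_commands(db_name, folder, internal_parser=True, parse_dtds=False):
    """Build a semicolon-separated BaseX command string (hypothetical helper).

    Assumes `folder` contains no spaces, since it is inserted verbatim.
    """
    commands = []
    if internal_parser:
        # Switch to the internal XML parser, as suggested above.
        commands.append("set intparse on")
    if not parse_dtds:
        # Assumed option: ask BaseX not to parse referenced DTDs.
        commands.append("set dtd off")
    commands.append(f"create db {db_name} {folder}")
    return "; ".join(commands)


print(build_create_commands("OAI", "/data/oai"))
```

The printed string can then be pasted into the BaseXClient prompt; keeping the options in one place makes it easier to change a single setting between long-running attempts.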
Feel free to ask for more,
Christian