Thank you for the suggestion. I'm trying it now. Here's how I'm going about it:
cfbearden@quirkstation:~/projects/Influuent$ basex -d
BaseX 8.3 [Standalone]
Try help to get more information.
> set addcache true
ADDCACHE: true
> set ftindex true
FTINDEX: true
> create db pure_20151019 pure_20151019
Creating Database...
Where 'pure_20151019' is both the name of the database and the subdirectory where all my XML files are.
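In case it matters, the memory bump I mentioned before was just an edit to the -Xmx value in the basex launcher script; the exact line varies by version, but it amounts to something like this:

  # in the basex launcher script (the default heap was -Xmx512m)
  java -Xmx4g -cp "$CP" org.basex.BaseX "$@"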
It could well be that I'm missing a crucial option; I'm still relatively new to BaseX. It's great stuff, though.
Because of my employer's IT environment, I have to run my Linux workstation in a VMware VM, though I doubt that makes a difference.
Thanks, Chuck
On Tue, Oct 20, 2015 at 11:15 AM, Christian Grün christian.gruen@gmail.com wrote:
Hi Chuck,
Usually, 4G is more than enough to create a full-text index for 16G of XML. Obviously, however, that's not the case for your input data. You could try to distribute your documents across multiple databases. As an alternative, we could have a look at your data and try to find out what's going wrong. You can also use the -d flag and send us the stack trace.
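For example, something along these lines would create one database (each with its own full-text index) per chunk of your input. This is an untested sketch, and the directory layout and database names are only illustrative; it assumes you can first split the files into subdirectories:

  # one database per subdirectory of input
  for i in 0 1 2 3; do
    basex -c "SET FTINDEX true; CREATE DB pure_20151019_$i pure_20151019/part$i"
  done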
Best, Christian
On Tue, Oct 20, 2015 at 4:19 PM, Chuck Bearden cfbearden@gmail.com wrote:
Hi all,
I have about 16G of XML data in about 52,000 files, and I was hoping to build a full-text index over it. I've tried two approaches: enabling full-text indexing as I create the database and then loading the data, and creating the full-text index after loading the data. If I enable ADDCACHE and modify the basex shell script to use 4g of RAM instead of 512M, I have no problem loading the data. If I try to load with FTINDEX enabled, or to create the index afterward, the process runs out of memory.
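Concretely, the second approach looks roughly like this in the standalone console (the first is the same minus the last step, with FTINDEX set before CREATE DB):

  > set addcache true
  > create db pure_20151019 pure_20151019
  > create index fulltext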
I could believe that I'm overlooking some option that would make this possible, but I suspect I just have too much data. I welcome your thoughts & suggestions.
All the best, Chuck Bearden