Thank you for the suggestion. I'm trying it now. Here's how I'm going about it:
cfbearden@quirkstation:~/projects/Influuent$ basex -d
BaseX 8.3 [Standalone]
Try help to get more information.
> set addcache true
ADDCACHE: true
> set ftindex true
FTINDEX: true
> create db pure_20151019 pure_20151019
Creating Database...
Where 'pure_20151019' is both the name of the database and the subdirectory where all my XML files are.
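In case it matters, the memory bump I mentioned before was just an edit to the -Xmx value in the basex launcher script; the exact line varies by version, but it amounts to something like this:

  # in the basex launcher script (the default heap was -Xmx512m)
  java -Xmx4g -cp "$CP" org.basex.BaseX "$@"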
It could well be that I'm missing a crucial option; I'm still relatively new to BaseX. It's great stuff, though.
Because of my employer's IT environment, I have to run my Linux workstation in a VMware VM, though I doubt that makes a difference.
Thanks, Chuck
On Tue, Oct 20, 2015 at 11:15 AM, Christian Grün christian.gruen@gmail.com wrote:
Hi Chuck,
Usually, 4G is more than enough to create a full-text index for 16G of XML. Obviously, however, that's not the case for your input data. You could try to distribute your documents across multiple databases. As an alternative, we could have a look at your data and try to find out what's going wrong. You can also use the -d flag and send us the stack trace.
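For example, something along these lines would create one database (each with its own full-text index) per chunk of your input. This is an untested sketch, and the directory layout and database names are only illustrative; it assumes you can first split the files into subdirectories:

  # one database per subdirectory of input
  for i in 0 1 2 3; do
    basex -c "SET FTINDEX true; CREATE DB pure_20151019_$i pure_20151019/part$i"
  done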
Best, Christian
On Tue, Oct 20, 2015 at 4:19 PM, Chuck Bearden cfbearden@gmail.com wrote:
Hi all,
I have about 16G of XML data in about 52,000 files, and I was hoping to build a full-text index over it. I've tried two approaches: enabling full-text indexing as I create the database and then loading the data, and creating the full-text index after loading the data. If I enable ADDCACHE and modify the basex shell script to use 4g of RAM instead of 512M, I have no problem loading the data. If I try to load with FTINDEX enabled, or to create the index afterward, the process runs out of memory.
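Concretely, the second approach looks roughly like this in the standalone console (the first is the same minus the last step, with FTINDEX set before CREATE DB):

  > set addcache true
  > create db pure_20151019 pure_20151019
  > create index fulltext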
I could believe that I'm overlooking some option that would make this possible, but I suspect I just have too much data. I welcome your thoughts & suggestions.
All the best, Chuck Bearden