Hey Christian,
Thank you for your answer :) I tried setting SPLITSIZE = 24000 in .basex, but I saw the same OOM behavior. Memory consumption stays moderate until it reaches about 30GB (the size of the db before optimize), then it spikes and the OOM occurs. I'm now trying SPLITSIZE = 1000 and will report back if I get an OOM again. Regarding what you said, it may well be that the merge step is where the OOM occurs (I wonder if there is any way to control how much memory is used during the merge step).
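For reference, this is roughly what my .basex looks like at the moment (just a sketch; the file is plain OPTION = value lines, and the path below is only an example):

  # ~/.basex
  DBPATH = /data/basex/data
  SPLITSIZE = 1000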
To quote the statistics page from the wiki: "Databases in BaseX are light-weight. If a database limit is reached, you can distribute your documents across multiple database instances and access all of them with a single XQuery expression." (http://docs.basex.org/wiki/Databases) That sounds like sharding to me. I could probably split the documents into chunks and load each chunk into a database sharing a common prefix but with a varying suffix, which is very much like shards. I think I can avoid the OOM this way, but if BaseX provides other, better, perhaps native mechanisms for avoiding OOM, I would try those first.
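For the sharding approach, something along these lines is what I have in mind for querying the shards as one collection (only a sketch; the database prefix and element names are made up):

  (: address all shard databases in a single XQuery expression :)
  for $name in db:list()[starts-with(., 'linuxquestions-')]
  for $post in db:open($name)//post
  where contains($post/title, 'kernel')
  return $post/title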
Best regards, Stefan
On Tue, Oct 1, 2019 at 4:22 PM Christian Grün christian.gruen@gmail.com wrote:
Hi first name,
If you optimize your database, the indexes will be rebuilt. In this step, the builder tries to guess how much free memory is still available. If memory is exhausted, parts of the index will be split (i.e., partially written to disk) and merged in a final step. However, you can circumvent these heuristics by manually assigning a static split value; see [1] for more information. If you use the DBA, you'll need to assign this value in your .basex or web.xml file [2]. In order to find the best value for your setup, it may be easier to play around with the BaseX GUI.
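In console mode, the value can simply be assigned before rebuilding the indexes, something along these lines (the database name is just a placeholder):

  OPEN mydb
  SET SPLITSIZE 1000
  OPTIMIZE ALL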
As you have already seen in our statistics, an XML document has various properties that may represent a limit for a single database. These properties make it difficult for the system to predict when memory will be exhausted during an import or index rebuild.
In general, you’ll get the best performance (and memory consumption will be lower) if you create your database and specify the data to be imported in a single run. This is currently not possible via the DBA; use the GUI (Create Database) or console mode (CREATE DB command) instead.
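In console mode, such a single-run import could look as follows (the database name and path are placeholders):

  SET CREATEFILTER *.xml
  CREATE DB linuxquestions /path/to/xml/dump/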
Hope this helps, Christian
[1] http://docs.basex.org/wiki/Options#SPLITSIZE [2] http://docs.basex.org/wiki/Configuration
On Mon, Sep 30, 2019 at 7:09 AM first name last name randomcoder1@gmail.com wrote:
Hi,
Let's say there's a 30GB dataset [3] containing most threads/posts from [1].
After importing all of it, when I try to run /dba/db-optimize/ on it (which must have some corresponding command), I get the OOM error in the attached stack trace. I am using -Xmx2g, so BaseX is limited to 2GB of memory (the machine I'm running this on doesn't have a lot of memory).
I was looking at [2] for some estimates of peak memory usage for this "db-optimize" operation, but couldn't find any.
Actually, it would be nice to know the peak memory usage because, for any database (including BaseX), a common task is server sizing, i.e., working out what kind of server would be needed.
In this case, it seems like 2GB of memory is enough to import 340k documents, weighing in at 30GB total, but it's not enough to run "db-optimize".
Is there any info about peak memory usage on [2]? And are there guidelines for large-scale collection imports like the one I'm trying to do?
Thanks, Stefan
[1] https://www.linuxquestions.org/ [2] http://docs.basex.org/wiki/Statistics [3] https://drive.google.com/open?id=1lTEGA4JqlhVf1JsMQbloNGC-tfNkeQt2