Hi,
Reaching out to get suggestions on improving performance. I am using BaseX to store and analyze around 350,000 to 500,000 XML documents. Each XML ranges from a few KB to 5 MB, and around 10,000 XMLs are added or patched every day. I have the following questions:

1) What is the optimal size, or number of documents, for a single DB? Initially I had one DB with different collections, but inserts were too slow: replacing a single document took more than 30 seconds. I then split the data by category into around 30 DBs. Inserts are fine now, but when a category holds too many documents, patching that DB slows down, and querying across all DBs also gets slow (a simplified sketch of my cross-DB query is included below, after question 3). Is there an optimal number of DBs? Could I create many small DBs, say one per 10,000 XMLs? I read through https://www.mail-archive.com/basex-talk@mailman.uni-konstanz.de/msg06310.htm...; does having hundreds of DBs degrade query performance? Is there a better solution?

2) Query performance has degraded as the number of documents per DB grows. I also noticed that the token and attribute indexes make little difference to query performance (the queries are plain XML attribute queries). Running OPTIMIZE after inserts to rebuild the indexes takes too much time and memory, so I am not running it at the moment, since my tests showed no significant improvement with or without the indexes (the optimize sketch below shows how I do this today). Any suggestions for improving this?

3) Is it possible to run queries against specific XMLs only? I will have a pre-filter based on user selection, and queries need to run only against those XMLs. Users can apply a number of filters, and every combination can produce a different set of XMLs to analyze, so it is not feasible to create a collection per combination. Right now I query against all XMLs, even though I am only interested in a subset, and do post-filtering (the last sketch below shows what I would like to do instead). I did go through https://mailman.uni-konstanz.de/pipermail/basex-talk/2010-July/000495.html, but a regex covering all the file paths of interest (sometimes the entire document set) will again slow it down.
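For reference, my cross-DB queries look roughly like the sketch below. The database prefix, element name and predicate are simplified placeholders for illustration, not my real schema:

    (: query every category database; there are ~30 of these today :)
    for $db in db:list()
    where starts-with($db, 'docs-')
    for $rec in db:open($db)//record[@status = 'open']
    return $rec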
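After a daily batch of inserts, rebuilding the indexes looks roughly like this (names are again illustrative; db:replace is the BaseX 9 function, newer versions use db:put). It is the optimize step that takes too much time and memory:

    (: replace the changed documents, then rebuild all indexes :)
    for $path in ('orders/0001.xml', 'orders/0002.xml')
    return db:replace('docs-01', $path, doc('/import/' || $path)),

    (: full optimize so the attribute/token indexes are up to date :)
    db:optimize('docs-01', true(),
      map { 'attrindex': true(), 'tokenindex': true() })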
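For question 3, this is roughly what I would like to do: open only the documents selected by the user's pre-filter and query just those. The paths, DB name and predicate are made up for illustration, and I am not sure whether this scales when the filter matches a large share of the documents:

    (: $selected comes from the user's pre-filter in the application :)
    let $selected := ('invoices/2024/0001.xml',
                      'invoices/2024/0002.xml')
    for $doc in $selected ! db:open('docs-01', .)
    return $doc//item[@state = 'open']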
Thank you, Deepak