Christian,
That is helpful. Basically you've confirmed my initial analysis: because BaseX databases are lightweight, keeping things simple is the most appropriate choice.
If I were doing things at scale I would of course do performance testing to see where the bottlenecks are, but that is not a concern for what I'm doing now.
Cheers,
E.
—————
Eliot Kimber, Owner
Contrext, LLC
http://contrext.com
On 5/16/15, 5:10 AM, "Christian Grün" christian.gruen@gmail.com wrote:
Hi Eliot,
As usual, there is no simple answer to such a question. However, I can say that using one BaseX database per git repository sounds like a good choice. In contrast to many other DBMSs, databases in BaseX are pretty lightweight containers; in some of our own use cases we even create one database per document.
If you have hundreds or thousands of databases, it may be reasonable to merge them into single units, because it may take too much time to access the database directories in the file system. Some file systems are better than others at handling large numbers of files and directories at the same level. The same observation applies if you frequently write queries that access more than one database: it's always faster to open a single database (but usually you will only notice this when opening a larger number of databases).
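To illustrate the trade-off Christian describes, here is a small XQuery sketch (the database names are hypothetical). A query that spans several per-branch databases has to open each one, whereas a single combined database is opened once and then restricted by directory path, using the two-argument form of db:open:

```xquery
(: several per-branch databases: each db:open call opens another database :)
for $doc in (db:open('repo1-master'), db:open('repo1-develop'))//topic
return $doc/@id

(: one combined database, restricted to a branch directory by path :)
for $doc in db:open('repos', 'repo1/master')//topic
return $doc/@id
```

Both queries return the same kind of result; the difference only becomes noticeable when many databases have to be opened per query.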
Hope this helps, Christian
On Thu, May 14, 2015 at 3:57 PM, Eliot Kimber ekimber@contrext.com wrote:
In the discussion of adding metadata to a bunch of files, Christian points out that you can either limit queries to directories within a single database or apply a query across multiple databases.
My question: when or why would you prefer one approach over the other?
In my case I'm using BaseX to reflect the XML contents of git repositories. My current approach is to create a separate database for each repo/branch pair, my reasoning being that this makes it easiest to limit queries to just that branch. Because the BaseX data is intended to be a read-only reflection of the git-managed source, it also makes it easy to clear the data for a branch if it has gotten out of sync (or I suspect it has) by simply dropping the database.
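The resync step described above could be sketched with BaseX's database functions (the database name and checkout path here are hypothetical; db:drop and db:create are run as two separate updates):

```xquery
(: 1) drop the possibly stale database for the out-of-sync branch :)
db:drop('repo1-master')

(: 2) then, in a second query, rebuild it from the git checkout :)
db:create('repo1-master', '/path/to/checkouts/repo1/master')
```

Because the database is a disposable read-only mirror, dropping and rebuilding it is always safe; git remains the system of record.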
I have complete control over the queries (through a library of functions that understand the git nature of the databases), so I could just as easily use a single database with subdirectories that reflect the repos and branches.
In this scenario, as an example, is there any compelling reason to use one approach or the other?
I like having one database per branch because that seems like a natural mapping that generally keeps things simple and more or less obvious (e.g., doing "list" will show the list of databases, which reflect the repo and branch names in their names).
In this application the scale will usually be relatively small: thousands or tens of thousands of individual documents in any given branch. However, the querying and indexing, which support maintaining knowledge of the links within the XML content, could get intense.
Cheers,
Eliot
—————
Eliot Kimber, Owner
Contrext, LLC
http://contrext.com