In the discussion of adding metadata to a bunch of files, Christian points out that you can either limit queries to directories within a single database or apply a query to multiple databases.
My question: when or why would you prefer one approach over the other?
In my case I'm using BaseX to reflect the XML contents of git repositories. My current approach is to create a separate database for each repo/branch pair, my reasoning being that this makes it easiest to limit queries to just that branch. Because the BaseX data is intended to be a read-only reflection of the git-managed source, it also makes it easy to clear the data for a branch that has gotten out of sync (or that I suspect has gotten out of sync): I simply drop the database.
I have complete control over the queries (through a library of functions that understand the git nature of the databases), so I could just as easily use a single database with subdirectories that reflect the repos and branches.
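To make the two layouts concrete, here is a minimal sketch of the kind of mapping I have in mind, written in Python only for illustration. The function names, the "git-mirror" database name, and the naming convention (repo and branch joined with a hyphen, other characters replaced with underscores) are all hypothetical and not what my library actually does. The sketch builds the argument passed to collection() and the BaseX commands needed to clear one branch under each layout.

```python
import re


def db_per_branch(repo: str, branch: str) -> str:
    """Layout A: one database per repo/branch pair."""
    # Assumption: keep a conservative character set in database names,
    # because branch names such as "feature/x" contain a slash.
    def safe(name: str) -> str:
        return re.sub(r"[^A-Za-z0-9_-]", "_", name)

    return f"{safe(repo)}-{safe(branch)}"


def single_db(repo: str, branch: str, db: str = "git-mirror") -> str:
    """Layout B: one database, with repo/branch as a directory path in it."""
    # "git-mirror" is a hypothetical database name.
    return f"{db}/{repo}/{branch}"


def count_query(target: str) -> str:
    """Build an XQuery that counts elements under a database or db/path."""
    return f'count(collection("{target}")//*)'


def reset_commands(repo: str, branch: str, layout: str) -> list[str]:
    """Return the BaseX commands that clear the data for one branch."""
    if layout == "per-branch":
        # Dropping the database removes everything for that branch.
        return [f"DROP DB {db_per_branch(repo, branch)}"]
    # Single database: open it, then delete only the branch's directory.
    db, path = single_db(repo, branch).split("/", 1)
    return [f"OPEN {db}", f"DELETE {path}"]
```

With either layout the query functions only need the string that identifies the branch, so the choice is invisible to callers of the library; what differs is how a branch is listed, cleared, and indexed.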
In this scenario, as an example, is there any compelling reason to use one approach or the other?
I like having one database per branch because that seems like a natural mapping that generally keeps things simple and more or less obvious (e.g., doing "list" will show the list of databases, which reflect the repo and branch names in their names).
In this application the scale will usually be relatively small: thousands or tens of thousands of individual documents in any given branch. The querying and indexing, however, which support maintaining knowledge of the links within the XML content, could get intense.
Cheers,
Eliot
--
Eliot Kimber, Owner
Contrext, LLC
http://contrext.com