Dear BaseX team
I am planning an update to our previous custom indexing system [1], but before I start I have a couple of questions. The major ones concern how to write an efficient custom indexing query in XQuery, but that will be for another email. (In fact, we have a dual indexing system, so two index files per main file.) For now I am mainly interested in having different documents in a single database, and in the doc() functionality.
Intuitively, I'd say that documents that are related to each other should be put in the same database, e.g. one database with different documents for plants, and one database with different documents for animals. But when I was scrolling through the BaseX documentation, I noticed that custom indices are not put in the same database as the original content, so you have one database for the content and one for the index [2]. Is this the way it's typically done?
More generally, the questions that I have are the following:
* What is the actual difference in BaseX between using separate documents in a single database and using different databases altogether?
* Is there a performance difference between putting my index file in the same database as the content and using different databases altogether?
* What is the maximum allowed size of a single document in a database, and of a database itself? (I have files that are hundreds of GB in size. It might not be feasible to have a file and its index file in the same database.)
Thank you in advance
Kind regards
Bram Vanroy
Doctoral Research at Ghent University, Belgium
https://www.lt3.ugent.be/people/bram-vanroy/
[1] https://biblio.ugent.be/publication/8534144
[2] http://docs.basex.org/wiki/Indexes#Custom_Index_Structures