If I have a set of related documents (a text, a lexicon, frequency counts, a discourse analysis, etc.), how should I decide when to put more than one document in a single database at different paths, as opposed to putting one document in each database?
When I create a database from the GUI, it seems to prefer one document per database. Should I take that as a hint?
Jonathan
Hi Jonathan,
In the Create Database dialog of the GUI, you can also specify a local directory as input; it will be recursively parsed.
In our own set of use cases, we have one database instance with approximately 8 million documents, but in most cases the number is much smaller. There are some sample statistics in our documentation [1]; they show you what is possible.
Hope this helps, Christian
[1] http://docs.basex.org/wiki/Statistics
(In reply to "Jonathan Robie" <jonathan.robie@gmail.com>, 06.02.2018, 11:31 PM.)
Jonathan, in my humble opinion, these are the main reasons you may need several collections:
- Full-text indexing in several languages: the language is a collection-wide setting, so the data is partitioned per language.
- Size limits (usually on the number of nodes).
- Huge updates: a read-only backlog collection, plus a read/write front collection for fresh data, plus queries tailored to read both collections.
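The backlog/front split in the last point can be sketched as follows. This is only an illustration in plain Python, not BaseX code: two dictionaries stand in for the two collections, and the names `backlog`, `front`, `add_document` and `find_entries`, as well as the sample documents, are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-ins for two collections: each maps a document path
# to a parsed XML document. The backlog is treated as read-only.
backlog = {
    "lexicon/a.xml": ET.fromstring("<entry id='1'><lemma>logos</lemma></entry>"),
    "lexicon/b.xml": ET.fromstring("<entry id='2'><lemma>arche</lemma></entry>"),
}
front = {}  # small read/write collection that receives all fresh data


def add_document(path, xml_text):
    """Store a new document in the front collection only."""
    front[path] = ET.fromstring(xml_text)


def find_entries(lemma, collections):
    """Search every given collection and return (collection, path, id) hits."""
    hits = []
    for name, docs in collections:
        for path, root in docs.items():
            # iter() also visits the root element itself
            for entry in root.iter("entry"):
                if entry.findtext("lemma") == lemma:
                    hits.append((name, path, entry.get("id")))
    return hits


# Updates touch only the front collection ...
add_document("lexicon/new.xml", "<entry id='3'><lemma>logos</lemma></entry>")

# ... while queries are written to read both collections.
results = find_entries("logos", [("backlog", backlog), ("front", front)])
print(results)
```

The point of the split is that large, stable data never has to be rewritten when new documents arrive; the cost is that every query has to name both collections.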
Best regards, and maybe good night?
Fabrice Etanchaud
From: basex-talk-bounces@mailman.uni-konstanz.de [mailto:basex-talk-bounces@mailman.uni-konstanz.de] On behalf of Jonathan Robie Sent: Tuesday, 6 February 2018 23:31 To: BaseX Subject: [basex-talk] One document per database or multiple?
Thanks to both of you, Fabrice and Christian. I should have asked this question a long time ago - it will help a lot with organizing my work better.
Jonathan
(In reply to Fabrice ETANCHAUD <fetanchaud@pch.cerfrance.fr>, Tue, Feb 6, 2018 at 5:41 PM.)