Hi Bram, Thanks a lot for sharing your results! Looking forward to reading your final publication.
Best regards, Erdal
2016-10-01 14:42 GMT+02:00 Bram Vanroy | KU Leuven <bram.vanroy1@student.kuleuven.be>:
Hi Erdal
Depending on your data and its actual size, you might be interested in this article [1]. I guess the title is self-explanatory: ‘Making a large treebank searchable online’. Instead of using one huge database, the authors (supervisors of mine) chose to split the data into a lot of small databases. This is useful because you can prune before starting your query and only go through the data that you actually need. The benchmarks that we ran (not published yet) show that the bigger your dataset, the higher the performance gain.
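To make the pruning idea a bit more concrete, here is a minimal sketch in Python. It is not the implementation from the article: the function names, the toy sentences, and the choice of part-of-speech tags as the partitioning key are all assumptions made purely for illustration.

```python
from collections import defaultdict

# Toy corpus (hypothetical data): each sentence is a list of (word, tag) pairs.
SENTENCES = [
    [("the", "det"), ("dog", "noun"), ("barks", "verb")],
    [("hello", "intj")],
    [("cats", "noun"), ("sleep", "verb")],
]


def tags_of(sentence):
    """Return the set of tags in a sentence (assumed partitioning key)."""
    return {tag for _, tag in sentence}


def build_partitions(sentences, keys_of):
    """Group sentences into many small 'databases', one per key."""
    partitions = defaultdict(list)
    for sentence in sentences:
        for key in keys_of(sentence):
            partitions[key].append(sentence)
    return partitions


def query_all(sentences, predicate):
    """Baseline: scan every sentence in the single big collection."""
    return [s for s in sentences if predicate(s)]


def query_pruned(partitions, needed_key, predicate):
    """Pruned: scan only the small partition that can contain a match."""
    return [s for s in partitions.get(needed_key, []) if predicate(s)]


def contains_dog(sentence):
    """Example query: does the sentence contain the word 'dog'?"""
    return any(word == "dog" for word, _ in sentence)


PARTITIONS = build_partitions(SENTENCES, tags_of)
# A query for the noun 'dog' only needs to look at the 'noun' partition.
PRUNED_HITS = query_pruned(PARTITIONS, "noun", contains_dog)
FULL_HITS = query_all(SENTENCES, contains_dog)
```

Both calls return the same hits, but the pruned one never touches sentences without a noun. The trade-off in this sketch is that a sentence is stored once per key, so you pay in storage for what you save in query time.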
To give you an idea: when running roughly 90 queries on a corpus of 15 million sentences (in treebank form, i.e. with dependency structures), the median overall query time was 2675 seconds on the regular version of the corpus, and merely 123 seconds on the re-organised database structure. Note that these results are not published yet, so please do not quote me from this email and wait for the publication next year.
I hope it helps, or gives you some new ideas!
Kind regards
Bram Vanroy
workshops/LREC2014Workshop-CMLC2%20Proceedings-rev2.pdf#page=20
*From:* basex-talk-bounces@mailman.uni-konstanz.de [mailto:basex-talk-bounces@mailman.uni-konstanz.de] *On behalf of* Erdal Karaca
*Sent:* Saturday, 1 October 2016 13:01
*To:* basex-talk basex-talk@mailman.uni-konstanz.de
*Subject:* [basex-talk] One database instance per user
Hi,
I intend to use one database for each user of the business domain (web application).
Does anyone have experience with lots of 'small' databases vs. one 'big' database regarding performance/scalability/stability?
Thanks!
Best regards,
Erdal