Hi Bram,
Thanks a lot for sharing your results!
Looking forward to reading your final publication.

Best regards,
Erdal

2016-10-01 14:42 GMT+02:00 Bram Vanroy | KU Leuven <bram.vanroy1@student.kuleuven.be>:

Hi Erdal

Depending on your data and its actual size, you might be interested in this article [1]. The title is self-explanatory: ‘Making a large treebank searchable online’. Instead of using one huge database, the authors (supervisors of mine) chose to split the data into many small databases. This is useful because you can prune before running your query and only search the data that you actually need. The benchmarks that we ran (not published yet) show that the bigger your dataset, the higher the performance gain.
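
As a rough illustration of the pruning idea, here is a toy Python sketch. It is not the setup from the article: the partition key (one small "database" per dependency label) and the names build_databases and query are assumptions made up for this example, and real treebank databases would be split along different lines.

```python
from collections import defaultdict

# Toy "treebank": each sentence is a list of (word, dependency_label) pairs.
SENTENCES = [
    [("the", "det"), ("dog", "nsubj"), ("barks", "root")],
    [("she", "nsubj"), ("reads", "root"), ("books", "obj")],
    [("rain", "root")],
]


def build_databases(sentences):
    """Split the corpus into many small 'databases', one per dependency label.

    The label-based key is a hypothetical choice for illustration only.
    A sentence is stored in every database whose label it contains.
    """
    databases = defaultdict(list)
    for sentence in sentences:
        for label in {label for _, label in sentence}:
            databases[label].append(sentence)
    return databases


def query(databases, required_label, predicate):
    """Prune first: scan only the small database that can contain matches."""
    return [s for s in databases.get(required_label, []) if predicate(s)]


databases = build_databases(SENTENCES)

# Only sentences with an 'obj' relation can match, so only that
# database is scanned instead of the whole corpus.
hits = query(databases, "obj", lambda s: any(word == "books" for word, _ in s))
print(hits)
```

The query touches one of the three sentences rather than all of them; the larger the corpus, the more work this kind of pruning can skip.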

To give you an idea: when running roughly 90 queries on a corpus of 15 million sentences (in treebank form, i.e. with dependency structures), the median overall query time was 2675 seconds on the regular version of the corpus and merely 123 seconds on the re-organised database structure, which is roughly 22 times faster. Note that these results are not published yet, so please do not quote me on this email; wait for the publication next year instead.

I hope it helps, or gives you some new ideas!

Kind regards

Bram Vanroy

http://bramvanroy.be/

[1]: http://www.lrec-conf.org/proceedings/lrec2014/workshops/LREC2014Workshop-CMLC2%20Proceedings-rev2.pdf#page=20

From: basex-talk-bounces@mailman.uni-konstanz.de [mailto:basex-talk-bounces@mailman.uni-konstanz.de] On behalf of Erdal Karaca
Sent: Saturday, 1 October 2016 13:01
To: basex-talk <basex-talk@mailman.uni-konstanz.de>
Subject: [basex-talk] One database instance per user

Hi,

I intend to use one database for each user of the business domain (web application).

Does anyone have experience with lots of 'small' databases vs. one 'big' database regarding performance/scalability/stability?

Thanks!

Best regards,

Erdal