Dear all,
From what I read in the documentation, my problem seems to be related to the update of the resource index.
Is this index updated after each add/replace/delete command, or at the end of the list of commands?
Finally, could you please tell me whether replace is equivalent to delete+add?
Best, Fabrice
From: Fabrice Etanchaud Sent: Friday, March 15, 2013, 14:18 To: 'basex-talk@mailman.uni-konstanz.de' Subject: seeking for a document in a collection with a million documents is very slow
Dear all,
Does the document index rely on the text or attribute index? One can experience very slow response times when looking up a specific document name in an optimized collection of a million documents (with dynamic evaluation, without optimizations, for example within a loop where the name is the concatenation of string values). Is there a way to speed things up?
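For illustration, the kind of dynamic, per-name lookup described above might look like the following sketch (the database name 'mycoll' and the naming scheme are hypothetical, not taken from the original mail):

```xquery
(: hypothetical sketch: open documents one by one via a name
   concatenated from string values inside a loop :)
for $i in 1 to 10000
return db:open('mycoll', concat('doc-', string($i), '.xml'))
```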
Best regards, Fabrice Etanchaud Questel-orbit
Hi Fabrice,
yes, the document index is updated with each updating command. If you perform numerous updates, you may get better performance by switching AUTOFLUSH off [1]. Another alternative for speeding up multiple update operations is to use XQuery for updates. Due to the pending update list semantics, however, this will require more main memory.
Christian
[1] http://docs.basex.org/wiki/Options#AUTOFLUSH
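A minimal command-script sketch of the AUTOFLUSH approach (the database name 'mycoll' and the file names are placeholders): buffer the updates, then write them to disk once with FLUSH:

```
SET AUTOFLUSH false
OPEN mycoll
ADD doc1.xml
ADD doc2.xml
FLUSH
SET AUTOFLUSH true
```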
Thank you Christian.
It seems that the resource index is not persistent, but is rebuilt in memory on the first index access after the collection is opened. For my collection of 3 million documents, it takes about 250 seconds to answer the first db:open('mycoll','mydoc') query. Subsequent queries respond in milliseconds until the collection is opened again.
From your experience, what would be a good way to handle a collection of several million documents, with about ten thousand documents inserted or updated once a week?
Best regards, Fabrice
-----Original Message----- From: Christian Grün [mailto:christian.gruen@gmail.com] Sent: Monday, March 18, 2013, 15:32 To: Fabrice Etanchaud Cc: basex-talk@mailman.uni-konstanz.de Subject: Re: [basex-talk] seeking for a document in a collection with a million documents is very slow
Hi Fabrice,
From your experience, what would be a good way to handle a collection of several million documents, with about ten thousand documents inserted or updated once a week?
The article on the Twitter use case may give you some hints on how updates can be sped up [1]. Apart from that, I would propose doing some profiling in order to find out which operations require the most time or memory. Have you already tried the AUTOFLUSH option? Do you use XQuery or the commands for your updates?
Regarding your last question:
Finally, could you please tell me whether replace is equivalent to delete+add?
The operations should be quite comparable. If you know the names of all documents to be deleted in advance, you could first delete all documents in a db:delete loop and then add all the new documents.
[1] http://docs.basex.org/wiki/Twitter
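As a sketch of the delete-then-add approach (the database name 'mycoll', the document names, and the 'new/' source directory are placeholders): in XQuery, all operations are collected on the pending update list and applied together at the end of the query:

```xquery
let $names := ('doc1.xml', 'doc2.xml')
return (
  (: first delete the existing versions ... :)
  for $n in $names return db:delete('mycoll', $n),
  (: ... then add the new ones under the same paths :)
  for $n in $names return db:add('mycoll', doc(concat('new/', $n)), $n)
)
```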
BaseX-Talk mailing list BaseX-Talk@mailman.uni-konstanz.de https://mailman.uni-konstanz.de/mailman/listinfo/basex-talk