-----Original Message-----
From: Fabrice Etanchaud
Sent: Tuesday, 23 September 2014 18:00
To: 'Christian Grün'
Subject: RE: [basex-talk] Adding documents slows over time
Dear Christian,
In our old tests, we found that in a collection with several million documents, opening the collection or replacing a document took a very long time.
In the latest snapshot, could you tell us how to use the index on the document names? Given 10,000,000 documents named $i.xml, each containing <xml>{$i}</xml>, we found that the text index is about 470x faster than the document index:
Compiling:
- pre-evaluating (7000001 to 7001000)
Query:
for $i in 7000001 to 7001000 return db:open('docs', xs:string($i) || '.xml')
Optimized Query:
for $i_0 in (7000001 to 7001000) return db:open("docs", fn:concat($i_0 cast as xs:string, ".xml"))
Result:
- Hit(s): 1000 Items
- Updated: 0 Items
- Printed: 19500 Bytes
- Read Locking: local [docs]
- Write Locking: none
Timing:
- Parsing: 0.91 ms
- Compiling: 0.24 ms
- Evaluating: 68514.39 ms
- Printing: 1.61 ms
- Total Time: 68517.16 ms

Compiling:
- pre-evaluating (7000001 to 7001000)
Query:
for $i in 7000001 to 7001000 return db:text('docs', xs:string($i))/root()
Optimized Query:
for $i_0 in (7000001 to 7001000) return db:text("docs", $i_0 cast as xs:string)/fn:root()
Result:
- Hit(s): 1000 Items
- Updated: 0 Items
- Printed: 19500 Bytes
- Read Locking: local [docs]
- Write Locking: none
Timing:
- Parsing: 2.62 ms
- Compiling: 0.23 ms
- Evaluating: 143.72 ms
- Printing: 1.59 ms
- Total Time: 148.16 ms
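As a quick check of the ratio quoted above, the two reports can be compared directly. The sketch below is only arithmetic on the timings printed in those reports; the dictionary layout and the function names are illustrative and not part of BaseX.

```python
# Timings copied from the two query reports above, in milliseconds.
# Both queries returned 1000 hits.
DOCS_PER_RUN = 1000

timings_ms = {
    "db:open": {"evaluating": 68514.39, "total": 68517.16},
    "db:text": {"evaluating": 143.72, "total": 148.16},
}


def speedup(phase: str) -> float:
    """Return how many times faster the db:text run was for the given phase."""
    return timings_ms["db:open"][phase] / timings_ms["db:text"][phase]


def per_document_ms(function_name: str) -> float:
    """Return the average evaluation time per retrieved document."""
    return timings_ms[function_name]["evaluating"] / DOCS_PER_RUN


if __name__ == "__main__":
    for phase in ("evaluating", "total"):
        print(f"{phase}: db:text is {speedup(phase):.1f}x faster")
    for name in timings_ms:
        print(f"{name}: {per_document_ms(name):.3f} ms per document")
```

Depending on whether evaluation time or total time is compared, the ratio comes out between roughly 460x and 480x, which is consistent with the rounded figure of 470x.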
-----Original Message-----
From: Christian Grün [mailto:christian.gruen@gmail.com]
Sent: Tuesday, 23 September 2014 16:34
To: Fabrice Etanchaud
Cc: Marco Lettere; basex-talk@mailman.uni-konstanz.de
Subject: Re: [basex-talk] Adding documents slows over time
Hi Fabrice,
> If you update your collection per document, you can use the replace command instead of XQuery Update and avoid the limitations of the pending update list.
I would be interested to hear what limitations you have observed so far.
> Christian, from what I read in the last exchanges, the document index is now a persistent data structure?
Exactly. After it has been requested for the first time, it will additionally be stored on disk and updated incrementally. I would be interested to have your feedback on the latest snapshot.
Christian