I want to solve the following problem: for each $doc in $list-of-docs, detect the differences between the document and the BaseX database, and add the changed records to the database. After the differences for a document have been added, create a new index for the database, which is required for processing the next $doc.
How can I solve the problem that the added records are not visible when the index is created?
Michael
Hi Michael,
IMHO, this is not the right way to handle data changes in a document-oriented database.
A more efficient approach may be to append each new version of a document as it arrives.
There is always a way to sort the related documents - sometimes with an attribute in the data,
or with a part of the filename.
If not, you might have to build an index database containing tuples <object_id, version, pre-id> (this works because the pre-id of a node is constant in an append-only database).
Then I would write a simple function that takes an object_id and returns the top element of its version list, ordered by descending version (using hof:top-k-by, for example).
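To make the lookup concrete, here is a minimal sketch of the logic, written in Python only because it is easy to run outside BaseX. The names IndexEntry and latest_version and the sample data are made up for illustration. In XQuery, the same selection is roughly what hof:top-k-by would give you with k = 1 and the version as sort key.

```python
from typing import Iterable, NamedTuple, Optional


class IndexEntry(NamedTuple):
    """One tuple of the (hypothetical) version index."""
    object_id: str
    version: int
    pre_id: int  # stable only while the database is append-only


def latest_version(index: Iterable[IndexEntry], object_id: str) -> Optional[IndexEntry]:
    """Return the entry with the highest version for object_id, or None."""
    candidates = [e for e in index if e.object_id == object_id]
    return max(candidates, key=lambda e: e.version, default=None)


# Illustrative data: "doc-a" has been appended twice, "doc-b" once.
index = [
    IndexEntry("doc-a", 1, 10),
    IndexEntry("doc-b", 1, 42),
    IndexEntry("doc-a", 2, 97),
]

print(latest_version(index, "doc-a"))
```

Older versions stay in the database untouched, so nothing has to be updated in place. Only the index grows.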
You can also split your data in two:
- a big read-only database containing the data before a given point in time (indexes already built),
- a light append-only database containing the data after that point in time (where index updates are fast, or where the UPDINDEX option is even set).
On a schedule, you would build a new read-only database that aggregates the back and front data.
Note that with two (or even more!) databases, you would have to add the database name to the index tuple: <object_id, version, db-name, pre-id>
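A minimal sketch of the lookup across both databases, again in Python just to show the logic. TieredEntry, latest_across and the database names "back" and "front" are assumptions for illustration, not anything BaseX provides.

```python
from typing import Iterable, NamedTuple, Optional


class TieredEntry(NamedTuple):
    """Index tuple extended with the name of the database holding the node."""
    object_id: str
    version: int
    db_name: str
    pre_id: int


def latest_across(
    back: Iterable[TieredEntry],
    front: Iterable[TieredEntry],
    object_id: str,
) -> Optional[TieredEntry]:
    """Return the highest-version entry for object_id from either index."""
    candidates = [e for e in (*back, *front) if e.object_id == object_id]
    return max(candidates, key=lambda e: e.version, default=None)


# Illustrative data: "doc-a" was updated after the cut-off, "doc-b" was not.
back_index = [
    TieredEntry("doc-a", 1, "back", 10),
    TieredEntry("doc-b", 1, "back", 42),
]
front_index = [
    TieredEntry("doc-a", 2, "front", 3),
]

hit = latest_across(back_index, front_index, "doc-a")
if hit is not None:
    # db_name says which database to open, pre_id which node to fetch from it.
    print(hit.db_name, hit.pre_id)
```

The pre-id alone is ambiguous once there is more than one database, which is why the database name has to travel with it.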
I had success with that update strategy when working with the EPO DOCDB collection (https://www.epo.org/searching-for-patents/data/bulk-data-sets/docdb.html#tab...).
Thanks to Christian for giving me the right pointers when I needed them!
Hoping it helps,
Best regards,
Fabrice ETANCHAUD
From: michael@pyschny.de
To: basex-talk@mailman.uni-konstanz.de
Subject: [basex-talk] dB:update()
Date: 11/09/2018 16:09:01 CEST