In my Mirabel system, I create a link database that records all the links made in a set of documents. This becomes a “where used” index over the content.
We have on the order of 200K links for one content set, so at roughly 0.1 second per link it takes about 7 hours to build this index.
I’m currently doing this in one process that builds the whole index and then stores it in a database. This is failing in hard-to-diagnose ways. For example, the database sometimes has a write lock on it when I go to rename it from its temp name to its production name (to replace the current production version).
The data is such that I could parallelize the processing, but I’m not sure how I would do that in BaseX so that I can safely write to a single database from multiple threads.
The xquery:fork-join() docs clearly say that it accepts only “non-updating” functions, so that doesn’t seem to be an option for the writes themselves.
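
For reference, one pattern that stays within the non-updating restriction is to do only the read-only link extraction inside xquery:fork-join() and then write all the results in a single updating call. The following is a minimal sketch, not a tested solution. The database names 'content' and 'link-index-tmp', the local:links() function, and the rule that a link is any element with an @href are all assumptions made for illustration:

```xquery
(: Sketch only. Assumptions: the content lives in a database named 'content',
   a link is any element with an @href attribute, and 'link-index-tmp' is a
   scratch database name. :)
declare function local:links($docs as document-node()*) as element(link)* {
  for $doc in $docs
  for $ref in $doc//*[@href]
  return <link source="{ base-uri($doc) }" target="{ $ref/@href }"/>
};

let $docs  := collection('content')
let $size  := 1000  (: documents per task; an arbitrary value to tune :)
let $tasks :=
  for $start in 1 to count($docs)
  where ($start - 1) mod $size = 0
  let $chunk := subsequence($docs, $start, $size)
  (: each task only reads, so it satisfies the non-updating restriction :)
  return function() { <links>{ local:links($chunk) }</links> }
let $results := xquery:fork-join($tasks)
let $paths   := (1 to count($results)) ! ('links-' || . || '.xml')
(: a single updating call: one writer, one write lock :)
return db:create('link-index-tmp', $results, $paths)
```

In this sketch only one query ever writes to the target database, so the parallel tasks cannot contend for its write lock. Whether it actually shortens the 7 hours depends on how much of the per-link cost is CPU-bound rather than I/O-bound.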
I have multiple BaseX HTTP servers running so I could farm processing across them, but I think I would then run into write lock issues.
I could create separate databases for each thread of operation and then combine those at the end. That seems like it might be the best option.
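
The final combine step for that approach might look roughly like the following. This is a sketch under assumptions: the partial databases are assumed to be named with a 'links-part-' prefix, and 'link-index-tmp' is again a hypothetical scratch name:

```xquery
(: Sketch only. Assumption: each worker has written its results to its own
   database named 'links-part-1', 'links-part-2', and so on. :)
let $parts := db:list()[starts-with(., 'links-part-')]
let $docs  := $parts ! collection(.)
(: prefix each path with its source database name to avoid path collisions :)
let $paths := $docs ! (db:name(.) || '/' || db:path(.))
return db:create('link-index-tmp', $docs, $paths)
```

Because the database names here are computed at run time, BaseX may not be able to determine the locks statically and could fall back to a global lock for this query, so it is probably best run once all of the workers have finished.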
Have I missed anything?
Thanks,
Eliot Kimber