Hi list!
I have been experimenting with the jobs module for the last few weeks to speed up updates and to make them fit into less than 6 GB of memory. It does not work the way I expected.
* Updating jobs don't seem to run in parallel, even if they don't access or lock the same database. Or do they just start much later than I expect?
* There seems to be an upper limit on the number of jobs that can be queued (about 100?). I started all of the update jobs automatically, but some of the updates never ran (see the snippet below for how I check the job states).
The upside is: I can run my updates with just about 1 GB of memory, which is much better for me.
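To see which jobs are queued and which are actually running, I look at the job list roughly like this (jobs:list-details() is from the Jobs Module; the exact attributes it reports may differ between BaseX versions):

(: list id, type and state of all registered jobs :)
for $job in jobs:list-details()
return string-join(($job/@id, $job/@type, $job/@state), " ")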
I work with dictionary-like XML documents. They look like this:
<root>
<entry>Contents with further tags</entry>
<entry>Contents with further tags</entry>
<entry>Contents with further tags</entry>
... a few thousand more ...
<entry>Contents with further tags</entry>
<entry>Contents with further tags</entry>
</root>
I add or change larger parts of them, and I also need to keep track of changes. So a separate database holds old versions of the entries with time stamps (@dt), like this:
<hist>
<entry dt="">Contents with further tags</entry>
<entrydt="">Contents with further tags</entry>
<entrydt="">Contents with further tags</entry>
... a few thousand more ...
<entrydt="">Contents with further tags</entry>
<entrydt="">Contents with further tags</entry>
</hist>
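Simplified, the history step for a single entry is roughly this (the database names "dict" and "hist" and the selection of the entry are placeholders; the real code picks the entries that are about to change):

(: copy the current entry, stamp it with the current time,
   and store it in the history database :)
let $e := db:open("dict")/root/entry[1]
return insert node
  element entry { attribute dt { string(current-dateTime()) }, $e/node() }
  into db:open("hist")/hist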
I tried to do everything at once, and when I need to update most of my entries (about 30000) I exhaust my memory. So I use the jobs module to do the updates 100 at a time, so the pending update list does not grow beyond any reasonable size. Having those 100 as one transaction is good enough for my needs.
Then I thought I should maybe use jobs to separate the two tasks: one async job saves an in-memory copy of the old entry (into the history database), the other writes the new one. After some transformations of my XQuery, BaseX told me the jobs neither lock the same database nor lock globally. I expected jobs that run on different databases to work in parallel. They don't, and I don't quite understand why.
I also have a much larger dataset split across a number of databases, where it would be quite useful to execute updates in parallel. Am I missing something? Is this perhaps the wrong way to tackle this scaling problem?
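For reference, this is roughly what the job setup for one batch looks like (heavily simplified: "dict" and "hist" are placeholder database names, the normalize-space() call stands in for my real update, and $start marks one chunk of 100 entries):

(: build detached in-memory copies of the old entries, stamped with @dt :)
let $start := 1
let $old := subsequence(db:open("dict")/root/entry, $start, 100)
  ! element entry { attribute dt { string(current-dateTime()) }, node() }
(: job 1: write the old versions into the history database :)
let $hist-job := jobs:eval(
  "declare variable $old external;
   insert node $old into db:open('hist')/hist",
  map { 'old': $old }
)
(: job 2: write the new versions into the main database :)
let $dict-job := jobs:eval(
  "declare variable $start external;
   for $e in subsequence(db:open('dict')/root/entry, $start, 100)
   return replace value of node $e with normalize-space($e)",
  map { 'start': $start }
)
return ($hist-job, $dict-job)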
Best regards,
Omar Siam