Hi list!
I have been experimenting with the jobs module for the last few weeks to speed up updates and to make them fit into less than 6 GB of memory. It does not work the way I expected.
* Updating jobs don't seem to run in parallel, even if they don't access or lock the same database. Or do they just start much later than I expect?
* There seems to be an upper limit on the number of jobs that can be queued (about 100?). I started all of the update jobs automatically, but some of the updates never ran (see the snippet below for how I check the job states).
The upside is: I can run my updates with just about 1 GB of memory, which is much better for me.
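To see which jobs are queued and which are actually running, I look at the job list roughly like this (jobs:list-details() is from the Jobs Module; the exact attributes it reports may differ between BaseX versions):

(: list id, type and state of all registered jobs :)
for $job in jobs:list-details()
return string-join(($job/@id, $job/@type, $job/@state), " ")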
I work with dictionary-like XML documents. They look like this:
<root>
<entry>Contents with further tags</entry>
<entry>Contents with further tags</entry>
<entry>Contents with further tags</entry>
... a few thousand more ...
<entry>Contents with further tags</entry>
<entry>Contents with further tags</entry>
</root>
I add or change larger parts of them, and I also need to keep track of changes. So a separate database holds old versions of the entries with time stamps (@dt), like this:
<hist>
<entry dt="">Contents with further tags</entry>
<entrydt="">Contents with further tags</entry>
<entrydt="">Contents with further tags</entry>
... a few thousand more ...
<entrydt="">Contents with further tags</entry>
<entrydt="">Contents with further tags</entry>
</hist>
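Simplified, the history step for a single entry is roughly this (the database names "dict" and "hist" and the selection of the entry are placeholders; the real code picks the entries that are about to change):

(: copy the current entry, stamp it with the current time,
   and store it in the history database :)
let $e := db:open("dict")/root/entry[1]
return insert node
  element entry { attribute dt { string(current-dateTime()) }, $e/node() }
  into db:open("hist")/hist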
I tried to do everything at once, and when I need to update most of my entries (about 30000) I exhaust my memory. So I use the jobs module to do the updates 100 at a time, so the pending update list does not grow beyond any reasonable size. Having those 100 as one transaction is good enough for my needs.
Then I thought I should maybe use jobs to separate the two tasks: one async job saves an in-memory copy of the old entry (into the history database), the other writes the new one. After some transformations of my XQuery, BaseX told me the jobs neither lock the same database nor lock globally. I expected jobs that run on different databases to work in parallel. They don't, and I don't quite understand why.
I also have a much larger dataset split across a number of databases, where it would be quite useful to execute updates in parallel. Am I missing something? Is this perhaps the wrong way to tackle this scaling problem?
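For reference, this is roughly what the job setup for one batch looks like (heavily simplified: "dict" and "hist" are placeholder database names, the normalize-space() call stands in for my real update, and $start marks one chunk of 100 entries):

(: build detached in-memory copies of the old entries, stamped with @dt :)
let $start := 1
let $old := subsequence(db:open("dict")/root/entry, $start, 100)
  ! element entry { attribute dt { string(current-dateTime()) }, node() }
(: job 1: write the old versions into the history database :)
let $hist-job := jobs:eval(
  "declare variable $old external;
   insert node $old into db:open('hist')/hist",
  map { 'old': $old }
)
(: job 2: write the new versions into the main database :)
let $dict-job := jobs:eval(
  "declare variable $start external;
   for $e in subsequence(db:open('dict')/root/entry, $start, 100)
   return replace value of node $e with normalize-space($e)",
  map { 'start': $start }
)
return ($hist-job, $dict-job)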
Best regards,
Omar Siam