Thanks. I'm still trying to get this to work. Is it possible to put updating expressions in a library module function (with the name of the database hard-coded) and then call that function from within jobs:eval() in a main module? When I do this, the jobs don't seem to run in parallel, but if I put the updating expressions directly in the main module, they do seem to run in parallel. Is this a limitation?
I have millions of updates (inserts) to run against 10 large databases (5 GB each). With my current process, it takes about 48 hours to update a single database. Are there other options you would recommend to speed things up?
All best, Tim
--
Tim A. Thompson
Discovery Metadata Librarian
Yale University Library
On Wed, Feb 10, 2021 at 3:27 AM Christian Grün <christian.gruen@gmail.com> wrote:
Hi Tim,
Updates can be run in parallel if the name of the database is directly specified in the query [1]:
  jobs:eval('delete node db:open("db1")//abc'),
  jobs:eval('delete node db:open("db2")//def')
In a future version of BaseX, we might split our compilation phase into multiple phases. After that, we could statically detect that a variable passed to the query will be the name of a database.
Until then, you could build a query string that includes the hard-coded database names.
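A minimal sketch of that suggestion, applied to the ten databases from the original question: each job's query string is assembled with the database name spliced in as a string literal, so the name is visible to BaseX when the job is compiled. It assumes the BaseX 9.x Jobs Module (jobs:eval, jobs:wait); local:update-query is a hypothetical helper, and the delete expression is only a placeholder borrowed from the example above, not the real per-database logic.

```xquery
(: Sketch: run one updating job per database in parallel.
   Assumption: each job's query string contains its database name as a
   string literal, so BaseX can see at compile time which database the
   job writes to. The update expression is a placeholder. :)

(: Hypothetical helper: builds the query string for one database.
   Assumes the database name contains no double quotes
   (true for names like marc.exp.20210115.3). :)
declare function local:update-query($db as xs:string) as xs:string {
  'delete node db:open("' || $db || '")//abc'
};

(: Submit one job per database; jobs:eval returns a job id right away. :)
let $ids :=
  for $i in 0 to 9
  return jobs:eval(local:update-query('marc.exp.20210115.' || $i))

(: Wait until every job has finished, then return the job ids. :)
return ($ids ! jobs:wait(.), $ids)
```

The same pattern should work for longer job bodies (for example, a whole function declaration plus its call) as long as every db:open() call in the submitted string uses a literal name rather than an external variable.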
Hope this helps, Christian
[1] https://docs.basex.org/wiki/Transaction_Management#XQuery
On Wed, Feb 10, 2021 at 1:56 AM Tim Thompson <timathom@gmail.com> wrote:
Thank you, Christian, for the detailed explanation!
One more question, if I may. Is it possible to run updating jobs on different databases in parallel? Or can database update operations only be run sequentially, one database at a time? I have a query that calls a function to perform a series of operations:
  for $i in (0 to 9)
  return (
    jobs:eval('
      declare variable $iter external;
      local:add-uris("marc.exp.20210115."||$iter)
    ', map {"iter": $i})
  )
The function:
- opens a database
- iterates through its records
- performs lookups against an index
- inserts any matches into the database
- calls file:append-text-lines() to write the results of the lookups
Based on some simple tests, it doesn't seem possible to run the jobs in parallel, but I thought I would ask in case I was missing something.
Thanks again, Tim