Following up: the XQuery Update approach seems to be working reliably.
Cheers,
E.
_____________________________________________
Eliot Kimber
Sr. Staff Content Engineer
O: 512 554 9368
servicenow
From: Eliot Kimber via BaseX-Talk <basex-talk@mailman.uni-konstanz.de>
Date: Thursday, April 24, 2025 at 9:41 PM
To: Christian Grün <christian.gruen@gmail.com>
Cc: basex-talk@mailman.uni-konstanz.de <basex-talk@mailman.uni-konstanz.de>
Subject: [basex-talk] Re: Recovering database in bad state
I’ve updated my code to use XQuery Update to modify the attributes. When I run it as a job, the job returns immediately and the next job starts, so I’m assuming the update must be on the update queue (this is in the context of a larger job sequence that constructs the link data).
I can tell the update is running because the database in question will show 0 files until the update is finished, at which point it will show the expected 2 files.
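As an aside, one way to watch this from a second session is the Job Module. A minimal sketch, assuming BaseX 10 or later (the prefix was jobs: in 9.x):

(: list every registered job with its current state (queued, running, cached, ...) :)
for $job in job:list-details()
return $job/@id || ': ' || $job/@state

The updating job should show up as running while the target database still reports 0 files.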
For my small-scale test, I have a doc with 79,632 attributes to be updated using this code:
declare updating function linkrk:updateWhereUsedIndexToProduction(
  $database as xs:string
) {
  let $whereUsedMap as element()? := db:get($database)/doc-where-used-index
  let $msg := message(``[[DEBUG] linkrk:updateWhereUsedIndexToProduction(): Updating `{$database}` to production using XQuery update]``)
  return
    (: strip the '_temp_' prefix from every @database and @baseuri reference :)
    for $att in $whereUsedMap//noderef/(@database|@baseuri)[starts-with(., '_temp_')]
    let $newValue := substring-after(string($att), '_temp_')
    return replace value of node $att with $newValue
};
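It gets invoked as an ordinary updating query; the module URI, file path, and database name below are just placeholders:

(: placeholder module URI/path and database name :)
import module namespace linkrk = "urn:example:linkrk" at "linkrk.xqm";
linkrk:updateWhereUsedIndexToProduction("where-used-index")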
I see the debug message in my log and then the message for the start of the next job in the sequence.
The update seems to take about 1.5 minutes based on the log message timestamps, which is roughly 1.1 ms/update; that seems about as fast as it could go without parallelizing the updates (this is on an M3 MacBook Pro).
I tried using prof:time() to profile the update, but prof:time() does not allow updating expressions.
Is there a technique for profiling this type of update that’s better than the log timestamp analysis I’m doing?
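One partial workaround is to time just the non-updating selection with prof:time(), which at least separates the cost of finding the attributes from the cost of applying the pending updates. A rough sketch, with a placeholder database name:

prof:time(
  (: 'where-used-index' is a placeholder database name :)
  count(
    db:get('where-used-index')/doc-where-used-index
      //noderef/(@database|@baseuri)[starts-with(., '_temp_')]
  )
)

Timestamps taken inside the query itself (e.g. via prof:current-ms()) won’t cover the update phase, because the pending update list is only applied after the query body has been evaluated, so log timestamps may be the only way to see the end-to-end time.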
In my full-scale content, I have about 1.5M attributes to be updated, so at roughly 1.1 ms/update that comes to about 28 minutes (1,500,000 × 1.1 ms ≈ 1,650 s), which isn’t too bad. Testing now.
Cheers,
E.
_____________________________________________
Eliot Kimber
Sr. Staff Content Engineer
O: 512 554 9368
servicenow
From: Christian Grün <christian.gruen@gmail.com>
Date: Thursday, April 24, 2025 at 8:52 AM
To: Eliot Kimber <eliot.kimber@servicenow.com>
Cc: Andy Bunce <bunce.andy@gmail.com>, basex-talk@mailman.uni-konstanz.de <basex-talk@mailman.uni-konstanz.de>
Subject: Re: [basex-talk] Re: Recovering database in bad state
Good to know.
> So the issue is with my attempt to transform a 300MB document, not an issue with BaseX itself.
You may be able to save lots of time if you manage to rewrite the XSLT script to XQuery (Update). Here’s a script that creates an element with 1 million nodes, which are immediately
deleted again. It takes less than 1 second:
<a>{ (1 to 1000000) ! <b/> }</a> update {
  delete node ./b
}
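As an aside, the update expression is a convenience alternative to the standard XQuery Update copy/modify/return construct, so the script above could equally be written as:

copy $a := <a>{ (1 to 1000000) ! <b/> }</a>
modify delete node $a/b
return $a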
Best,
Christian