I think I’m probably doing something similar on a current project, but with some careful query writing and use of jobs I’ve been able to keep everything running within acceptable time.
However while optimising my code it did get me thinking about possible improvements. Clearly it’s difficult (or even impossible) to determine all the databases that will be used ahead of running the query. But I wonder how many times the calling function already knows?
Then I thought of the (# db:enforceindex #) that was introduced for cases where the query writer knows that the databases will have indexes. I wondered if something similar might be possible for databases.
A pragma or a function wrapper that would allow the name of a database (or databases) to be supplied, and that would restrict access to only those databases for the rest of the query, returning an error if the query tries to address any other.
I’m sure this wouldn’t be simple - but might be easier and more reliable than trying to find more optimisations to the locking algorithm.
Just thinking aloud…
Kindest regards, James
From: Christian Grün christian.gruen@gmail.com
Subject: Re: [basex-talk] Global locks
Date: 11 February 2019 at 12:27:37 GMT
To: Andy Bunce bunce.andy@gmail.com
Cc: "basex-talk@mailman.uni-konstanz.de" basex-talk@mailman.uni-konstanz.de
Hi Andy,
The current behavior is correct indeed – but it might not be what one expects. Currently, we are…
a) collecting all static database references in the query, and
b) assigning either read or write locks to these databases, depending on whether the overall query is updating or not.
The reason is that it’s often tricky to determine statically (i.e., while parsing the query and before compiling and optimizing it) which databases will be accessed for read or write operations without analyzing the query in more detail. An arbitrary example:
let $db := db:open('db1') return insert node <new/> into $db/*
We would need to follow the variable reference in order to find out if db1 will be updated. In simple queries such as yours, however, this might be possible; I’ll have some more thoughts on that.
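The coarse rule described above can be modelled in a few lines. This is only a sketch in Python, not BaseX's actual implementation: the regular expressions and the function name `static_locks` are assumptions made for illustration, and a real engine would work on the parsed expression tree rather than on the query text.

```python
import re

# Assumption: only db:open() calls with a string-literal argument count as
# "static database references" in this sketch.
DB_REF = re.compile(r"""db:open\(\s*(['"])([^'"]+)\1""")

# Crude stand-in for "the overall query is updating".
UPDATING = re.compile(
    r"\b(insert|delete|replace|rename)\s+(node|nodes|value)\b|\bdb:create\("
)


def static_locks(query: str) -> dict[str, str]:
    """Map every statically referenced database to one lock mode.

    The mode is the same for all databases: 'write' if the query as a
    whole is updating, 'read' otherwise.
    """
    mode = "write" if UPDATING.search(query) else "read"
    return {match.group(2): mode for match in DB_REF.finditer(query)}


if __name__ == "__main__":
    print(static_locks(
        "let $db := db:open('db1') return insert node <new/> into $db/*"
    ))
```

Note that a query which reads one database and updates another ends up with write locks on both, because the decision is made per query and not per database. Finding out which database is really updated would mean following `$db` back to its `db:open` call.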
Cheers Christian
Hi James, hi Andy,
When adding database locks to BaseX, I remember we considered adding explicit lock features as well, exactly for those cases in which the optimizer is not powerful enough to detect the required locks. One of the reasons why we discarded this idea was that we feared that people would run into deadlocks or inconsistent states.
Indeed I like your idea of raising an error once a database is accessed that was not manually locked; this would prevent us from running into deadlocks or creating inconsistent states.
One solution could be to introduce a new MANUALLOCK option, and to extend the existing basex:read-lock and basex:write-lock pragmas to databases. If manual locking is enabled, an error will be raised if a database is accessed that has not been specified in a lock pragma. If it’s disabled, databases specified by query locks will simply be added to the list of databases to be locked (or ignored if they have already been detected automatically, or discarded if a global lock needs to be set). Manual locks could either be assigned globally or within a query:
(# db:manuallock #) {
  (# basex:write-lock BEP-Staging #) {
    (# basex:read-lock BEP #) {
      let $d := db:open('BEP')
      return db:create('BEP-staging', $d, $d ! base-uri(.))
    }
  }
}
I wouldn’t call this syntax particularly appealing (it’s surely something that should only be used for special cases in a code base), but if database locking is enhanced in a future version, such pragmas could simply be removed from a query.
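To make the two modes concrete, here is a small Python model of the proposed rules. It is a sketch of the semantics as described in this mail, not of any existing BaseX code; `LockError`, `check_access` and `merge_locks` are hypothetical names, and `None` is used as an assumed marker for "a global lock is needed".

```python
from typing import Optional


class LockError(Exception):
    """Raised when a database is accessed without a declared lock."""


def check_access(db: str, declared: dict[str, str]) -> None:
    """Manual locking enabled: every accessed database must be declared."""
    if db not in declared:
        raise LockError(f"database '{db}' is not covered by a lock pragma")


def merge_locks(
    detected: Optional[dict[str, str]], declared: dict[str, str]
) -> Optional[dict[str, str]]:
    """Manual locking disabled: combine detected and declared locks.

    detected is None when static analysis falls back to a global lock;
    the declared locks are then discarded. Otherwise declared databases
    are added, and ignored where a lock was already detected.
    """
    if detected is None:
        return None
    merged = dict(detected)
    for db, mode in declared.items():
        merged.setdefault(db, mode)
    return merged


if __name__ == "__main__":
    declared = {"BEP": "read", "BEP-Staging": "write"}
    check_access("BEP", declared)  # passes silently
    print(merge_locks({"BEP": "write"}, declared))
```

In this sketch names are compared exactly, so a lock declared for `BEP-Staging` would not cover an access to `BEP-staging`. Whether BaseX would treat the two spellings as the same database is not stated in the thread.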
Any thoughts on that? Christian
On 13 Feb 2019, at 16:38, Christian Grün christian.gruen@gmail.com wrote:
Manual locks could either be assigned globally or within a query:
(# db:manuallock #) {
  (# basex:write-lock BEP-Staging #) {
    (# basex:read-lock BEP #) {
      let $d := db:open('BEP')
      return db:create('BEP-staging', $d, $d ! base-uri(.))
    }
  }
}
I wouldn’t call this syntax particularly appealing
Yes - that’s certainly a monster. :)
Would this work where the database names are only known at run time?
So if I POST data to http://www.example.com/add/myDatabaseName, do we need something like:
xquery:databaseRestrict($databaseName, addStuffFunction($data) )
which will raise an error if any database other than $databaseName is referenced by addStuffFunction.
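The behaviour I have in mind could be modelled roughly as follows. This is a Python sketch of the idea only: `Session`, `restrict` and `RestrictedAccessError` are made-up names, and nothing like `xquery:databaseRestrict` exists today as far as this thread shows. The wrapped function is passed as a callable so that the restriction is in place before it runs.

```python
from typing import Callable, Iterable, Optional, TypeVar

T = TypeVar("T")


class RestrictedAccessError(Exception):
    """Raised when code touches a database outside the allowed set."""


class Session:
    """Toy stand-in for a query context holding named databases."""

    def __init__(self, databases: dict[str, list]) -> None:
        self._databases = dict(databases)
        self._allowed: Optional[set[str]] = None  # None means unrestricted

    def open(self, name: str) -> list:
        if self._allowed is not None and name not in self._allowed:
            raise RestrictedAccessError(f"access to '{name}' is not allowed")
        return self._databases[name]

    def restrict(self, names: Iterable[str], fn: Callable[["Session"], T]) -> T:
        """Run fn with access limited to names, then restore the old state."""
        previous = self._allowed
        allowed = set(names)
        # Nested restrictions can only narrow the allowed set.
        self._allowed = allowed if previous is None else previous & allowed
        try:
            return fn(self)
        finally:
            self._allowed = previous


if __name__ == "__main__":
    session = Session({"myDatabaseName": [], "other": []})
    session.restrict(
        ["myDatabaseName"], lambda s: s.open("myDatabaseName").append("data")
    )
    print(session.open("myDatabaseName"))
```

Because the name is an ordinary runtime value here, it could come straight from the request path.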
Just noting my thoughts - need to consider it a little more.
Regards, James
Would this work where the database names are only known at run time?
Right now, our locking is based on static code analysis; runtime information won’t be included. In the longer term, we would like to extend our compiler to perform multiple steps:
1. Static parsing
2. Static optimizations (pre-evaluate 1+2 to 3, etc.)
3. Dynamic optimizations (reoptimize the query with query parameters and other context information)
4. Physical optimizations (open databases, utilize available index structures)
Queries that have been parsed and statically compiled (Step 1+2) could be stored as compiled queries, or kept in main-memory and duplicated before dynamic optimizations take place. This could further reduce access times in RESTXQ applications – and it would allow us to improve locking, as we could do the lock check after Step 3.
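As a rough illustration of why checking locks after Step 3 helps, the following Python sketch caches a stand-in for Steps 1+2 and looks for database references only once the parameters are bound. All function names are hypothetical, and the string handling is a deliberately naive placeholder for real compilation.

```python
import re
from functools import lru_cache

DB_REF = re.compile(r"db:open\('([^']+)'\)")


@lru_cache(maxsize=None)
def compile_static(query: str) -> str:
    """Steps 1+2 stand-in: done once per distinct query text, then cached."""
    return " ".join(query.split())


def optimize_dynamic(compiled: str, params: dict[str, str]) -> str:
    """Step 3 stand-in: inline parameter values as string literals.

    Naive on purpose: no escaping, and no care for variable names that
    share a prefix.
    """
    for name, value in params.items():
        compiled = compiled.replace(f"${name}", f"'{value}'")
    return compiled


def locks_after_step3(query: str, params: dict[str, str]) -> list[str]:
    """Collect database names once the parameters are known."""
    bound = optimize_dynamic(compile_static(query), params)
    return sorted(set(DB_REF.findall(bound)))


if __name__ == "__main__":
    query = "db:open($name)//item"
    print(DB_REF.findall(compile_static(query)))  # nothing visible statically
    print(locks_after_step3(query, {"name": "myDatabaseName"}))
```

Before Step 3 the reference is invisible to the lock check; afterwards it resolves to a single database, and the cached result of Steps 1+2 is reused across requests.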
I look forward to the longer-term solution :) But in the meantime these pragmas, although not pretty, would allow solutions to specific multitasking performance issues to be explored where this is currently difficult or impossible. I think there is a certain elegance in the advisory nature of pragmas.
If manual locking is enabled, an error will be raised if a database is accessed that has not been specified in a lock pragma.
Is it required to specify all databases accessed within the manuallock scope, or could it work by specifying only certain databases as read-only? Execution would then raise an error if these were violated.
/Andy
Hi Andy,
Is it required to specify all databases accessed within the manuallock scope, or could it work by specifying only certain databases as read-only? Execution would then raise an error if these were violated.
Sounds good. I have opened a new issue [1]; we welcome further syntax proposals.
Cheers Christian