Enhanced Locking in BaseX: Status, "How Do You Use It?" Survey and "Wish List" - BaseX-Talk - mailman.uni-konstanz.de

5 May 2013


Dear BaseX Community,
# Status of the Improvements
Over the last few months we have made major improvements to locking in BaseX. Writing transactions no longer block the whole system; reading and writing transactions can now run in parallel. For the 7.7 release (currently still in beta) we made further improvements, such as downgrading locks before the pending update list is applied to the database (which is IO-intensive), and finer-grained locks for commands: for example, `COPY` only needs a read lock on the source database, which is great for backing up the system without having to stop it.
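Lock downgrading can be illustrated with Java's `ReentrantReadWriteLock`, which lets a thread that holds the write lock acquire the read lock and then release the write lock. This is a minimal sketch under stated assumptions, not BaseX's implementation: the class `DowngradeSketch`, the two databases "A" and "B", and the scenario (a database that was write-locked up front but is not touched by the pending update list gets downgraded to a read lock before the IO-intensive write phase) are hypothetical.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** Hypothetical sketch of lock downgrading; not BaseX code. */
class DowngradeSketch {
    // Hypothetical per-database locks for two databases, "A" and "B".
    static final ReentrantReadWriteLock lockA = new ReentrantReadWriteLock();
    static final ReentrantReadWriteLock lockB = new ReentrantReadWriteLock();

    /** Returns true if a concurrent reader could access B while A was being updated. */
    static boolean applyUpdates() {
        // Before evaluation, both databases might be written, so both are write-locked.
        lockA.writeLock().lock();
        lockB.writeLock().lock();

        // ... evaluation produces a pending update list that only touches A ...

        // Downgrade B: take its read lock while still holding the write lock, then
        // drop the write lock (ReentrantReadWriteLock allows this, but not upgrading).
        lockB.readLock().lock();
        lockB.writeLock().unlock();
        try {
            // While the IO-intensive update of A runs, other readers may now use B.
            AtomicBoolean readerGotB = new AtomicBoolean(false);
            Thread reader = new Thread(() -> {
                if (lockB.readLock().tryLock()) {
                    readerGotB.set(true);
                    lockB.readLock().unlock();
                }
            });
            reader.start();
            // ... apply the pending update list to A here ...
            reader.join();
            return readerGotB.get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } finally {
            // End of the transaction: release the remaining locks.
            lockB.readLock().unlock();
            lockA.writeLock().unlock();
        }
    }

    public static void main(String[] args) {
        System.out.println("reader accessed B during update: " + applyUpdates());
    }
}
```

Without the downgrade, the reader's `tryLock()` on B would fail for the whole duration of the update to A, even though B is never written.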
Nonetheless, estimating which locks are needed is hard in a language as powerful as XQuery. Databases can be opened at arbitrary positions in the code, and which databases are opened can even depend on the content of other databases or on external resources. It is therefore not always possible to determine them at compile time, which leads to databases being locked unnecessarily.
# How Do You Use BaseX, and What Are Your Locking Requirements?
There are certainly more ways to enhance locking in BaseX. We have built the basic infrastructure for further enhancements (such as MVCC) and have already achieved a noticeable improvement by applying two-phase locking, but to decide how to proceed, we'd like to hear your thoughts and needs.
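Under two-phase locking, a transaction first acquires all of its locks (growing phase) and only releases them afterwards (shrinking phase); it never acquires a new lock after releasing one. The sketch below shows a conservative, strict variant, assuming the databases a transaction reads and writes are known before evaluation starts: locks are taken in a fixed (sorted) order to avoid deadlocks and released only when the transaction ends. The `LockManager` class and its `run` method are hypothetical and not part of any BaseX API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantReadWriteLock;
import java.util.function.Supplier;

/** Hypothetical lock manager illustrating conservative, strict two-phase locking. */
class LockManager {
    // One read/write lock per database name, created on first use.
    private final Map<String, ReentrantReadWriteLock> locks = new ConcurrentHashMap<>();

    private ReentrantReadWriteLock lockFor(String db) {
        return locks.computeIfAbsent(db, name -> new ReentrantReadWriteLock());
    }

    /** Runs tx while holding read locks on reads and write locks on writes. */
    <T> T run(Set<String> reads, Set<String> writes, Supplier<T> tx) {
        // All databases in a fixed (sorted) order, so no two transactions can
        // end up waiting for each other in a cycle.
        TreeSet<String> all = new TreeSet<>(reads);
        all.addAll(writes);
        List<Lock> held = new ArrayList<>();
        try {
            // Growing phase: acquire every lock before evaluation starts.
            for (String db : all) {
                // A database that is both read and written only gets the write lock,
                // since upgrading a read lock later is not possible.
                Lock lock = writes.contains(db) ? lockFor(db).writeLock() : lockFor(db).readLock();
                lock.lock();
                held.add(lock);
            }
            return tx.get();
        } finally {
            // Shrinking phase: release all locks only once the transaction has finished.
            for (int i = held.size() - 1; i >= 0; i--) {
                held.get(i).unlock();
            }
        }
    }

    boolean isWriteLocked(String db) {
        return lockFor(db).isWriteLocked();
    }
}
```

Because every lock must be taken up front, any database the query might touch has to appear in `reads` or `writes`; this is why imprecise compile-time estimation leads to databases being locked unnecessarily.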
One possible direction would be "the classic database route" of using MVCC; another would be to head in the opposite direction and evaluate optimistic locking or similar "non-locking" protocols. All of these have advantages and drawbacks.
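Optimistic protocols take the opposite approach to locking: no locks are held during evaluation, and conflicts are only detected at commit time, at which point the transaction is restarted. The following is a minimal sketch of that idea, assuming a hypothetical `OptimisticStore` that keeps a single immutable, versioned snapshot in an `AtomicReference`. It is not a BaseX API; a real database would validate read and write sets across many documents rather than swap one reference.

```java
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.UnaryOperator;

/** Hypothetical store illustrating optimistic concurrency control with restarts. */
class OptimisticStore {
    /** Immutable snapshot of a (hypothetical) database: a version number plus content. */
    record Snapshot(long version, String content) {}

    private final AtomicReference<Snapshot> current = new AtomicReference<>(new Snapshot(0, ""));

    /**
     * Runs an update optimistically: read without locking, evaluate, then validate
     * at commit time. Returns the number of restarts that were needed.
     */
    int update(UnaryOperator<String> query) {
        int restarts = 0;
        while (true) {
            Snapshot read = current.get();                // read phase, no lock taken
            String result = query.apply(read.content());  // evaluate the query
            Snapshot next = new Snapshot(read.version() + 1, result);
            // Validation and write phase: succeeds only if nobody committed in between.
            if (current.compareAndSet(read, next)) {
                return restarts;
            }
            restarts++;  // conflict: the query (and its side effects) runs again
        }
    }

    Snapshot snapshot() {
        return current.get();
    }
}
```

If another transaction commits between the read and the validation, `compareAndSet` fails and the query function is evaluated again, so any side effect it performs (such as an HTTP request) is repeated as well.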
I'd be glad to receive your input on a few questions to determine some common use cases:
- What is your application doing?
- Which BaseX APIs do you use?
- What do typical queries look like? If you cannot share them, a description following this guide would suffice:
    - How often is each query run? In particular, please differentiate between reading and updating queries, and state how many databases they access.
    - How complex are they (use of functions, modules, nested loops; lines of code)?
    - Where do function calls opening databases and/or collections occur?
    - Do you use administrative functions (creating/... databases) inside XQuery?
    - Do you use / require `xquery:eval(…)`?
    - Do you access external resources, such as files, or send HTTP requests? Do you need concurrency control for these?
- Some enhancements could require transactions to be restarted. This could cause HTTP requests / … to be executed again (we cannot roll back resources outside BaseX). Would this be a problem for your application? Would it help if you were notified through the API that a query was restarted? (There will always be an alternative without restarted transactions!)
- Any other comments you have with respect to locking and transactions
* * *
You can post this information on the mailing list, or send me a private [mail] (address below) if you do not want to share it publicly. I will derive some use cases for my thesis from your replies, and based on them we will decide how to proceed with further improvements.
If you cannot or do not want to answer all of these questions, we're happy about any input you can give us.
If your reply contains information that should not be shared, please let me know and it will not be shared beyond the core BaseX team; we will nonetheless take it into account in our future plans.
Kind regards from Lake Constance, Germany,
Jens Erat
-- 
Jens Erat

 [phone]: tel:+49-151-56961126
  [mail]: mailto:email@jenserat.de
[jabber]: xmpp:jabber@jenserat.de
   [web]: http://www.jenserat.de

     PGP: 350E D9B6 9ADC 2DED F5F2  8549 CBC2 613C D745 722B