Hi,
Christian and I had a brief discussion on GitHub [1] regarding create db, session semantics, and the use of connection pools. He asked for real user feedback.
I'll try to briefly describe what motivated us to use connection pools; I believe our experience may be indicative of that of others developing applications in which BaseX provides the database tier.
Our application is an AJAX-based front-end that allows users to manage information stored in a BaseX database. When a user clicks on some part of the user interface, the resulting AJAX request may trigger one or more operations on a BaseX database, typically between 1 and 10. (Some of these operations could perhaps be avoided through caching, but caching comes with its own problems, since we have no easy way to keep multiple caches for the same BaseX instance coherent.) These operations include queries, updates, creating/renaming databases, and the like.
In this common situation, how would one manage access to BaseX?
Option A. is to create a new TCP connection to BaseX for every single query, update, or command. That would involve 1-10 TCP connection setups per user event, plus whatever session setup costs BaseX has (however small). As a ballpark number, if the BaseX server runs on a separate machine in our cluster, as is good practice, this can involve many milliseconds of overhead, which adds to the overall latency of our application. (I note that one source of overhead is the need to serialize our in-memory DOM objects for each query, and to parse the result. But of course there's also framework delay, network delay, etc., many of which are infrastructure-dependent and thus difficult to avoid or optimize.)
Option B. is to create a new connection for each AJAX request, and close it when the AJAX request has finished. This would reduce the TCP connection/session setup overhead in that only one would be required per event round-trip. However, our application uses the ZK framework, which is event-based. As a consequence, the framework doesn't provide us with hook points where we'd learn when the handling of an AJAX request starts and when it ends. Instead, execution is event-driven and may involve exceptional control flow. In short, we couldn't reliably close those connections.
Option C. is the use of connection pools. When an AJAX request arrives, we execute our application logic. Whenever we reach a BaseX query/update/command in the execution, we check for an available connection. Up to a maximum number, we create new connections on demand. If the maximum number is reached, we wait (we could also reject the request; there are different load management options here). We then use that connection regardless of which queries/updates/commands were previously executed within the session that this connection represents. That's why it's important for us that BaseX provide a means to avoid inadvertent session state.
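A minimal sketch of the pooling logic described under Option C might look like the following. It is not our actual code and deliberately avoids any BaseX API: the type parameter C stands in for whatever client session class is used, and the factory and reset hooks are hypothetical placeholders for "open a new connection" and "clear any leftover session state".

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;
import java.util.function.Supplier;

/** Bounded pool: creates connections on demand up to a maximum, then makes callers wait. */
final class BoundedPool<C> {
    private final Deque<C> idle = new ArrayDeque<>(); // connections currently not in use
    private final Semaphore permits;                  // one permit per allowed connection
    private final Supplier<C> factory;                // hypothetical: opens a new connection
    private final Consumer<C> reset;                  // hypothetical: clears session state

    BoundedPool(int max, Supplier<C> factory, Consumer<C> reset) {
        this.permits = new Semaphore(max, true);      // fair, so waiters are served in order
        this.factory = factory;
        this.reset = reset;
    }

    /**
     * Waits up to timeoutMs for a free slot. Returns null on timeout,
     * so the caller can choose to reject the request instead of waiting longer.
     */
    C acquire(long timeoutMs) throws InterruptedException {
        if (!permits.tryAcquire(timeoutMs, TimeUnit.MILLISECONDS)) {
            return null;
        }
        C c;
        synchronized (idle) {
            c = idle.pollFirst();                     // reuse an idle connection if one exists
        }
        if (c != null) {
            return c;
        }
        try {
            return factory.get();                     // otherwise create one on demand
        } catch (RuntimeException e) {
            permits.release();                        // do not leak the slot if creation fails
            throw e;
        }
    }

    /** Returns a connection to the pool after clearing its session state. */
    void release(C c) {
        try {
            reset.accept(c);                          // if this throws, c is not put back
            synchronized (idle) {
                idle.addFirst(c);
            }
        } finally {
            permits.release();                        // the slot is freed either way
        }
    }
}
```

Each query/update/command would then be wrapped in acquire(...) and a release(...) inside a finally block, which only requires a hook around the individual database call, not around the whole AJAX request.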
For these reasons, we believe that the use of connection pools is an appropriate means to handle connection management to the BaseX server.
What experience do others who use BaseX as the database tier in their web applications have?
- Godmar
basex-talk@mailman.uni-konstanz.de