motivation for using connection pools - BaseX-Talk - mailman.uni-konstanz.de

11 Aug 2012


Hi,
Christian and I had a brief discussion regarding create db, session
semantics and the use of connection pools on github [1]. He asked for real
user feedback.
I'll try to briefly describe what motivated us to use connection pools; I
believe our experience may be indicative of that of others developing
applications in which BaseX provides the database tier.
Our application is an AJAX-based front-end that allows users to manage
information stored in a BaseX database. When a user clicks on some part of
the user interface, the resulting AJAX request may trigger one or more
operations on a BaseX database, typically between 1 and 10. (Some of
these operations could perhaps be avoided through caching, but caching
comes with its own problems, since we have no easy way to keep multiple
caches for the same BaseX instance coherent.) These operations include
queries, updates, creating/renaming databases, and the like.
In this common situation, how would one manage access to BaseX?
Option A. is to create a new TCP connection to BaseX for every single
query, update, or command. That would involve 1-10 TCP connection setups,
plus whatever session setup costs BaseX has (however small), per user
event. As a rough estimate, if the BaseX server runs on a separate machine
in our cluster, as is good practice, this can add many milliseconds of
overhead to the overall latency of our application. (I note that one
source of overhead is the need to serialize our in-memory DOM objects for
each query, and to parse the result. But of course there's also framework
delay, network delay, etc., many of which are infrastructure-dependent and
thus difficult to avoid or optimize.)
Option B. is to create a new connection for each AJAX request, and close it
when the AJAX request has finished. This would reduce the TCP
connection/session setup overhead in that only one would be required per
event round-trip. However, our application uses the ZK framework, which is
event-based. As a consequence, the framework doesn't provide us with hook
points where we'd learn when the handling of an AJAX request starts and
when it ends. Instead, execution is event-driven and may involve
exceptional control flow. In short, we couldn't reliably close those
connections.
Option C. is the use of connection pools. When an AJAX request arrives, we
execute our application logic. Whenever we reach a BaseX
query/update/command in the execution, we check for an available
connection. Up to a maximum number, we create new connections on demand. If
the maximum number is reached, we wait (we could also reject the request;
there are different load management options here). We then use that
connection, regardless of which queries/updates/commands were previously
executed within the session that this connection represents. That's why
it's important for us that BaseX provide means to avoid inadvertent
session state.
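To make Option C concrete, here is a minimal sketch in Java of the
acquire/release logic described above. It is not our actual code, and it
is not tied to any BaseX API: the class name BoundedPool, the generic
connection type C, and the factory that opens a new connection are all
hypothetical names chosen for illustration.

```java
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

/** Sketch of a bounded pool: connections are created on demand up to a maximum. */
class BoundedPool<C> {
    // One permit per connection that may be in use at the same time.
    private final Semaphore permits;
    // Connections that were opened earlier and are currently unused.
    private final ConcurrentLinkedQueue<C> idle = new ConcurrentLinkedQueue<>();
    // Hypothetical factory; in a real application it would open a database session.
    private final Supplier<C> factory;

    BoundedPool(int max, Supplier<C> factory) {
        this.permits = new Semaphore(max, true); // fair: waiters are served in order
        this.factory = factory;
    }

    /** Waits up to timeoutMs for a connection; returns null if the pool stays exhausted. */
    C acquire(long timeoutMs) throws InterruptedException {
        if (!permits.tryAcquire(timeoutMs, TimeUnit.MILLISECONDS)) {
            return null; // caller decides whether to retry or reject the request
        }
        C conn = idle.poll();
        if (conn != null) {
            return conn; // reuse an existing connection, whatever ran on it before
        }
        try {
            return factory.get(); // below the maximum: create a new connection
        } catch (RuntimeException e) {
            permits.release(); // do not leak the permit if creation fails
            throw e;
        }
    }

    /** Returns a connection to the pool and wakes up one waiting caller, if any. */
    void release(C conn) {
        idle.offer(conn);
        permits.release();
    }
}
```

Each query/update/command would be wrapped in acquire(...) and a release(...)
in a finally block. Note that nothing in this sketch resets the session
before reuse, which is exactly why inadvertent session state matters to us.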
For these reasons, we believe that the use of connection pools is an
appropriate means to handle connection management to the BaseX server.
What experience do others have who use BaseX as a database tier in their
web applications?
- Godmar
[1] https://github.com/BaseXdb/basex/issues/551