I’m rethinking the implementation of my Mirabel system and trying to understand how best to manage concurrency for the web app (serving multiple concurrent read requests from web clients) while also applying database updates in a way that neither blocks reads nor delays updates.

 

I’ve read through the current documentation at https://docs.basex.org/ and also reviewed what I could find online. The documentation makes a number of references to the “client/server” architecture, but I’m not finding any particularly deep discussion of it, either in the docs or by searching on, e.g., “basex client server”.

 

When I started my Mirabel project I understood that the way to get concurrency was to use multiple BaseX HTTP instances, which can serve concurrent read requests against a single set of databases. But now I can’t find the source that led me to that conclusion—I know the product docs have been reworked significantly since then, so maybe something got lost in the update?

 

Based on my latest reading, it seems clear that for BaseX to best manage concurrency of reads and writes, the requests need to be handled within a single server instance running in a single JVM. If I’m understanding it correctly, for FAIRLOCK to interleave read and update operations it has to operate within a single JVM, which means that having different JVMs read from a database that is being updated by yet another JVM is going to be problematic, because the only lock mechanism available in that case is the global lock.

 

Given that, it’s not clear how to implement a multi-user web application that must not block while waiting for longer-running queries to complete, yet still uses a single server instance to satisfy the queries.

 

So I feel like I’m missing something.

 

In my current solution I run multiple basexhttp servers, where the first server is the web app server. It uses a REST handler that inspects the load on each server and routes requests to the lowest-load server. This works, but from my reading I’m starting to think that this level of complexity shouldn’t be required (or, more accurately, that surely others would have needed the same mechanism).
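For reference, the routing decision my REST handler makes is essentially the following (a Python sketch standing in for the actual handler; the server URLs, load numbers, and the way load is measured are all made up for illustration):

```python
# Hypothetical sketch: pick the least-loaded of several BaseX HTTP instances.
# In my real setup the load metric comes from inspecting each server; here it
# is just a dict of server URL -> count of active requests.

def pick_server(loads: dict[str, int]) -> str:
    """Return the URL of the backend with the fewest active requests."""
    return min(loads, key=loads.get)

backends = {
    "http://localhost:8984": 3,   # web app server (also handles routing)
    "http://localhost:8985": 1,   # additional read server
    "http://localhost:8986": 4,   # additional read server
}
print(pick_server(backends))  # -> http://localhost:8985
```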

 

In my web app I’ve implemented a general asynchronous query mechanism: I serve web pages with elements that trigger async requests back to the server to fetch content (for example, I cache HTML renderings of DITA tables that are then fetched asynchronously to populate report tables that include them). I don’t see a way to avoid this level of optimization, and it’s not too complicated. It seems to work well, and there are more opportunities for caching.
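The caching side of this is simple: pre-rendered HTML fragments are stored under a key, and the async endpoint returns the cached fragment rather than re-rendering. Roughly (a Python sketch with a made-up render function; the real renderings are of course produced by my DITA-to-HTML pipeline):

```python
# Hypothetical sketch of the fragment cache behind the async endpoints.
cache: dict[str, str] = {}

def render_table(table_id: str) -> str:
    # Stand-in for the expensive DITA-table-to-HTML rendering.
    return f"<table id='{table_id}'>...</table>"

def fetch_rendering(table_id: str) -> str:
    """What an async request gets back: cached if present, else render and cache."""
    if table_id not in cache:
        cache[table_id] = render_table(table_id)
    return cache[table_id]
```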

 

I’m also continuing to think about Tamara’s suggestion to use polling or web sockets to perform long-running queries asynchronously, store the results in a results database, then trigger fetching in the client (in response to an earlier question of mine about how to avoid keeping HTTP request sessions open for a long time).
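The shape of that submit/poll/fetch pattern, as I understand it, is something like the following (a self-contained Python sketch; the in-memory dict stands in for the results database, and all names are hypothetical — in BaseX the background query would presumably be started via the jobs facility):

```python
import threading
import time
import uuid

# Hypothetical sketch of the async-query pattern: submit a long-running
# query, store its result keyed by a job id, and let the client poll until
# the result is available.

results: dict[str, object] = {}   # stands in for the "results database"
results_lock = threading.Lock()

def submit(query_fn) -> str:
    """Start a long-running query in the background; return a job id."""
    job_id = str(uuid.uuid4())
    def run():
        value = query_fn()            # the long-running query itself
        with results_lock:
            results[job_id] = value   # store result for a later fetch
    threading.Thread(target=run).start()
    return job_id

def poll(job_id: str):
    """Return the stored result, or None if the job hasn't finished."""
    with results_lock:
        return results.get(job_id)

job = submit(lambda: sum(range(1000)))   # stand-in for a long query
while (answer := poll(job)) is None:     # the client-side polling loop
    time.sleep(0.01)
print(answer)  # 499500
```

With web sockets the polling loop would instead be replaced by a push from the server when the result lands in the results database.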

 

So my question:

What is the general architecture for a web application that needs to support large numbers of concurrent users making large numbers of requests while also managing updates to its databases?

 

Thanks,

 

Eliot

 

_____________________________________________

Eliot Kimber

Sr. Staff Content Engineer

O: 512 554 9368

 

servicenow

 

servicenow.com

LinkedIn | X | YouTube | Instagram