I’m rethinking the implementation of my Mirabel system and trying to understand how best to manage concurrency for the web app (satisfying multiple read requests from web clients) while also managing updates to databases in a way that neither blocks reads nor delays updates.
I’ve read through the current documentation at https://docs.basex.org/ and also reviewed what I could find online. The documentation has a number of references to the “client/server” architecture, but I’m not finding any particularly deep discussion of it, either in the docs or by searching for, e.g., “basex client server”.
When I started my Mirabel project I understood that the way to get concurrency was to use multiple BaseX HTTP instances, which can make concurrent read requests on a single set of databases. But now I can’t find the source that led me to that conclusion—I know the product docs have been reworked significantly since then, so maybe something got lost in the update?
Based on my latest reading, it seems clear that for BaseX to manage concurrent reads and writes well, the requests need to go to a single server instance running in a single JVM. If I’m understanding it correctly, FAIRLOCK can only interleave read and update operations within a single JVM. That means having different JVMs read from a database that is being updated by yet another JVM is going to be problematic, because the only lock mechanism in that case is the global lock.
Given that, it’s not clear how to implement a multi-user web application that must not block while waiting for longer-running queries to complete, but that still uses a single server instance to satisfy the queries.
So I feel like I’m missing something.
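The single-JVM point can be made concrete with a toy model. The Python sketch below is not BaseX code: it keeps one reader/writer lock per database name inside a single process, so a query reading `prod` can run while an update holds `work`, and a fixed acquisition order avoids deadlocks. The class names and the `prod`/`work` database names are hypothetical, and BaseX's actual lock manager (including the FAIRLOCK policy) differs in detail; the point is only that this bookkeeping lives in one process's memory.

```python
# Toy model, not BaseX code: class names and database names are hypothetical.
import threading
from contextlib import contextmanager


class RWLock:
    """Many concurrent readers or one exclusive writer (no fairness policy)."""

    def __init__(self) -> None:
        self._cond = threading.Condition()
        self._readers = 0
        self._writer = False

    def _blocked(self, write: bool) -> bool:
        # A writer needs the lock completely free; a reader only needs no active writer.
        return self._writer or (write and self._readers > 0)

    def _grant(self, write: bool) -> None:
        if write:
            self._writer = True
        else:
            self._readers += 1

    def acquire(self, write: bool) -> None:
        with self._cond:
            while self._blocked(write):
                self._cond.wait()
            self._grant(write)

    def try_acquire(self, write: bool) -> bool:
        with self._cond:
            if self._blocked(write):
                return False
            self._grant(write)
            return True

    def release(self, write: bool) -> None:
        with self._cond:
            if write:
                self._writer = False
            else:
                self._readers -= 1
            self._cond.notify_all()


class LockManager:
    """One lock per database name; it works only because every transaction uses this one table."""

    def __init__(self) -> None:
        self._guard = threading.Lock()
        self._locks: dict[str, RWLock] = {}

    def lock_for(self, db: str) -> RWLock:
        with self._guard:
            return self._locks.setdefault(db, RWLock())

    @contextmanager
    def transaction(self, reads=(), writes=()):
        # A write lock on a database also covers reads of that database.
        modes = {db: False for db in reads}
        modes.update({db: True for db in writes})
        acquired = []
        try:
            # Acquiring in sorted order means transactions never wait on each other in a cycle.
            for db in sorted(modes):
                self.lock_for(db).acquire(modes[db])
                acquired.append(db)
            yield
        finally:
            for db in reversed(acquired):
                self.lock_for(db).release(modes[db])
```

A read-only request would run inside `mgr.transaction(reads=["prod"])`, while a loader wraps its update in `mgr.transaction(writes=["work"])`; the two never wait on each other.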
In my current solution I run multiple basexhttp servers, where the first server is the web app server. It uses a REST handler that inspects the load on each server and sends each request to the least-loaded one. This works, but from my reading I’m starting to think that this level of complexity should not be required (or, more accurately, if it were, surely others would have needed the same mechanism?).
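For reference, the routing rule described above amounts to least-outstanding-requests load balancing. A minimal Python sketch, assuming "load" means the number of requests the front end has forwarded to each server and not yet seen complete; the class names and URLs are hypothetical, and the sketch does nothing about the cross-JVM locking issue discussed later in the thread.

```python
# Sketch under assumptions: "load" = in-flight request count; names and URLs are hypothetical.
import threading
from dataclasses import dataclass, field


@dataclass
class Backend:
    url: str
    in_flight: int = 0  # requests forwarded to this server that have not completed yet


@dataclass
class LeastLoadedDispatcher:
    backends: list[Backend]
    _lock: threading.Lock = field(default_factory=threading.Lock, init=False, repr=False)

    def acquire(self) -> Backend:
        """Pick the server with the fewest in-flight requests (ties go to the first listed)."""
        with self._lock:
            backend = min(self.backends, key=lambda b: b.in_flight)
            backend.in_flight += 1
            return backend

    def release(self, backend: Backend) -> None:
        """Call when the forwarded request finishes, whether it succeeded or not."""
        with self._lock:
            backend.in_flight -= 1
```

A request handler would call `acquire()`, forward the request to `backend.url`, and call `release()` in a `finally` block so a failed request does not leave the count inflated.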
In my web app I’ve implemented a general asynchronous query mechanism: I serve web pages with elements that then trigger async requests back to the server to fetch their content (for example, I cache HTML renderings of DITA tables that are then fetched asynchronously to populate report tables that include the rendered tables). I don’t see a way to avoid this level of optimization, and it’s not too complicated. It seems to work well, and there are more opportunities for caching.
I’m also still thinking about Tamara’s suggestion, made in response to an earlier question of mine about how to avoid keeping HTTP requests open for a long time: use polling or web sockets to run long-running queries asynchronously, store the results in a results database, and then trigger fetching in the client.
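Tamara's suggestion can be sketched as a submit-then-poll pattern: the request that starts a long query returns a job ID immediately, the query runs in the background, its result is stored, and the client polls (or is notified over a web socket) until the result is ready. In this Python sketch an in-memory dictionary stands in for the results database; `JobStore`, `submit`, and `poll` are hypothetical names, not a BaseX API.

```python
# Sketch only: an in-memory dict stands in for a results database; all names are hypothetical.
import threading
import uuid
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Optional


class JobStore:
    """Runs long queries in the background and keeps their results until fetched."""

    def __init__(self, workers: int = 4) -> None:
        self._pool = ThreadPoolExecutor(max_workers=workers)
        self._lock = threading.Lock()
        self._status: dict[str, str] = {}
        self._results: dict[str, object] = {}

    def submit(self, query: Callable[[], object]) -> str:
        """Start a query and return a job ID immediately, so the HTTP request can finish."""
        job_id = uuid.uuid4().hex
        with self._lock:
            self._status[job_id] = "running"

        def run() -> None:
            try:
                result, status = query(), "done"
            except Exception as exc:  # keep the failure visible to the polling client
                result, status = repr(exc), "failed"
            with self._lock:
                self._results[job_id] = result
                self._status[job_id] = status

        self._pool.submit(run)
        return job_id

    def poll(self, job_id: str) -> tuple[str, Optional[object]]:
        """Return (status, result); the result stays None until the job has finished."""
        with self._lock:
            return self._status.get(job_id, "unknown"), self._results.get(job_id)

    def shutdown(self) -> None:
        self._pool.shutdown(wait=True)
```

A page would call an endpoint backed by `submit()`, then check an endpoint backed by `poll()` (or wait for a web-socket message) and fetch the result once the status is `done`.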
So my question:
What is the general architecture for a web application that needs to support large numbers of concurrent users making large numbers of requests and manage updates to databases?
Thanks,
Eliot
_____________________________________________
Eliot Kimber
Sr. Staff Content Engineer
O: 512 554 9368
servicenow
servicenow.com https://www.servicenow.com
LinkedIn https://www.linkedin.com/company/servicenow | X https://twitter.com/servicenow | YouTube https://www.youtube.com/user/servicenowinc | Instagram https://www.instagram.com/servicenow
I found Eliot's post a very interesting read. I'm still struggling to fully comprehend the client/server architecture of BaseX and found the documentation regarding this subject a bit sparse. I'm also curious about the response(s) to the question he poses at the end of his post.
Hi Eliot,
Free time is a rare resource nowadays; just some quick feedback:
I’ve done a read through of the current documentation at https://docs.basex.org/ and also reviewed what I could find online and such. In the documentation I find a number of references to the “client/server” architecture but I’m not finding any particularly deep discussion of it, either in the docs or by searching on i.e., “basex client server”.
The best entry point may be Getting Started → Database Server [1].
When I started my Mirabel project I understood that the way to get concurrency was to use multiple BaseX HTTP instances, which can make concurrent read requests on a single set of databases.
That’s dangerous (and has always been problematic). If you have concurrent operations, you should have one central HTTP instance. Otherwise, you might run into concurrency issues and locked databases, as multiple JVMs cannot share their information with each other [2].
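Why separate JVMs cannot coordinate can be shown in a few lines. In this toy Python model (not BaseX code; the class and the database name `docs` are hypothetical), each process keeps its lock table in its own memory, so a conflict that a single table would catch goes unnoticed when the two writers run in different processes. In practice that is what leads to the concurrency issues and locked databases mentioned above.

```python
# Toy model, not BaseX code: each "JVM" is just an object with its own in-memory lock table.
class ProcessLocalLocks:
    """Write-lock table held in one process's memory, as each JVM's lock manager is."""

    def __init__(self) -> None:
        self._writers: set[str] = set()

    def try_write(self, db: str) -> bool:
        if db in self._writers:
            return False  # conflict detected: a writer in this process already holds it
        self._writers.add(db)
        return True


# Inside one process, a second writer on the same database is refused...
shared = ProcessLocalLocks()
one_process = (shared.try_write("docs"), shared.try_write("docs"))

# ...but two processes each consult only their own table, so neither sees the other.
jvm_a, jvm_b = ProcessLocalLocks(), ProcessLocalLocks()
two_processes = (jvm_a.try_write("docs"), jvm_b.try_write("docs"))

print(one_process, two_processes)  # (True, False) (True, True)
```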
It may be difficult to give profound answers on the remaining questions in a few lines. Maybe others can share their experiences.
Best,
Christian
[1] https://docs.basex.org/12/Getting_Started
[2] https://docs.basex.org/main/Startup#concurrent_operations
I fully understand the issue of time.
The Database Server page (https://docs.basex.org/12/Database_Server) doesn’t really provide the details I’m looking for.
In particular, it’s not clear to me how a BaseX server would be used with an HTTP server in order to manage parallel query execution and ensure a responsive web site in the face of 100s of concurrent web users making 1000s of query requests. My current architecture handles this in terms of responsiveness and horizontal scaling, but as you say, it runs into issues with contention on locks for databases being updated.
I know other people have successfully implemented public-facing web sites with BaseX so I’m curious how they’ve done it—is the life cycle of their content such that updates are not much of an issue or are they doing something different? Am I missing some way to make a single BaseX server take advantage of all available cores? I understood a Java JVM as using a single core, but maybe my understanding is wrong?
It may be that BaseX, as I’m using it, is not the right way to do what I’m doing. For example, it might make more sense to implement the web site using a typical Node.js and React system that uses BaseX exclusively through a REST API. That still leaves the problem of how to scale query handling, but it would keep the web site itself responsive. My team is learning Node.js, Next.js, and React for other projects, so it’s something we could explore.
I could also explore using other database solutions for some or all of what I want to do. For example, maybe it makes more sense to put my where-used table into a key-value store (even Solr could work for this pretty easily) or a SQL database, and reserve BaseX for the XML-aware data processing needed to construct the table and for other XML- and text-aware queries. But that could still run into performance issues, given that I’m looking for 10 ms response times for lookups in the where-used table.
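Wherever it ends up being stored, a where-used table is an inverted index: the XML-aware step extracts "document → documents it references" pairs once, and each lookup is then a single key access. A minimal Python sketch, assuming the references have already been extracted; the paths are hypothetical DITA map and topic names.

```python
# Sketch under assumptions: reference pairs are already extracted; paths are hypothetical.
from collections import defaultdict


def build_where_used(references: dict[str, list[str]]) -> dict[str, frozenset[str]]:
    """Invert 'document -> documents it references' into 'document -> documents that use it'."""
    users: defaultdict[str, set[str]] = defaultdict(set)
    for source, targets in references.items():
        for target in targets:
            users[target].add(source)
    # Freeze the result so the published table is read-only.
    return {target: frozenset(sources) for target, sources in users.items()}


refs = {
    "maps/guide.ditamap": ["topics/install.dita", "topics/config.dita"],
    "maps/admin.ditamap": ["topics/config.dita"],
    "topics/install.dita": ["topics/config.dita"],
}
where_used = build_where_used(refs)
# Each lookup is a single dictionary access.
print(sorted(where_used["topics/config.dita"]))
# ['maps/admin.ditamap', 'maps/guide.ditamap', 'topics/install.dita']
```

The same shape maps directly onto a key-value store or a two-column SQL table keyed by the referenced document.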
Or maybe I just need to do more caching of query results where the results are stable for a given content set.
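Caching works cleanly when the cache key includes an identifier for the content set the result was computed from: results never go stale within one content set, and publishing a new one simply means new keys. A single-threaded Python sketch; the content-set IDs (for example, a load timestamp) and the class are hypothetical, and a real server would need to guard the dictionary with a lock.

```python
# Single-threaded sketch; content-set IDs and class name are hypothetical.
from collections.abc import Callable, Hashable


class ContentSetCache:
    """Caches query results per content-set ID; results never go stale within one content set."""

    def __init__(self) -> None:
        self._entries: dict[tuple[str, Hashable], object] = {}

    def get(self, content_set: str, key: Hashable, compute: Callable[[], object]) -> object:
        slot = (content_set, key)
        if slot not in self._entries:
            self._entries[slot] = compute()  # run the query only on a cache miss
        return self._entries[slot]

    def evict_except(self, current: str) -> None:
        """After a new content set is published, drop results computed for older ones."""
        self._entries = {s: v for s, v in self._entries.items() if s[0] == current}

    def __len__(self) -> int:
        return len(self._entries)
```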
I started this project without any particular plan and got a long way just building it as I went. But now that I’m tasked with fixing a number of design and behavior issues with my initial approach, I need to make sure I really know what I’m doing and make the most appropriate implementation choices.
Thanks,
Eliot
From: Christian Grün christian.gruen@gmail.com Date: Thursday, December 12, 2024 at 5:11 AM To: Eliot Kimber eliot.kimber@servicenow.com Cc: basex-talk@mailman.uni-konstanz.de Subject: Re: [basex-talk] Deeper discussion of BaseX client/server and web app implementation?
Hello Eliot,
I have only one BaseX instance, but to avoid the locking issue during large updates/optimizations, I have multiple copies of the databases. Updates are performed on "working" databases, and then I use db:copy to duplicate them to "production" databases for users on the front end to query. I haven't seen or heard of any problems with concurrent users on the public side when they're just reading from the production databases.
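This setup separates writes from reads by database name. A minimal Python model, assuming an in-memory dictionary in place of BaseX databases and the hypothetical names `records-work` and `records`: in BaseX the `publish` step would correspond to the db:copy call described above, and this toy does not model whatever locking BaseX performs while that copy runs.

```python
# Toy model, not BaseX code: database names are hypothetical; publish() plays the role of db:copy.
import copy


class DatabaseSet:
    """Stand-in for named databases: name -> list of document names."""

    def __init__(self) -> None:
        self._dbs: dict[str, list[str]] = {}

    def load(self, working: str, docs: list[str]) -> None:
        # All writes go to the working database; the front end never queries it.
        self._dbs.setdefault(working, []).extend(docs)

    def publish(self, working: str, production: str) -> None:
        # Build the copy first, then swap it in with a single assignment.
        self._dbs[production] = copy.deepcopy(self._dbs[working])

    def query(self, production: str) -> list[str]:
        # Front-end reads only ever see the production copy.
        return list(self._dbs.get(production, []))
```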
-Tamara
On Thu, Dec 12, 2024 at 6:53 AM Eliot Kimber eliot.kimber@servicenow.com wrote:
Hello Eliot and Tamara,
I’ve observed what appear to be instances (though I haven’t fully tested to isolate and confirm this) where a write operation such as db:create() blocks BaseX from serving other HTTP requests, which use db:list() and db:get(), until the write operation has finished.
After reading the description of lock detection at https://docs.basex.org/main/BaseX_10#compilation, I’m now wondering whether it might help to apply a naming convention to database names, so that it’s possible to tell by name which databases are currently used for reads and which for writes, although renaming databases might add other complexities.
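The naming-convention idea can be checked on paper: if every request's databases are known by name, a convention makes it easy to confirm that user-facing reads and background loads never touch the same database. This Python sketch assumes hypothetical `live_` and `work_` prefixes and that each transaction's read and write sets are known up front, which is the premise of the compile-time lock detection linked above. A swap step that touches both a `work_` and a `live_` database still conflicts with readers, which is where the renaming complexity would show up.

```python
# Sketch under assumptions: prefixes are a hypothetical convention; lock sets are known up front.
from dataclasses import dataclass

LIVE_PREFIX = "live_"  # hypothetical: read-only databases served to users
WORK_PREFIX = "work_"  # hypothetical: databases being loaded or rebuilt


@dataclass(frozen=True)
class LockSet:
    reads: frozenset = frozenset()
    writes: frozenset = frozenset()

    def conflicts_with(self, other: "LockSet") -> bool:
        # Two transactions conflict if either one writes a database the other touches.
        return bool(self.writes & (other.reads | other.writes)
                    or other.writes & (self.reads | self.writes))


def is_user_facing(lock_set: LockSet) -> bool:
    """True if the transaction only reads live_ databases."""
    return not lock_set.writes and all(db.startswith(LIVE_PREFIX) for db in lock_set.reads)


def is_background_load(lock_set: LockSet) -> bool:
    """True if the transaction only touches work_ databases."""
    return all(db.startswith(WORK_PREFIX) for db in lock_set.reads | lock_set.writes)


loader = LockSet(writes=frozenset({"work_docs"}))
reader = LockSet(reads=frozenset({"live_docs"}))
swap = LockSet(writes=frozenset({"work_docs", "live_docs"}))
print(loader.conflicts_with(reader), swap.conflicts_with(reader))  # False True
```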
Thanks, Vincent
_____________________________________________
Vincent M. Lizzi
Head of Information Standards | Taylor & Francis Group
vincent.lizzi@taylorandfrancis.com
From: BaseX-Talk basex-talk-bounces@mailman.uni-konstanz.de On Behalf Of Tamara Marnell Sent: Thursday, December 12, 2024 12:56 PM To: Eliot Kimber eliot.kimber@servicenow.com Cc: basex-talk@mailman.uni-konstanz.de Subject: Re: [basex-talk] Deeper discussion of BaseX client/server and web app implementation?
--
Tamara Marnell
Program Manager, Systems
Orbis Cascade Alliance (orbiscascade.org https://www.orbiscascade.org/)
Pronouns: she/her/hers
In my approach, all updates are done to databases that are only used by my data loading HTTP server.
Basically, I create a set of temporary databases, load the new data into them, and then, like Tamara, replace the production (read-only) databases with the newly created temp databases.
I’ve currently implemented this by doing:
1. Rename the production database to something unique (e.g., “_dropme_databasename”).
2. Rename the temp database to the production name.
3. Drop what was the production database.
This is in the context of my multi-server approach, where all the updates of a given set of related databases are done by a single HTTP server (and thus a single JVM).
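The three rename/drop steps above can be modeled on an in-memory catalog. The Python sketch below (hypothetical names `docs` and `docs_tmp`; not BaseX code) also shows one consequence worth checking: if the steps run as separate operations, there is a brief moment between steps 1 and 2 when no database has the production name, so a read arriving then would not find it.

```python
# Toy model, not BaseX code: database names are hypothetical.
from typing import Callable


class Catalog:
    """Name -> content mapping standing in for a server's databases."""

    def __init__(self, dbs: dict[str, str]) -> None:
        self.dbs = dict(dbs)

    def rename(self, old: str, new: str) -> None:
        if new in self.dbs:
            raise ValueError(f"database {new!r} already exists")
        self.dbs[new] = self.dbs.pop(old)

    def drop(self, name: str) -> None:
        del self.dbs[name]


def swap_in(catalog: Catalog, temp: str, prod: str,
            between_steps: Callable[[], None] = lambda: None) -> None:
    """Replace `prod` with `temp` via rename, rename, drop."""
    retired = f"_dropme_{prod}"
    catalog.rename(prod, retired)  # 1. move the production database aside
    between_steps()                #    a reader arriving now finds no `prod` database
    catalog.rename(temp, prod)     # 2. give the new database the production name
    catalog.drop(retired)          # 3. drop what was the production database


cat = Catalog({"docs": "old content", "docs_tmp": "new content"})
seen_during_swap = []
swap_in(cat, "docs_tmp", "docs", lambda: seen_during_swap.append("docs" in cat.dbs))
print(cat.dbs, seen_during_swap)  # {'docs': 'new content'} [False]
```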
Cheers,
E.
From: Lizzi, Vincent Vincent.Lizzi@taylorandfrancis.com Date: Thursday, December 12, 2024 at 12:59 PM To: Tamara Marnell tmarnell@orbiscascade.org, Eliot Kimber eliot.kimber@servicenow.com Cc: basex-talk@mailman.uni-konstanz.de Subject: RE: [basex-talk] Deeper discussion of BaseX client/server and web app implementation?
I wanted to revisit this discussion in the new year.
In the context of a different internal initiative, I’ve been learning more about traditional web site architecture and implementation (e.g., Apache httpd plus statically generated sites built with Node.js).
This has gotten me to wondering whether the general architecture for a large-user-count, long-query-handling web site is to have one BaseX HTTP server to serve the web site and handle requests and a second server (basexserver) that does all the query work and accepts requests from the HTTP server?
That’s based on the assumption that a basexserver instance can handle a large number of concurrent requests, since it must manage its own internal threading. It also presumes that a single server instance (one JVM) can use multiple cores on a multi-core machine (my production server has 8 CPUs). A little reading on Jetty suggests that it should be able to handle the concurrent load I’m likely to have with no problem.
This would still require a mechanism to manage long-lived HTTP requests from the client, but there are various ways to handle that: using web sockets to alert a client that a long request has completed, being more sophisticated with HTTP request details, and storing query results in a cache location from which they are served back to the client.
It seems like my architectural mistake is having a BaseX HTTP server that both serves the interactive web site and makes queries, and then trying to scale horizontally by having multiple HTTP servers to which the primary server delegates requests.
With a single basexserver instance handling all queries, BaseX can manage read and write locking appropriately.
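The proposed split (an HTTP front end plus one query server that owns all database access) can be sketched as a front end submitting work to a single back end with a bounded worker pool, so excess requests queue rather than fail. This Python model is illustrative only: `QueryBackend` and `max_parallel` are hypothetical, and Python threads are limited by the GIL for CPU-bound work, whereas JVM threads can run on several cores at once, which is what the multi-core assumption above relies on. BaseX has a PARALLEL option that caps the number of concurrent transactions; its exact semantics are described on the BaseX Options page.

```python
# Illustrative model only: QueryBackend and max_parallel are hypothetical names.
import threading
import time
from concurrent.futures import Future, ThreadPoolExecutor


class QueryBackend:
    """One shared query server: every request goes through the same bounded worker pool."""

    def __init__(self, max_parallel: int) -> None:
        self._pool = ThreadPoolExecutor(max_workers=max_parallel)
        self._lock = threading.Lock()
        self._active = 0
        self.peak = 0  # most queries observed running at the same time

    def _run(self, query_id: int) -> str:
        with self._lock:
            self._active += 1
            self.peak = max(self.peak, self._active)
        try:
            time.sleep(0.01)  # stand-in for evaluating a query
            return f"result-{query_id}"
        finally:
            with self._lock:
                self._active -= 1

    def submit(self, query_id: int) -> "Future[str]":
        # Requests beyond max_parallel wait in the pool's queue instead of failing.
        return self._pool.submit(self._run, query_id)

    def close(self) -> None:
        self._pool.shutdown(wait=True)


# The HTTP front end forwards 100 simultaneous requests to the single back end.
backend = QueryBackend(max_parallel=8)
futures = [backend.submit(i) for i in range(100)]
results = [f.result() for f in futures]
backend.close()
print(len(results), backend.peak <= 8)  # 100 True
```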
Is this an appropriate architecture?
Thanks,
Eliot
From: Eliot Kimber eliot.kimber@servicenow.com Date: Thursday, December 12, 2024 at 4:40 PM To: basex-talk@mailman.uni-konstanz.de Subject: Re: [basex-talk] Deeper discussion of BaseX client/server and web app implementation?
Basically, I create a set of temporary databases, load the new data into those, then, like Tamara, swap out the production (read-only) databases with the newly-created temp databases.
I’ve currently implemented this by doing:
1. Rename the production database to something unique (i.e., “_dropme_databasename”) 2. Rename the temp database to the production name 3. Drop what was the production database.
This is in the context of my multi-server approach, where all the updates of a given set of related databases are done by a single HTTP server (and thus a single JVM).
Cheers,
E.
_____________________________________________ Eliot Kimber Sr. Staff Content Engineer O: 512 554 9368
servicenow
servicenow.comhttps://www.servicenow.com LinkedInhttps://www.linkedin.com/company/servicenow | Xhttps://twitter.com/servicenow | YouTubehttps://www.youtube.com/user/servicenowinc | Instagramhttps://www.instagram.com/servicenow
From: Lizzi, Vincent Vincent.Lizzi@taylorandfrancis.com Date: Thursday, December 12, 2024 at 12:59 PM To: Tamara Marnell tmarnell@orbiscascade.org, Eliot Kimber eliot.kimber@servicenow.com Cc: basex-talk@mailman.uni-konstanz.de Subject: RE: [basex-talk] Deeper discussion of BaseX client/server and web app implementation?
Hello Eliot and Tamara,
I’ve observed what appear to be instances (though I haven’t fully tested to isolate and confirm this) where a write operation such as db:create() blocks BaseX from serving other HTTP requests that use db:list() and db:get() until the write operation is finished.
On reading the description of lock detection here https://docs.basex.org/main/BaseX_10#compilation I’m now wondering if it might help to apply a naming convention to database names so that it’s possible to distinguish by name which databases are currently used for read vs. write, although renaming databases might add other complexities.
Thanks, Vincent
Vincent M. Lizzi
Head of Information Standards | Taylor & Francis Group
vincent.lizzi@taylorandfrancis.com
Information Classification: General
From: BaseX-Talk basex-talk-bounces@mailman.uni-konstanz.de On Behalf Of Tamara Marnell Sent: Thursday, December 12, 2024 12:56 PM To: Eliot Kimber eliot.kimber@servicenow.com Cc: basex-talk@mailman.uni-konstanz.de Subject: Re: [basex-talk] Deeper discussion of BaseX client/server and web app implementation?
Hello Eliot,
I have only one BaseX instance, but to avoid the locking issue during large updates/optimizations, I have multiple copies of the databases. Updates are performed on "working" databases, and then I use db:copy to duplicate them to "production" databases for users on the front end to query. I haven't seen or heard of any problems with concurrent users on the public side when they're just reading from the production databases.
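Tamara's working/production split is a copy-then-publish pattern: readers only ever touch a finished copy. As an in-process analogy only (this is not what db:copy does internally), the same idea in Java is an immutable snapshot behind an AtomicReference: the updater builds the next version privately and publishes it in one atomic step, so readers never wait for it. The sketch assumes a single updater at a time.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicReference;

// Analogy only: build a private "working" copy, then publish it as the new
// read-only "production" snapshot with one atomic reference swap.
public class SnapshotPublisher {
    private final AtomicReference<Map<String, String>> production =
            new AtomicReference<>(Map.of());

    /** Readers never wait: they see whichever complete snapshot was published last. */
    public String lookup(String key) {
        return production.get().get(key);
    }

    /** Assumes one updater at a time; concurrent updaters could overwrite each other's changes. */
    public void update(Map<String, String> changes) {
        Map<String, String> working = new HashMap<>(production.get()); // private working copy
        working.putAll(changes);                                       // slow work happens here, unseen by readers
        production.set(Map.copyOf(working));                           // publish an immutable snapshot
    }

    public static void main(String[] args) {
        SnapshotPublisher db = new SnapshotPublisher();
        db.update(Map.of("topic-1", "v1"));
        System.out.println(db.lookup("topic-1")); // prints v1
    }
}
```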
-Tamara
On Thu, Dec 12, 2024 at 6:53 AM Eliot Kimber <eliot.kimber@servicenow.com> wrote:
I fully understand the issue of time.
The Database Server page (https://docs.basex.org/12/Database_Server) doesn’t really provide the details I’m looking for.
In particular, it’s not clear to me how a BaseX server would be used with an HTTP server in order to manage parallel query execution and ensure a responsive web site in the face of 100s of concurrent web users making 1000s of query requests. My current architecture handles this in terms of responsiveness and horizontal scaling, but as you say, it runs into issues with contention on locks for databases being updated.
I know other people have successfully implemented public-facing web sites with BaseX, so I’m curious how they’ve done it. Is the life cycle of their content such that updates are not much of an issue, or are they doing something different? Am I missing some way to make a single BaseX server take advantage of all available cores? I understood a JVM as using a single core, but maybe my understanding is wrong?
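On the single-core question: a JVM is not limited to one core. It runs its threads on as many cores as the operating system makes available, which Runtime.availableProcessors() reports. The plain-Java sketch below (nothing BaseX-specific) splits a computation across a thread pool sized to that count.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class CoresDemo {
    /** Sums 0..n-1 in `tasks` chunks, each submitted to a pool with one thread per core. */
    static long parallelSum(long n, int tasks) {
        int cores = Runtime.getRuntime().availableProcessors();
        ExecutorService pool = Executors.newFixedThreadPool(cores);
        try {
            List<Future<Long>> parts = new ArrayList<>();
            long chunk = n / tasks;
            for (int t = 0; t < tasks; t++) {
                long from = t * chunk;
                long to = (t == tasks - 1) ? n : from + chunk; // last chunk takes the remainder
                parts.add(pool.submit(() -> {
                    long s = 0;
                    for (long i = from; i < to; i++) s += i;
                    return s;
                }));
            }
            long total = 0;
            for (Future<Long> part : parts) total += part.get(); // waits for each chunk
            return total;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println("cores available to this JVM: " + Runtime.getRuntime().availableProcessors());
        System.out.println(parallelSum(1_000_000L, 8)); // prints 499999500000
    }
}
```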
It may be that BaseX as I’m using it is not the right way to do what I’m doing. For example, it might make more sense to implement the web site using a typical node.js and React system that then uses BaseX exclusively through a REST API. That still presents the problem of how to scale handling of queries but avoids any issues with the web site itself being responsive. My team is learning how to use node.js, next.js, and React for other projects so it’s something we could explore.
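If the front end moves to node.js, it would reach BaseX over HTTP. The BaseX REST API accepts a query as a URL parameter, in the form /rest/{database}?query={XQuery}. The sketch builds such a URL in Java for consistency with the other examples; the base URL (a local BaseX HTTP server on port 8080) and the database name are assumptions, and a node.js client would construct the same string.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Builds a GET URL for the BaseX REST API: {base}/rest/{database}?query={XQuery}.
// Assumes a database name that needs no escaping; base URL and name are placeholders.
public class RestUrl {
    static String queryUrl(String base, String database, String xquery) {
        return base + "/rest/" + database + "?query="
                + URLEncoder.encode(xquery, StandardCharsets.UTF_8); // escapes (, ), / and so on
    }

    public static void main(String[] args) {
        System.out.println(queryUrl("http://localhost:8080", "mirabel", "count(//topic)"));
    }
}
```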
I could also explore using other database solutions for some or all of what I want to do. For example, maybe it makes more sense to put my where-used table into a key-value store (even Solr could work for this pretty easily) or a SQL database and reserve BaseX for the XML-aware data processing needed to construct the table and for other XML- and text-aware queries. But that would still run into performance issues, given that I’m looking for 10 ms response times for lookups in the where-used table.
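Whatever store ends up holding it, a where-used table is essentially a map from a target to the documents that reference it. One option, assuming the table only changes when a new content set is loaded, is to build it once and serve lookups from memory; a hash lookup like the one below typically takes well under a millisecond. The class, the row layout, and the file names are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory where-used table: target -> documents that reference it.
// Built once per content set (for example from rows produced by an XQuery), then read-only.
public class WhereUsedIndex {
    private final Map<String, List<String>> usedBy;

    /** Each row is {referencingDocument, target}. */
    public WhereUsedIndex(List<String[]> rows) {
        Map<String, List<String>> building = new HashMap<>();
        for (String[] row : rows) {
            building.computeIfAbsent(row[1], k -> new ArrayList<>()).add(row[0]);
        }
        Map<String, List<String>> frozen = new HashMap<>();
        building.forEach((target, docs) -> frozen.put(target, List.copyOf(docs)));
        this.usedBy = Map.copyOf(frozen); // immutable, so concurrent readers need no locking
    }

    /** Average constant-time lookup; empty list if nothing references the target. */
    public List<String> whereUsed(String target) {
        return usedBy.getOrDefault(target, List.of());
    }

    public static void main(String[] args) {
        WhereUsedIndex index = new WhereUsedIndex(List.of(
                new String[] {"map-a.xml", "topic-1.xml"},
                new String[] {"map-b.xml", "topic-1.xml"}));
        System.out.println(index.whereUsed("topic-1.xml")); // prints [map-a.xml, map-b.xml]
    }
}
```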
Or maybe I just need to do more caching of query results where the results are stable for a given content set.
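For results that are stable for a given content set, a cache can key entries by both the query and a content-set version, so that nothing cached before a reload is served afterwards. In this sketch the version string is a hypothetical identifier (for example, bumped after each database swap) and runQuery stands in for whatever actually executes the query.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Result cache for queries whose answers are stable for a given content set.
// Keys combine a (hypothetical) content-set version with the query text.
public class QueryResultCache {
    private final Map<List<String>, String> cache = new ConcurrentHashMap<>();
    private volatile String contentVersion;

    public QueryResultCache(String initialVersion) {
        this.contentVersion = initialVersion;
    }

    /** runQuery stands in for the real query call; its results must be non-null. */
    public String get(String query, Function<String, String> runQuery) {
        List<String> key = List.of(contentVersion, query);
        String hit = cache.get(key);
        if (hit != null) {
            return hit;
        }
        // Run outside the map so a slow query never holds map locks; two concurrent
        // misses may both run the query, and the first stored result wins.
        String result = runQuery.apply(query);
        String previous = cache.putIfAbsent(key, result);
        return previous != null ? previous : result;
    }

    /** Call after new content is published. Entries that in-flight queries store under
     *  the old version are never returned for the new one. */
    public void publish(String newVersion) {
        contentVersion = newVersion;
        cache.clear();
    }
}
```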
I started this project without any particular plan and got a long way just building it as I went but now that I’m tasked with fixing a number of design and behavior issues with my initial approach, I need to make sure I really know what I’m doing and make the most appropriate implementation choices.
Thanks,
Eliot
From: Christian Grün <christian.gruen@gmail.com> Date: Thursday, December 12, 2024 at 5:11 AM To: Eliot Kimber <eliot.kimber@servicenow.com> Cc: basex-talk@mailman.uni-konstanz.de Subject: Re: [basex-talk] Deeper discussion of BaseX client/server and web app implementation?
Hi Eliot,
Free time is a rare resource nowadays; just some quick feedback:
I’ve done a read through of the current documentation at https://docs.basex.org/ and also reviewed what I could find online and such. In the documentation I find a number of references to the “client/server” architecture but I’m not finding any particularly deep discussion of it, either in the docs or by searching on i.e., “basex client server”.
The best entry point may be Getting Started → Database Server [1].
When I started my Mirabel project I understood that the way to get concurrency was to use multiple BaseX HTTP instances, which can make concurrent read requests on a single set of databases.
That’s dangerous (and has always been problematic). If you have concurrent operations, you should have one central HTTP instance. Otherwise, you might run into concurrency issues and locked databases, as multiple JVMs cannot share information with each other [2].
It may be difficult to give profound answers on the remaining questions in a few lines. Maybe others can share their experiences.
Best, Christian
[1] https://docs.basex.org/12/Getting_Started
[2] https://docs.basex.org/main/Startup#concurrent_operations
--
Tamara Marnell
Program Manager, Systems
Orbis Cascade Alliance (orbiscascade.org https://www.orbiscascade.org/)
Pronouns: she/her/hers
Hi Eliot,
A short reply, as so often: It is definitely an option to work with multiple BaseX instances and use one of them for delegating incoming requests. Other BaseX instances can, for example, be addressed with the Client Module [1]. If you have millions of requests per day, it may be advisable either to disable logging (if you have a proxy layer anyway) or to use the recently added log filter to reduce the number of entries [2].
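Delegation of the kind described earlier in the thread (sending each request to the least-loaded server) can be sketched as least-outstanding-requests selection. Using the count of in-flight requests as the load measure is an assumption; the backend names are hypothetical, and the actual forwarding, for example through the Client Module, is outside the sketch.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Least-outstanding-requests selection among delegate instances.
// Backend names are hypothetical; forwarding the request itself is not shown.
public class LeastLoadedRouter {
    private final Map<String, Integer> inFlight = new LinkedHashMap<>();

    public LeastLoadedRouter(String... backends) {
        if (backends.length == 0) {
            throw new IllegalArgumentException("at least one backend is required");
        }
        for (String b : backends) {
            inFlight.put(b, 0);
        }
    }

    /** Picks the backend with the fewest in-flight requests (the first one wins ties). */
    public synchronized String acquire() {
        String best = null;
        for (Map.Entry<String, Integer> e : inFlight.entrySet()) {
            if (best == null || e.getValue() < inFlight.get(best)) {
                best = e.getKey();
            }
        }
        inFlight.merge(best, 1, Integer::sum);
        return best;
    }

    /** Call when the delegated request finishes, ideally in a finally block. */
    public synchronized void release(String backend) {
        inFlight.merge(backend, -1, Integer::sum);
    }

    public static void main(String[] args) {
        LeastLoadedRouter router = new LeastLoadedRouter("query-1", "query-2");
        String target = router.acquire();
        try {
            System.out.println("delegating to " + target);
        } finally {
            router.release(target);
        }
    }
}
```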
Hope this helps, Christian
[1] https://docs.basex.org/main/Client_Functions [2] https://docs.basex.org/12/Options#logexclude
On Thu, Jan 23, 2025, 20:55, Eliot Kimber via BaseX-Talk <basex-talk@mailman.uni-konstanz.de> wrote:
I wanted to revisit this discussion in the new year.
In the context of a different internal initiative, I’ve been learning more about traditional website architecture and implementation (e.g., Apache httpd plus statically generated sites using node.js).
This has gotten me wondering whether the general architecture for a large-user-count, long-query-handling web site is to have one BaseX HTTP server to serve the web site and handle requests and a second server (basexserver) that does all the query work and accepts requests from the HTTP server.
That’s based on the assumption that a basexserver instance can handle a large number of concurrent requests, as it must handle its own internal threading etc. This also presumes that a single server instance (one JVM) can use multiple cores on a multi-core server (my production server is an 8-CPU server). A little reading on Jetty suggests that it should be able to handle the concurrent load I’m likely to have with no problem.
This would still require a mechanism to manage long-lived HTTP requests from the client, but there are various ways to handle that, including using web sockets to alert a client that a long request has completed, being more sophisticated with HTTP request details, and storing query results in a cache location from which they are served back to the client.
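One of the options mentioned, returning immediately and delivering the result later, can be sketched as a submit-then-poll job table: the client receives a job id at once and fetches the stored result when it is ready (a web socket notification could replace polling). The pool size, the class name, and the in-memory job map are assumptions; a real version would also expire old entries.

```java
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Supplier;

// Submit-then-poll handling of long-running queries (illustrative names).
public class QueryJobs {
    private final ExecutorService workers = Executors.newFixedThreadPool(4); // pool size is arbitrary
    private final Map<String, CompletableFuture<String>> jobs = new ConcurrentHashMap<>();

    /** Starts a long-running query and immediately returns a job id for the client. */
    public String submit(Supplier<String> query) {
        String id = UUID.randomUUID().toString();
        jobs.put(id, CompletableFuture.supplyAsync(query, workers));
        return id;
    }

    /** Result if the job has finished, null while it is still running. */
    public String poll(String id) {
        CompletableFuture<String> job = jobs.get(id);
        if (job == null) {
            throw new IllegalArgumentException("unknown job " + id);
        }
        return job.isDone() ? job.join() : null; // join() rethrows a failed query as CompletionException
    }

    /** Blocks until the job finishes; a push notification would instead attach thenAccept. */
    public String await(String id) {
        return jobs.get(id).join();
    }

    public void shutdown() {
        workers.shutdown();
    }

    public static void main(String[] args) {
        QueryJobs jobs = new QueryJobs();
        String id = jobs.submit(() -> "result of a long query");
        System.out.println(jobs.await(id));
        jobs.shutdown();
    }
}
```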