Hi Richard,

I don't quite get your concerns regarding tight coupling, and I don't quite see where you think this applies to BaseX. Of course (thanks to RESTXQ) you can provide a REST interface for your application and access the database directly, i.e. somehow mixing the application and database layers. But it is also perfectly possible to separate these two layers, e.g. by having two BaseX instances, one acting as middleware and implementing all your business logic and another acting as the database layer. And of course you could also use some other language for your middleware layer and use this language to talk to your favourite database (which is BaseX, I presume ;) BaseX provides many language bindings, so this should not be an issue.

In the end, the "problem" here is that XQuery, being a full-fledged functional language, is way more powerful than many other query languages. Think of all those databases which tackle very specific problems (like a key/value store) and are therefore far more restricted in their possibilities, but which will also be much faster than BaseX for many applications. This will always be a trade-off: if you restrict your use case and the possible operations, you do so to gain performance. This is all perfectly valid, and I think everyone has to decide for their specific use case which is the most suitable tool.

In theory, the same problem could also apply to traditional relational databases. For many use cases you might be able to implement all your business logic in SQL as well, using your fancy stored procedures. But few people do this, because it is cumbersome with SQL (and not everything is possible...) and it is also not the best architectural decision. So the problem I see here with XQuery/BaseX is that it is easier to mix the middleware and the database layer. But I would argue that it is indeed a feature, not a bug, that this is easy, because, well, in the end don't we all like easy stuff?

And I think this is separate from the enterprise features you mentioned, and yes, those would all be nice to have. But I don't think they have much to do with tight coupling. BaseX is missing them simply because no one has built them yet and because the task is really not that easy. Given that BaseX is run by a very small (but great!) team, this is simply a very big issue to tackle. It is also hard to compare BaseX to the big enterprise databases: I have seen teams working on DB2 whose only task was to put more stuff into the L2 and L3 caches to improve performance, and they had way more members than the BaseX core team. Given that BaseX itself is mostly implemented by Christian alone, I think he is doing an extraordinary job.

Cheers
Dirk

-----Original Message-----
From: Richard Stanley [mailto:richardlstanley@gmail.com]
Sent: Wednesday, 2 August 2017 04:19
To: Andreas Jung <lists@zopyx.com>
Cc: Kirsten, Dirk <Dirk.Kirsten@senacor.com>; BaseX <basex-talk@mailman.uni-konstanz.de>
Subject: Re: [basex-talk] State of replication and clustering

+1

There’s a need for enterprise features like horizontal scaling, replicas, sharding, etc. BaseX has the web app and the database tightly coupled. There is no separation between statefulness (database) and statelessness (application), so apps can’t be treated as ephemeral. This is unfortunate for high availability and leads to caution against enterprise deployments.

I’d love to see the separation of concerns. This is the case for Django, Flask, Node, Rails, Laravel, and just about any other modern web framework.

Best,
Richard

On Aug 1, 2017, at 14:04, Andreas Jung <lists@zopyx.com> wrote:

Hi Dirk

In our case we have about 1 GB of product catalog data for 30 languages spread across 30 XML files, so not much data.

One or more instances of a web service will perform only queries (reads only) on the data: standard XPath queries and a bunch of full-text queries (in particular queries related to "find as you type"). On my machine (8 cores, 32 GB RAM) I could reach up to 50 XPath queries per second. We have no numbers about the expected workload (new system, new application).
So we must be prepared to scale, and one single BaseX node might not be enough at some point. Currently I am thinking about bundling one BaseX instance with all the data plus one web service instance into one container. Every container would then be self-contained, and we should be able to scale up by starting as many containers as needed. Not the perfect solution, but one that should work smoothly. I also looked into exist-db and replication, but their replication mechanism has too many moving parts and scares me a bit. Do I have a free wish? A configuration-less replication mechanism (multi-master) as we have it in Elasticsearch... but only a dream :-)
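
The "many identical read-only containers" idea can be sketched as a small round-robin pool, here in Python. The class name `ReplicaPool` and the container URLs are hypothetical, and the sketch assumes that every container holds the full data set, so any of them can answer any read.

```python
from itertools import cycle
from threading import Lock


class ReplicaPool:
    """Hand out the base URLs of identical read-only replicas in round-robin order."""

    def __init__(self, base_urls):
        self._urls = list(base_urls)
        if not self._urls:
            raise ValueError("at least one replica URL is required")
        self._next = cycle(self._urls)
        self._lock = Lock()  # keeps the rotation consistent across threads

    def pick(self) -> str:
        """Return the URL of the replica that should serve the next read."""
        with self._lock:
            return next(self._next)
```

Scaling out then means starting another container and adding its URL to the pool. In practice a load balancer in front of the containers would usually take this role.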

Andreas


On 1 Aug 2017, at 18:51, Kirsten, Dirk wrote:

Hi Andreas,

I am not quite sure which presentation at XML Prague 2013 you are referring to, but I would guess it was mine, given that I was working on this topic at the time and I think I would remember someone else giving a talk about it...

Unfortunately, this was a research project (my master's thesis; it should be somewhere on basex.org, but it really is a thesis and hardly of any use if you want to "just use it"). It was never really continued after 2014 and was far, far from being able to go upstream. So I guess for now it is simply not there, and it is quite some project, so it would require serious effort.

However, if you only have to read, you might be able to partition your data in a way that is appropriate for your application and put the different parts on different servers/file systems. But this depends heavily on your use case. Also, it would be interesting to know where you think the limit will be that forces you to scale out for reads. Do you simply have so much data that you can't store it on one file system? Or do you have so many parallel users that you want to gain some performance?
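
As a rough illustration of such a partitioning, here is a Python sketch that assigns each partition key to one server with a stable hash. The function name `server_for`, the server names, and the choice of a language code as the key are assumptions made for the example only.

```python
import zlib


def server_for(key: str, servers: list[str]) -> str:
    """Map a partition key (e.g. a language code) to one server, deterministically."""
    if not servers:
        raise ValueError("at least one server is required")
    # crc32 is stable across processes and runs, unlike Python's built-in hash().
    index = zlib.crc32(key.encode("utf-8")) % len(servers)
    return servers[index]
```

The same key always lands on the same server as long as the server list does not change. Adding or removing a server moves many keys, so this simple scheme fits a data set that is partitioned once and then only read.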

Cheers
Dirk

Senacor Technologies Aktiengesellschaft - Registered office: Eschborn - District Court Frankfurt am Main - Reg. no.: HRB 105546
Executive Board: Matthias Tomann, Marcus Purzer - Chairman of the Supervisory Board: Daniel Grözinger


-----Original Message-----
From: basex-talk-bounces@mailman.uni-konstanz.de [mailto:basex-talk-bounces@mailman.uni-konstanz.de] On Behalf Of Andreas Jung
Sent: Tuesday, 1 August 2017 15:12
To: BaseX
Subject: [basex-talk] State of replication and clustering

Hi there,

What is the state of replication and clustering in BaseX?

I found an XML Prague 2013 presentation but almost no documentation on these topics on the website.

In our case we need to scale out horizontally with a growing number of reads (no writes involved).

Andreas