On Mon, Mar 14, 2011 at 4:14 PM, Charles F. Munat charles.munat@gmail.com wrote:
So BaseX is essentially useless as much more than a toy?
Hardly. I intend to use it as the search engine for a fairly large document database, where so far it performs superlatively. But it's also an essentially read-only scenario, other than periodic chunky updates or perhaps total rebuilds.
Do you have actual evidence to support this unreliability, or is it just a hunch? When does it fail, where, and why? Have you tested it with large data sets to see if element access is a problem? Can you provide numbers?
It is based on some issues that I had running some very large XQuery Updates against a database representing the equivalent of about 200,000 pages of text. So it was definitely a stress test, but it ended up corrupting the entire database and making it unusable. It is still not known what caused it, but that is enough to make me wary. That said, I have successfully used XQuery Update a number of times with BaseX without evident problems. But personally, that one failure was enough to make me avoid using it to, e.g., store customer data.
For the sake of kicking the arguments up a notch, let's assume that I've been working with databases since the late 1980s, that I'm what you call an "expert data modeler," that, in fact, my RDBMS of preference has been PostgreSQL for the better part of a decade, that I've used nested sets and adjacency lists and recursive queries and various other methods to shoehorn trees and graphs into tabular form, and that I have extensive familiarity with other databases, as well, including object, xml, key-value, and graph databases. We could even assume that I have a degree in Informatics and have worked with advanced mathematical and set concepts up through axiomatic set theory.
Ok, that's cool. If that's your self-description then you clearly get relational theory. No slight was intended. The fact remains that relational theory can model just about anything - with enough work. And of course there is usually more than one way to do something, some of them better than others.
With these assumptions, can anyone tell me what's wrong with BaseX -- specifically -- and why I might want to hold off on abandoning my already-built RDBMS-backed application in favor of one working strictly with XML?
Reliability and performance are the two top priorities in my view. I for one would love to see a stress test done with BaseX on a non-trivially sized database, with continuous XQuery Updates run for days at a time, simulating a real-world environment, and with the ability to check the final database state against a known accurate result. That would either add to confidence or help shake out bugs. Both outcomes are valuable. There is overwhelming evidence that the BaseX developers take the project very seriously, and I am not saying that I think my concerns will be permanent. I am just trying to convey my concerns to date based on my own experience.
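
A rough sketch of the kind of check described above, in Python. It does not talk to BaseX at all: an in-memory ElementTree document stands in for the database under test, and a plain dict serves as the "known accurate result". The function name, the `doc`/`id` layout, and the operation mix are all made up for illustration. In a real test each operation would presumably be sent to the server as an XQuery Update statement, and the run would last far longer.

```python
import random
import xml.etree.ElementTree as ET


def run_update_stress(num_ops, seed=0):
    """Apply random updates to a store and to a reference model in lockstep.

    Returns (actual, expected) so the caller can compare the final states.
    """
    rng = random.Random(seed)       # fixed seed makes a failing run repeatable
    root = ET.Element("db")         # stand-in for the database under test
    expected = {}                   # reference model: doc id -> text

    for _ in range(num_ops):
        key = str(rng.randrange(100))
        op = rng.choice(["insert", "replace", "delete"])
        node = root.find(f"./doc[@id='{key}']")

        if op == "insert" and node is None:
            value = f"v{rng.randrange(10**6)}"
            ET.SubElement(root, "doc", id=key).text = value
            expected[key] = value
        elif op == "replace" and node is not None:
            value = f"v{rng.randrange(10**6)}"
            node.text = value
            expected[key] = value
        elif op == "delete" and node is not None:
            root.remove(node)
            del expected[key]
        # Any other combination is a no-op for both store and model.

    # Read the final state back out of the store for comparison.
    actual = {d.get("id"): d.text for d in root.findall("doc")}
    return actual, expected


if __name__ == "__main__":
    actual, expected = run_update_stress(10_000, seed=42)
    print("states match:", actual == expected)
```

The point of the fixed seed is that a mismatch can be replayed with the same operation sequence, which is exactly what was missing when the corruption mentioned earlier could not be traced to a cause.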