Hi Luke,
It would be nice to have some sort of pre-commit hook for validating modifications to the database so that we are not restricted to only allowing modifications through XQuery. It looks as though this is the point of https://github.com/BaseXdb/basex/issues/1082, but it looks as though that is on hold, after some significant discussion.
True, a pre-commit hook would be a good fit for applications that use the standard APIs of BaseX. I thought about mentioning this GitHub issue; even better that you’ve found it yourself. The discussion has stalled a little, primarily because we have too many other things on our agenda. And I think we’d need to narrow the scope: too many ideas were brought up there that cannot all be reconciled.
Presumably I could achieve schema validation by having the entire data set inside one document, but that would lose the benefits of collections, and having the data arranged similar to a file system, so ... I was hoping that I could define a Schematron rule something like this (untested, because I'm struggling to get Schematron working at the moment - content is not allowed in prolog):
The standard Schematron implementation that can be integrated as a module is not part of BaseX itself; that’s why it cannot work directly on top of our database storage. Instead, single documents need to be exported to a main-memory representation and sent to this validation library, which has its own XPath engine. I think there is no database-driven implementation of Schematron available out there, but I may be wrong?
The same applies to XML Schema: The implementations we support operate on main-memory document instances. In order to change this, we would probably need to write our own implementation of XML Schema.
Thus, our experience is that calls to XML Schema and Schematron are too slow if we need to check and process millions of nodes (which might be what you eventually need if your data has been stored in Oracle before). This is why we use our own framework for all time-critical operations, such as integrity checks that need to be applied on-the-fly. In practice, this works pretty well: In one project I’m currently working on, around 8 million entries are stored, with thousands of daily updates and numerous consistency checks in the background. It’s all done in XQuery.
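To give a rough idea of what such a check can look like, here is a minimal, untested sketch in plain XQuery. It is not taken from our framework: the database name 'entries', the element names entry and ref, and the attributes id and target are all made up for illustration. It reports duplicate IDs and references that point to no existing entry, and it runs directly on the database nodes, with no export to main memory.

```xquery
(: Hypothetical database name; replace with your own. :)
declare variable $db as xs:string := 'entries';

(: Returns one <violation/> element per integrity problem found.
   Assumed (hypothetical) layout: <entry id="..."><ref target="..."/></entry> :)
declare function local:check($db as xs:string) as element(violation)* {
  let $entries := collection($db)//entry
  let $ids := $entries/@id
  return (
    (: every id must be unique across the whole collection :)
    for $id in $ids
    group by $value := string($id)
    where count($id) > 1
    return <violation type="duplicate-id" id="{ $value }"/>,
    (: every reference must point to an existing entry :)
    for $ref in $entries/ref/@target
    where not($ids = $ref)
    return <violation type="dangling-ref" target="{ $ref }"/>
  )
};

let $violations := local:check($db)
return
  if (empty($violations)) then 'ok'
  else error(xs:QName('local:integrity'), 'Integrity check failed', $violations)
```

A query like this could be run after each batch of updates or as a background job; whether it is fast enough will depend on your data and on which index structures are available.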
Hope that helps (at least when it comes to understanding the status quo), Christian