Best practice approach to data security between orgs while maintaining ability to combine data from multiple orgs

Charles F. Munat

13 Mar 2011

5:15 p.m.

Hello,

I'm building a Web-based application that tracks students in various schools. Much of the data is tree-structured, and, after numerous attempts, I've given up on using an RDBMS. It's a hammer and I need a wrench.

The schema for each school's data will be the same. Sitting above the schools is the "project" for which the application is being built.

The project needs to be able to query across multiple or all schools in order to run various analyses. So I want all the data in a single database, if possible.

But each school, I think, should be its own "segment" of the database. That way, when a project manager logs in, he or she can look across all the segments, but when a school coordinator logs in, he or she accesses only the data for that school. I hope this makes sense.

I am not really very familiar with how the BaseX DB is organized. I don't want individual XML files (do I?). I just want a database filled with data. So how do "collections" work? How would I segment the database so that schools are separate while still providing easy cross-school access for project personnel?

I could just do something like this:

<project>
  <project-data/>
  <school id="1"/>
  <school id="2"/>
</project>

Where each school is simply an element under the root "project" element (with project data stashed away in a "project-data" element, or equivalent). But is there a better way?

What is the best practice here? I am running the DB unembedded.

Thanks!

Chas. Munat
Somewhere in South America

Andreas Weiler

14 Mar

3:33 a.m.

New subject: Best practice approach to data security between orgs while maintaining ability to combine data from multiple orgs

Hi Chas,

you can build a database for each school. XQuery has the ability to query over all these databases.

Another approach would be to create a collection and add one document per school, like:

create db data
add as schoolname <school name="school1"/>

Your mentioned approach would work, too.

I guess it's a matter of how big the single documents (data) for each school would be.
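The access pattern behind these layouts (one document per school, with project staff querying across all of them) can be sketched outside BaseX. The following is a minimal Python sketch using only the standard library. It does not use any BaseX API, and the names `collection`, `documents_for`, the role strings, and the `student` elements are assumptions made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for a collection: one XML document per school,
# keyed by document name. In BaseX these would be documents in a database.
collection = {
    "school1": ET.fromstring(
        '<school id="1"><student name="Ana"/><student name="Luis"/></school>'
    ),
    "school2": ET.fromstring('<school id="2"><student name="Marta"/></school>'),
}


def documents_for(role, school=None):
    """Return the documents a user may see.

    A project manager sees every school; a school coordinator sees only
    the document of his or her own school; anyone else sees nothing.
    """
    if role == "project-manager":
        return list(collection.values())
    if role == "coordinator" and school in collection:
        return [collection[school]]
    return []


def student_names(role, school=None):
    """Run the same query over whatever documents the role may access."""
    return [
        student.get("name")
        for doc in documents_for(role, school)
        for student in doc.iter("student")
    ]


print(student_names("project-manager"))         # all schools
print(student_names("coordinator", "school2"))  # one school only
```

The point of the sketch is that the query itself stays the same; only the set of documents it runs over changes with the user's role.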

-- Andreas

On 13.03.2011 at 22:15, Charles F. Munat wrote:

...

I could just do something like this:

<project> <project-data/> <school id="1"/> <school id="2"/> </project>

Jan Vlčinský (CAD)

5:45 a.m.

Hi Chas, I just want to say that in my solution I have data in several databases, and I happily query across all of them.

Jan

-- *Ing. Jan Vlčinský* CAD programy Slunečnicová 338/3, 734 01 Karviná Ráj, Czech Republic tel: +420-597 602 024; mob: +420-608 979 040 skype: janvlcinsky; GoogleTalk: jan.vlcinsky@gmail.com http://cz.linkedin.com/in/vlcinsky

NewIntellectual

11:35 a.m.

On Sun, Mar 13, 2011 at 5:15 PM, Charles F. Munat charles.munat@gmail.com wrote:

...

I'm building a Web-based application that tracks students in various schools. Much of the data is tree-structured, and, after numerous attempts, I've given up on using an RDBMS. It's a hammer and I need a wrench.

While I'm a happy BaseX user and huge fan, I suggest not relying on the current state of the system to act as a dynamic database. Practically speaking that has to be accomplished with XQuery Update and based on testing so far I do not have the sense that it is currently reliable enough. Your database system has to Just Work, and you want it to be rock solid and completely stable (and the worst thing in that context is a bug which ends up corrupting the database, perhaps silently, not simply crashing.) I trust BaseX for data retrieval, but I need to see it mature to trust it for continuous updates, both for stability and speed (the way the system currently handles element access by indices makes me wary of performance with very large databases, which would not pose a problem for an RDBMS.)

The relational model is completely generalized and mature, so it is unlikely that it cannot handle your application. It's a matter of correct table design, with appropriate methods to read/write your data to various joined tables. I suggest getting assistance from an expert data modeler and looking into PostgreSQL or the pure java H2 system.

Also, note that there is no concept of transactions with XQuery Update, unless I am missing something. There is no analog to commit/rollback, nor any ability to enforce database consistency rules (e.g. foreign key rules, which in the XML world roughly map onto schema rules, and BaseX does not check for schema validity).
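To make the commit/rollback idea concrete, here is a minimal Python sketch of an all-or-nothing update on an XML tree: changes are applied to a private copy, which replaces the original only if every update succeeds and a consistency rule holds. This is an illustration under stated assumptions, not a description of how BaseX or XQuery Update behaves. The helper names (`apply_atomically`, `referenced_schools_exist`) and the `school` attribute on `student` are hypothetical.

```python
import copy
import xml.etree.ElementTree as ET


def referenced_schools_exist(root):
    """Consistency rule, loosely analogous to a foreign key: every
    <student school="..."> must point at an existing <school id="...">."""
    school_ids = {s.get("id") for s in root.iter("school")}
    return all(st.get("school") in school_ids for st in root.iter("student"))


def apply_atomically(root, updates, is_consistent):
    """Apply all updates to a deep copy of the tree.

    Returns (new_tree, True) if every update ran and the rule holds
    ("commit"), otherwise (original_tree, False) ("rollback").
    """
    working = copy.deepcopy(root)
    try:
        for update in updates:
            update(working)
    except Exception:
        return root, False
    if not is_consistent(working):
        return root, False
    return working, True


def add_valid(root):
    # References school "1", which exists.
    ET.SubElement(root, "student", {"name": "Luis", "school": "1"})


def add_dangling(root):
    # References school "9", which does not exist.
    ET.SubElement(root, "student", {"name": "Marta", "school": "9"})


project = ET.fromstring(
    '<project><school id="1"/><student name="Ana" school="1"/></project>'
)

project, ok_first = apply_atomically(project, [add_valid], referenced_schools_exist)
project, ok_second = apply_atomically(
    project, [add_valid, add_dangling], referenced_schools_exist
)
print(ok_first, ok_second, len(project.findall("student")))
```

The second batch is rejected as a whole, so the valid insert in that batch is discarded together with the dangling one.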

A hybrid system that partitions "database record" stuff into RDBMS tables and "structured text" stuff using BaseX is certainly possible, with entire XML documents being periodically added. It's still an issue of choosing the right tool for the right domain.

Charles F. Munat

4:14 p.m.

So BaseX is essentially useless as much more than a toy? Do you have actual evidence to support this unreliability, or is it just a hunch? When does it fail, where, and why? Have you tested it with large data sets to see if element access is a problem? Can you provide numbers?

For the sake of kicking the arguments up a notch, let's assume that I've been working with databases since the late 1980s, that I'm what you call an "expert data modeler," that, in fact, my RDBMS of preference has been PostgreSQL for the better part of a decade, that I've used nested sets and adjacency lists and recursive queries and various other methods to shoehorn trees and graphs into tabular form, and that I have extensive familiarity with other databases, as well, including object, xml, key-value, and graph databases. We could even assume that I have a degree in Informatics and have worked with advanced mathematical and set concepts up through axiomatic set theory.

With these assumptions, can anyone tell me what's wrong with BaseX -- specifically -- and why I might want to hold off on abandoning my already-built RDBMS-backed application in favor of one working strictly with XML?

I am not saying that your concerns are unfounded, NewIntellectual, but only that you have not provided any real support for them. I can go either way... does anyone have strong evidence for one side or the other?

All responses gratefully welcome.

Chas.

NewIntellectual

4:29 p.m.

On Mon, Mar 14, 2011 at 4:14 PM, Charles F. Munat charles.munat@gmail.com wrote:

...

So BaseX is essentially useless as much more than a toy?

Hardly. I intend to use it as the search engine for a fairly large document database, where so far it performs superlatively. But it's also an essentially read-only scenario, other than periodic chunky updates or perhaps total rebuilds.

...

Do you have actual evidence to support this unreliability, or is it just a hunch? When does it fail, where, and why? Have you tested it with large data sets to see if element access is a problem? Can you provide numbers?

My concern is based on issues I had with some very large XQuery Updates against a database representing about 200,000 pages of text equivalent. It was definitely a stress test, but it ended up corrupting the entire database and making it unusable. The cause is still not known, but that is enough to make me wary. That said, I have successfully used XQuery Update a number of times with BaseX without evident problems. But personally that was enough to make me avoid using it to e.g. store customer data.

...

For the sake of kicking the arguments up a notch, let's assume that I've been working with databases since the late 1980s, that I'm what you call an "expert data modeler," that, in fact, my RDBMS of preference has been PostgreSQL for the better part of a decade, that I've used nested sets and adjacency lists and recursive queries and various other methods to shoehorn trees and graphs into tabular form, and that I have extensive familiarity with other databases, as well, including object, xml, key-value, and graph databases. We could even assume that I have a degree in Informatics and have worked with advanced mathematical and set concepts up through axiomatic set theory.

Ok, that's cool. If that's your self-description then you clearly get relational theory. No slight was intended. The fact remains that relational theory can model just about anything - with enough work. And of course there is usually more than one way to do something, some of them better than others.

...

With these assumptions, can anyone tell me what's wrong with BaseX -- specifically -- and why I might want to hold off on abandoning my already-built RDBMS-backed application in favor of one working strictly with XML?

Reliability and performance are the two top priorities in my view. I for one would love to see a stress test done with BaseX with a non-trivially sized database with continuous XQuery Updates done for days at a time, simulating a real world environment, with the ability to check the final database state against a known accurate result. That would add to confidence - or help shake out bugs. Both are values. There is overwhelming evidence that the BaseX developers take the project very seriously and I am not saying that I think my concerns will be permanent - just trying to convey my concerns to date based on my own experience.
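The kind of test described here (many updates, then a check of the final state against a known accurate result) can be sketched as follows. This is a minimal Python sketch under stated assumptions: an in-memory `xml.etree.ElementTree` tree stands in for the database under test, a plain dict serves as the reference model, and the function name `run_stress_test`, the `record` elements, and the 60/40 insert/delete mix are invented for illustration. A real test would send the same operations to BaseX instead.

```python
import random
import xml.etree.ElementTree as ET


def run_stress_test(operations, seed=0):
    """Apply the same random inserts/deletes to an XML tree (stand-in for
    the store under test) and to a dict (the reference model), then report
    whether the final states agree and how many records remain."""
    rng = random.Random(seed)
    store = ET.Element("records")  # stand-in for the database under test
    expected = {}                  # known-accurate reference state

    for step in range(operations):
        key = str(rng.randrange(50))
        node = store.find(f"record[@id='{key}']")
        if rng.random() < 0.6:
            # Insert a new record or overwrite an existing one.
            if node is None:
                node = ET.SubElement(store, "record", {"id": key})
            node.text = str(step)
            expected[key] = str(step)
        else:
            # Delete the record if it exists.
            if node is not None:
                store.remove(node)
            expected.pop(key, None)

    actual = {r.get("id"): r.text for r in store.iter("record")}
    return actual == expected, len(expected)


consistent, record_count = run_stress_test(operations=5000, seed=42)
print(consistent, record_count)
```

Fixing the seed makes a failing run reproducible, which matters for the "ideally reproducible" bug reports that help developers most.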

NewIntellectual

6:04 p.m.

In fairness it is worth noting that XQuery Update Facility 1.0 is extremely new. See e.g. http://www.w3.org/TR/xquery-update-10/, dated 25 January 2011. I don't know of any other open source system which even attempts to completely support as many XQuery standards, such as XQuery Full Text and the update facility.

Christian Grün

15 Mar

8:43 a.m.

Hi Charles, thanks Phil,

two years ago, we were still warning everyone to *avoid* BaseX for critical write operations. Since (due to our history) most of our users came across BaseX while searching for an efficient read-only solution, we spent most of our time optimizing read-only use cases.

Since that time, we have put quite some effort into making update operations more robust, fixing remaining low-level bugs, writing stress tests, etc. We now use BaseX and XQuery Update successfully for our own write scenarios without any further complications, and we are convinced that BaseX is now at least as safe as other native XML database solutions.

While we cannot promise the same stability as is offered by 10- or 20-year-old RDBMS solutions, we will continue to do our best to debug (ideally reproducible) cases in which write problems persist.

Comments are welcome,
Christian
