BaseX XML interface

NewIntellectual

23 Jul 2010

8:31 p.m.

Godmar Back asks: I'm still curious what XML interface I'm missing.

While I agree that published interfaces should work properly, I think a simple point is being completely missed. Both XML and XQuery are, by their nature, human readable text, regardless of their complexity. That means that queries can be sent simply as text strings, and results of any size can also be returned in that way and then parsed as XML fragments in any number of myriad ways, including parsing to an in-memory DOM. I successfully use this approach to query an XML database representing close to 200,000 pages of text and to retrieve pretty large XML fragments (actually documents with a single root element) quickly. This may be a simple approach but there is no loss of generality by using it - in other words, using a more complex interface will not let you do some XQuery processing that cannot be done with this approach.

You can use the following code to access BaseX for example (in this case to a local server using default permissions). I am prefixing text to make the result string an XML document but your code may not need it:

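    package [...];

    import java.io.*;
    import java.util.*;
    import org.basex.server.ClientSession;

    public class QueryText {

      public String doQuery(String xquery) {
        ClientSession session = null;
        String result = "NO RESULT";
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try {
          // Connect to a local server with the default credentials.
          session = new ClientSession("localhost", 1984, "admin", "admin");
          // The serialized query result is written into the buffer.
          session.execute(xquery, buffer);
          // Prefix an XML declaration so that the result is an XML document.
          result = "<?xml version='1.0' encoding='UTF-8'?>" + buffer.toString("UTF-8");
        } catch(Exception e) {
          ...
        } finally {
          try {
            // The session is still null if the connection attempt failed.
            if (session != null) session.close();
          } catch(Exception e) { ... }
        }
        return result;
      }
    }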
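The string returned by doQuery above can then be parsed into an in-memory DOM, as the post describes. The following is a minimal sketch using only the JDK; the class name ResultParser and the sample result string are hypothetical stand-ins, and no database is contacted.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of BaseX.
final class ResultParser {

    // Parses XML text (for example, the value returned by doQuery)
    // into an in-memory DOM document.
    static Document parse(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // keep namespace information
            return factory.newDocumentBuilder()
                          .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Result is not well-formed XML", e);
        }
    }

    public static void main(String[] args) {
        // Stand-in for a query result with a single root element.
        String result = "<?xml version='1.0' encoding='UTF-8'?>"
                      + "<pages><page n='1'>first</page><page n='2'>second</page></pages>";
        Document doc = parse(result);
        System.out.println(doc.getDocumentElement().getNodeName());
        System.out.println(doc.getElementsByTagName("page").getLength());
    }
}
```

This only works when the result has a single root element; a query that returns a sequence of several top-level items would have to be wrapped in an element first.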

Godmar Back

23 Jul

9:38 p.m.

On Fri, Jul 23, 2010 at 8:31 PM, NewIntellectual newintellectual@gmail.com wrote:

...

> That means that queries can be sent simply as text strings, and results of any size can also be returned in that way and then parsed as XML fragments in any number of myriad ways, including parsing to an in-memory DOM.

I understand that I can send XML in textual representation to the database and retrieve XML text. This will require constant serialization and deserialization of my XML representation, however, a cost I had been hoping to avoid. (*) In addition, it will be impossible to reuse xpath queries and update expressions.

Could the BaseX developers confirm that this is in fact the only robust way to interact with the database? I'm very surprised by this; to draw a comparison to relational databases, it's as if the only way to interact with the database is through its command line shell, and as if language bindings for SQL had never been invented. Is this, in fact, the state of the art in XML databases?

Suppose I believe that and change my application to interact with the database in purely textual form; how confident can I be that BaseX will then be able to perform operations such as inserting/removing nodes with multiple namespaces? At this point, I'm seriously considering the alternative of dropping the use of a database altogether and simply storing my XML documents in a suitably designed file structure. (Perhaps combined with an external indexer to speed up full-text searches.)

- Godmar

(*) as pointed out earlier, for retrievals, I already have no choice but to serialize BaseX's DOM representation to text and then parse it into an EventTarget-capable representation, since BaseX's DOM implementation doesn't implement EventTarget. What you are saying, however, is that even parameters passed to xqueries/updates need to be serialized.
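The round trip described in the footnote (serialize a DOM node to text, then re-parse it with a different DOM implementation) can be sketched with the JDK alone. The class DomRoundTrip and its method names are hypothetical; the source node here is produced by the JDK parser as a stand-in for a node obtained from the database.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper illustrating the serialize-then-parse round trip.
final class DomRoundTrip {

    // Serializes any DOM node to XML text with an identity transform.
    static String serialize(Node node) {
        try {
            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            t.transform(new DOMSource(node), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("Serialization failed", e);
        }
    }

    // Parses XML text with the JDK's default, namespace-aware DOM parser.
    static Document parse(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            return factory.newDocumentBuilder()
                          .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Not well-formed XML", e);
        }
    }

    // Copies a node from any DOM implementation into the JDK's by going through text.
    static Document reparse(Node foreign) {
        return parse(serialize(foreign));
    }

    public static void main(String[] args) {
        Document source = parse(
            "<a:doc xmlns:a='urn:a' xmlns:b='urn:b'><b:item id='1'>x</b:item></a:doc>");
        Document copy = reparse(source.getDocumentElement());
        System.out.println(copy.getDocumentElement().getNamespaceURI());
    }
}
```

Each call to reparse walks the tree twice, once to write the text and once to parse it, which is the cost the footnote refers to. Whether that cost matters in practice depends on the size of the results.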

Christian Grün

24 Jul

8:55 a.m.

...

> Could the BaseX developers confirm that this is in fact the only robust way to interact with the database? I'm very surprised by this; to draw a comparison to relational databases, it's as if the only way to interact with the database is through its command line shell, and as if language bindings for SQL had never been invented. Is this, in fact, the state of the art in XML databases?

It's too simple, and imprudent, to equate XML with the relational world. In fact, your use case isn't as obvious to everyone as you might believe. Many users have quite a different approach to XML data, which you'll notice if you take the time to browse our archive and the mailing lists of other XML databases and processors. Just as an example, many users don't require variable bindings, as they code everything in XQuery (I've come across an XQuery expression used in a production environment which was 80 KB in size) or dynamically compose the query to be evaluated. In short, there is in fact no single "state of the art" in XML that will work for everyone, and there will never be just one.

Our objective is to try to cover as many requirements as possible, and, again, XQJ is just one of them, and one that isn't requested as frequently as other features. A short while ago, we asked on the mailing list which missing or incomplete BaseX features are requested most, and I'll soon publish a summary of the results back to the list. To be honest, no one asked for better interfaces, which indicated to us that the existing framework seems to cover most requirements, at least for people following our mailing list. And this framework includes not just the command line shell, which you chose to mention, and which is just a frontend user interface, but:

– our language bindings for different programming languages (basex.org/api)
– our REST interface, based on JAX-RX
– our own API (either embedded or client/server-based)
– XML:DB and XQJ

Next, it is wrong to regard our XQJ implementation as unstable. In fact, some effort has been put into this API, and it passes 100% of the available test suite (it's included in the JSR 225 distribution and in our API package). Feedback like yours might help to make it more robust, though. Adding support for the EventTarget interface might be an interesting extension as well, but it's not a mandatory part of the standard.

After all, what you see and get here is Open Source, and, as you surely know yourself, successful open source projects benefit from, and are dependent on, external contributions. We'll soon offer commercial support, which might give you the chance to speed up the development of features you cannot afford to program on your own. As an alternative, which might be more reliable for the moment but way more expensive as well, you still have the choice to go for one of the commercial alternatives, such as Mark Logic, Data Direct, xDB, etc.

...

> (*) as pointed out earlier, for retrievals, I already have no choice [...]

I tried to point out in this mail that you have too many rather than too few choices; unfortunately, none of them is free.

Have fun, Christian

Godmar Back

11:31 a.m.

On Sat, Jul 24, 2010 at 8:55 AM, Christian Grün christian.gruen@gmail.com wrote:

...

> Our objective is to try to cover as many requirements as possible, and, again, XQJ is just one of them, and one that isn't requested as frequently as other features. A short while ago, we asked on the mailing list which missing or incomplete BaseX features are requested most, and I'll soon publish a summary of the results back to the list. To be honest, no one asked for better interfaces, which indicated to us that the existing framework seems to cover most requirements, at least for people following our mailing list.

I don't care about XQJ as such. 6 months ago I didn't even know it exists. I care about the ability to easily and robustly interact with the database. I need to be able to write maintainable queries that contain placeholders ('variables') where application objects are inserted.

I claim that having this ability is a frequent requirement; or conversely, that applications that use an approach where large XQueries are cobbled together textually are very likely not robust. Take a look at the numerous SQL injection vulnerabilities that resulted from people cobbling together SQL code, as just one example of why this approach is fragile.
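To make the fragility concrete, here is a small sketch of what goes wrong when a value is pasted into an XQuery string, and of the escaping that a text-only approach needs. XQueryLiteral and quote are hypothetical names; the escaping follows the XQuery rules for string literals (a double quote is doubled, an ampersand is written as an entity reference). It covers string values only, not nodes.

```java
// Hypothetical helper; not part of BaseX or XQJ.
final class XQueryLiteral {

    // Renders a Java string as a double-quoted XQuery string literal.
    // Inside such a literal, '"' is escaped by doubling it and '&' must
    // be written as '&amp;'.
    static String quote(String value) {
        return '"' + value.replace("&", "&amp;").replace("\"", "\"\"") + '"';
    }

    public static void main(String[] args) {
        String title = "x\" or \"1\"=\"1"; // hostile input

        // Naive concatenation: the input closes the literal early and
        // turns the predicate into one that is always true.
        String naive = "//book[title = \"" + title + "\"]";

        // Escaped: the whole input stays inside a single string literal.
        String safe = "//book[title = " + quote(title) + "]";

        System.out.println(naive);
        System.out.println(safe);
    }
}
```

A binding facility with real placeholders avoids this class of error altogether, because the value never passes through the query parser as text.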

...

> And this framework includes not just the command line shell, which you chose to mention, and which is just a frontend user interface, but:
>
> – our language bindings for different programming languages (basex.org/api)
> – our REST interface, based on JAX-RX
> – our own API (either embedded or client/server-based)
> – XML:DB and XQJ

On basex.org/api there are links to the language bindings, which also seem entirely text-based. (For instance, https://svn.uni-konstanz.de/dbis/basex/trunk/api/etc/java/QueryExample.java )

Do you have documentation for your embedded or client-server based API?

Does it support the ability to pass DOM (or other Java) objects to to-be-executed XQuerys and to obtain results as Java objects?

...

> Next, it is wrong to regard our XQJ implementation as unstable. In fact, some effort has been put into this API, and it passes 100% of the available test suite (it's included in the JSR 225 distribution and in our API package). Feedback like yours might help to make it more robust, though.

When doing Open Source, you'll learn that people base their judgment on what they need and try.

I can't really make a judgment about the stability of BaseX's XQJ implementation because I haven't gotten it working yet. For me, it failed at the first operation I tried. In my opinion, I wasn't stressing it with test cases such as the one I sent you, either. This means the existing test suite is likely highly deficient (which in turn could be explained by the skepticism and abandonment XQJ appears to be facing, judging from your earlier pointers to discussions in the community).

BTW, we already have some investment in BaseX. Our student spent 6 months getting a raw prototype of our application working, along with ca. 15 queries written using XQJ. Unfortunately, he was using the wrong namespaces and so didn't encounter these problems. So, my efforts here are to see if we can salvage our work or which alternative to choose.

It would probably take me a day to rewrite our XQJ code to use something else, but I first want to make sure I understand what the options are.

...

> Adding support for the EventTarget interface might be an interesting extension as well, but it's not a mandatory part of the standard.

...
>> (*) as pointed out earlier, for retrievals, I already have no choice [...]
>
> I tried to point out in this mail that you have too many rather than too few choices; unfortunately, none of them is free.

Since I need support for EventTarget, I have no choice but to serialize the DOM object returned by BaseX's XQJ implementation and parse the resulting XML into a DOM object using the JDK's DOM implementation which supports the now 10-year-old DOM Level 2 Event Standard: http://www.w3.org/TR/2000/REC-DOM-Level-2-Events-20001113/

Is my understanding correct?

- Godmar
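The EventTarget requirement above can be checked against a given DOM implementation with a few lines of code. The sketch below assumes, as the message states, that the JDK's bundled DOM implementation supports DOM Level 2 Events; EventTargetCheck and its method are hypothetical names, and the cast to EventTarget would fail with a ClassCastException on an implementation that lacks the interface.

```java
import java.io.StringReader;
import java.util.concurrent.atomic.AtomicInteger;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.events.DocumentEvent;
import org.w3c.dom.events.Event;
import org.w3c.dom.events.EventTarget;
import org.xml.sax.InputSource;

// Hypothetical check; assumes the root element has an element as first child.
final class EventTargetCheck {

    // Parses XML text with the JDK's default DOM parser.
    static Document parse(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            return factory.newDocumentBuilder()
                          .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Not well-formed XML", e);
        }
    }

    // Registers a listener on the root element, fires a bubbling custom
    // event on the root's first child, and returns how often the listener ran.
    static int countBubbledEvents(String xml, String eventType) {
        Document doc = parse(xml);
        AtomicInteger calls = new AtomicInteger();

        EventTarget root = (EventTarget) doc.getDocumentElement();
        root.addEventListener(eventType, evt -> calls.incrementAndGet(), false);

        Event event = ((DocumentEvent) doc).createEvent("Events");
        event.initEvent(eventType, true, false); // bubbles, not cancelable
        ((EventTarget) doc.getDocumentElement().getFirstChild()).dispatchEvent(event);

        return calls.get();
    }

    public static void main(String[] args) {
        System.out.println(countBubbledEvents("<root><child/></root>", "ping"));
    }
}
```

The same check can be run on a node obtained from another implementation by replacing the call to parse, which shows directly whether the serialize-and-reparse step is needed.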

NewIntellectual

23 Jul

10:03 p.m.

I want to emphasize that I am in no way speaking for the BaseX development team - I'm simply making remarks as a satisfied BaseX user. I would certainly like to see a potentially more efficient interface working, but I think that merely assuming the text-in/text-out approach is slow is unwarranted - I suggest doing actual experimentation with your use cases and judging based on the results.

I do think it's throwing out the baby with the bathwater to want to "go with your own filesystem perhaps with an external text indexer." BaseX shines at optimizing queries using attribute and standards-defined XQuery Fulltext operators. You need only look at the nightmare of configuring and using eXist, which uses Lucene as an external text indexer, to compare how good BaseX is in that regard. Also, I have used the XQuery Update facility - also standards-defined - in BaseX to good effect on some serious projects that would have taken much longer to do with conventional Java programming.

Note that XQJ (JSR 225) itself is constantly spoken of as "currently under development." That said, Dr. Grün and others on his development team are clearly interested in fixing problems, and do so much more quickly than most open source projects, in my observation. But if one wanted to assist in the development of a part of the system that one personally needs, I suggest that helping them shore up the XQJ interface would yield far larger gains than trying to hack up your own XML database system. It is, after all, an open source project.

Phil Oliver

Godmar Back

10:45 p.m.

On Fri, Jul 23, 2010 at 10:03 PM, NewIntellectual newintellectual@gmail.com wrote:

...

> I want to emphasize that I am in no way speaking for the BaseX development team - I'm simply making remarks as a satisfied BaseX user. I would certainly like to see a potentially more efficient interface working, but I think that merely assuming the text-in/text-out approach is slow is unwarranted - I suggest doing actual experimentation with your use cases and judging based on the results.

Sure, intuition can be misleading. If the database is large, queries and results small, the time probably won't matter since it's dominated by the operations on the database itself. My objection is only partly based on expected performance, I'm also concerned about the awkwardness and (relatively) large potential for errors when cobbling together textual XML (which we'll at least partly have to do if we won't be able to use XQJ's bindNode facility.)

...

> I do think it's throwing out the baby with the bathwater to want to "go with your own filesystem perhaps with an external text indexer." BaseX shines at optimizing queries using attribute and standards-defined XQuery Fulltext operators. You need only look at the nightmare of configuring and using eXist, which uses Lucene as an external text indexer, to compare how good BaseX is in that regard. Also, I have used the XQuery Update facility - also standards-defined - in BaseX to good effect on some serious projects that would have taken much longer to do with conventional Java programming.
>
> Note that XQJ (JSR 225) itself is constantly spoken of as "currently under development." That said, Dr. Grün and others on his development team are clearly interested in fixing problems, and do so much more quickly than most open source projects, in my observation. But if one wanted to assist in the development of a part of the system that one personally needs, I suggest that helping them shore up the XQJ interface would yield far larger gains than trying to hack up your own XML database system. It is, after all, an open source project.

I'm still trying to find out where the mature and where the "under development" parts are.

Is it correct to assume that the XQuery and XQuery Update facilities (both?) are mature, but the XQJ binding is not; and that, moreover, there is doubt in the community that XQJ will ever have wide adoption and stable implementations (if I interpret Christian's earlier pointers, including to an email by Per Bothner, correctly)?

- Godmar

Godmar Back

24 Jul

12:21 a.m.

On Fri, Jul 23, 2010 at 8:31 PM, NewIntellectual newintellectual@gmail.com wrote:

...

> session = new ClientSession("localhost", 1984, "admin", "admin");
> session.execute(xquery, buffer);
> result="<?xml version='1.0' encoding='UTF-8'?>"+buffer.toString("UTF-8");

ps: about your suggestion of using 'ClientSession.execute' to interact with the database.

I am thinking about this approach, but it quickly becomes clear that this will require us to reimplement an XQJ-like facility that allows us to formulate XQuery Updates in a maintainable way, such that variables can be expressed (for variables now declared as 'external').

The other question is how errors are communicated. 'execute' returns a boolean and writes data into a buffer. Will I have to parse the returned buffer for error messages?

- Godmar
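One way such a facility could look, purely as a sketch: keep the queries as they are written for XQJ, with variables declared 'external', and rewrite each declaration into an initialized one before sending the text. ExternalBinder and bindString are hypothetical names; the sketch handles only untyped, string-valued variables and makes no claim about what BaseX itself offers for binding.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical text-level substitute for binding one external variable.
final class ExternalBinder {

    // Rewrites "declare variable $name external;" into
    // "declare variable $name := "<escaped value>";".
    static String bindString(String query, String name, String value) {
        // Escape the value as an XQuery string literal ('"' doubled, '&' as entity).
        String literal = '"' + value.replace("&", "&amp;").replace("\"", "\"\"") + '"';

        Pattern declaration = Pattern.compile(
            "declare\\s+variable\\s+\\$" + Pattern.quote(name) + "\\s+external\\s*;");
        Matcher m = declaration.matcher(query);
        if (!m.find()) {
            throw new IllegalArgumentException("No external declaration for $" + name);
        }

        String bound = "declare variable $" + name + " := " + literal + ";";
        // quoteReplacement keeps '$' in the replacement from being read as a group reference.
        return m.replaceFirst(Matcher.quoteReplacement(bound));
    }

    public static void main(String[] args) {
        String query = "declare variable $title external;\n"
                     + "//book[title = $title]";
        System.out.println(bindString(query, "title", "XML \"Queries\""));
    }
}
```

The query body stays untouched and readable, and only the prolog changes. Binding nodes rather than strings would still require serializing them into the query text, which is the cost discussed earlier in the thread.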

Andreas Weiler

2:51 a.m.

...

> The other question is how errors are communicated. 'execute' returns a boolean and writes data into a buffer. Will I have to parse the returned buffer for error messages?

An exception is thrown when an error occurs, like in this example:

    // Run a buggy query
    try {
      session.execute("XQUERY ///");
    } catch(final BaseXException ex) {
      System.out.println(ex.getMessage());
    }

- Andreas

BaseX-Talk mailing list BaseX-Talk@mailman.uni-konstanz.de https://mailman.uni-konstanz.de/mailman/listinfo/basex-talk
