Beginner Qs concerning scope and focus of BaseX


Silver, Jonathan

7 Mar 2013

3:34 p.m.

1. The YouTube intro video has no sound: http://www.youtube.com/watch?v=xILHKGPGaJ4&hd=1 I've tried it with different browsers and different players.

2. Documents are added to a database using add, with a file pathname (or URL). Is it the same pathname that is always used to refer to that document, or can I reference it another way? I created a database and added 3 documents using the Add command. In list, no input path is displayed, so how would I request the second document?

3. If I had different versions of the factbook XML file (maybe one for each year), so that contents like populations or even languages change each year, can I use BaseX to manage these, and how? I'd want to do normal queries, but I'd also want to do queries for specific years, and queries for what has changed, or changed the most.

I'm reading up on XQuery, but some of this looks beyond XQuery and seems to require DB support.

Thanks, jonathan


Johannes.Lichtenberger

7 Mar 2013

4:51 p.m.

On 03/07/2013 09:34 PM, Silver, Jonathan wrote:

...

If I had different versions of the factbook XML file (maybe one for each year), so that contents like populations or even languages change each year, can I use BaseX to manage these, and how?

I'd want to do normal queries, but I'd also want to do queries for specific years

Interesting. Which functions would you need? I'm very interested in how this could be achieved simply. If versioning takes place at the database/storage level, the timestamps have to be stored somewhere so that the revision nearest to a specified timestamp can be opened very efficiently (for instance through a kind of binary search). Usually, the transaction-time timestamp is stored for each version/revision, but in your case it would be great if you could specify whatever date you want and persist it in a simple data structure (instead of just the timestamp of when a certain version was committed).
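The "kind of binary search" over revision dates can be sketched in a few lines of Python. This is only an illustration, not BaseX functionality: the function name `nearest_revision` and the sample dates are made up, and it assumes the user-chosen dates are kept in a sorted list, one entry per revision.

```python
from bisect import bisect_left
from datetime import date


def nearest_revision(dates, target):
    """Return the index of the revision whose date is closest to target.

    `dates` must be sorted in ascending order. On a tie, the earlier
    revision wins.
    """
    if not dates:
        raise ValueError("no revisions stored")
    pos = bisect_left(dates, target)  # first index with dates[pos] >= target
    if pos == 0:
        return 0
    if pos == len(dates):
        return len(dates) - 1
    before, after = dates[pos - 1], dates[pos]
    return pos - 1 if target - before <= after - target else pos


# Hypothetical example: one factbook revision per year.
revision_dates = [date(2010, 1, 1), date(2011, 1, 1),
                  date(2012, 1, 1), date(2013, 1, 1)]
```

A lookup such as `nearest_revision(revision_dates, date(2011, 11, 1))` needs only a logarithmic number of comparisons, which is why a sorted list of dates (or an equivalent on-disk structure) is enough for this part of the problem.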

Some users of, for instance, relational database systems work with one relation that stores just the current version and another relation that stores the history of the first (that is, the data is first copied into the history relation when someone edits it and commits the new version). This could simply be translated to XML vocabulary :-)
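As a rough illustration of the current-plus-history pattern, here is a minimal in-memory sketch in Python. The class `VersionedStore` and its methods are hypothetical names, plain dictionaries and lists stand in for the two relations, and creation times are not tracked, so a lookup before the first commit simply returns the oldest value.

```python
from dataclasses import dataclass, field


@dataclass
class VersionedStore:
    # "Current" relation: key -> latest committed value.
    current: dict = field(default_factory=dict)
    # "History" relation: rows of (key, old value, time it was replaced),
    # appended in commit order.
    history: list = field(default_factory=list)

    def commit(self, key, value, when):
        """Store a new value; move the value it replaces into the history."""
        if key in self.current:
            self.history.append((key, self.current[key], when))
        self.current[key] = value

    def as_of(self, key, when):
        """Return the value of `key` that was valid at time `when`."""
        for k, old_value, replaced_at in self.history:
            if k == key and when < replaced_at:
                return old_value
        return self.current.get(key)
```

For example, after committing a population for `"Germany"` in 2011, 2012 and 2013 (years used as plain integers), `as_of("Germany", 2012)` returns the 2012 value while `current["Germany"]` holds the 2013 one. The linear scan in `as_of` also hints at why this layout is far from ideal for large histories.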

However, that's far from ideal!

...

And do queries for what has changed - or changed the most?

...

I'm reading up on XQuery, but some of this looks beyond XQuery and seems to require DB support.

Sadly I think there's no ideal solution, at least nothing "production ready" which works beyond simply storing the full "content" of all versions, at least not without huge workarounds.
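To make the "store the full content of every version" fallback concrete, here is a small Python sketch of a "what changed the most" comparison between two yearly snapshots. It is not tied to BaseX. It assumes each snapshot has `country` elements carrying `name` and `population` attributes; the tiny sample documents and the function names are invented for illustration.

```python
import xml.etree.ElementTree as ET


def populations(xml_text):
    """Map country name -> population for one snapshot."""
    root = ET.fromstring(xml_text)
    return {c.get("name"): int(c.get("population"))
            for c in root.iter("country")}


def biggest_changes(old_xml, new_xml, top=3):
    """Return (name, difference) pairs, largest absolute change first.

    Only countries present in both snapshots are compared.
    """
    old, new = populations(old_xml), populations(new_xml)
    changes = {name: new[name] - old[name] for name in old.keys() & new.keys()}
    # Sort by size of change, then by name to keep the order deterministic.
    return sorted(changes.items(), key=lambda item: (-abs(item[1]), item[0]))[:top]


# Hypothetical, heavily shortened snapshots for two years.
SNAPSHOT_2012 = """<mondial>
  <country name="A" population="100"/>
  <country name="B" population="200"/>
  <country name="C" population="300"/>
</mondial>"""

SNAPSHOT_2013 = """<mondial>
  <country name="A" population="110"/>
  <country name="B" population="150"/>
  <country name="C" population="301"/>
</mondial>"""
```

Calling `biggest_changes(SNAPSHOT_2012, SNAPSHOT_2013, top=2)` ranks the countries by how much their population moved. Every such query has to read both snapshots in full, which is the cost of this approach.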

Disclaimer: I've worked on a versioned open source XML storage system in the course of my studies at the University of Konstanz and in my spare time over the last few years. However, although it implements sophisticated path rewriting rules (BTW: BaseX once again served as a great inspiration; in fact, from having read Christian's thesis, I suppose it implements the exact same rules ;-)), it still lacks index rewriting rules and thus might not be as fast as you would like it to be (plus, a simple pointer to previous page versions might increase speed even more).

Full ACID isn't supported currently; that is, a database might be in an inconsistent state after a power failure and so on. However, this should be very easy to implement now, together with checkpointing.

It's interesting, and I think it might become a very valuable tool in the future as storage costs decrease and SSDs get even cheaper (our COW approach is best suited for fast random reads and huge amounts of sequential writes), which makes storing versions even more attractive (depending on the versioning algorithm, there is a trade-off between write and read performance). However, I think such a system must implement versioning algorithms, or at least support simple copy-on-write incremental storage of a database page. Our solution even stores just page deltas. Otherwise, storing the full content of all versions would be a huge performance bottleneck.
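The write/read trade-off of storing only page deltas can be illustrated with a toy Python sketch. This is not the actual storage format of any system mentioned here. It assumes a page is a mapping from slot to value, each revision records only the slots it changed, and `None` marks a deleted slot; `reconstruct_page` is a hypothetical name.

```python
def reconstruct_page(deltas, revision):
    """Rebuild the page as of `revision` by replaying deltas 0..revision.

    deltas[i] maps slot -> value written in revision i; a value of None
    means the slot was deleted in that revision.
    """
    page = {}
    for delta in deltas[: revision + 1]:
        for slot, value in delta.items():
            if value is None:
                page.pop(slot, None)
            else:
                page[slot] = value
    return page


# Hypothetical history of one page: each revision stores only its changes.
page_deltas = [
    {"a": 1, "b": 2},     # revision 0: initial content
    {"b": 3},             # revision 1: slot b updated
    {"a": None, "c": 4},  # revision 2: slot a deleted, slot c added
]
```

Writes stay small because only changed slots are stored, but reading revision 2 has to replay all three deltas. Periodically writing a full page would bound the replay length at the cost of extra space.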

Probably the coolest part is that you can dig down into the internal tree representation of the XML documents and search for particular segments across several versions/revisions with simple XPath axis extensions such as first::, last::, next::, previous::, future::, future-or-self::, past::, past-or-self:: and all-time::, that is, navigating in time ...

BTW: We (and that is, I am ;-)) support diff algorithms: for instance, one that imports the differences between a stored version and a new version given as an XML document (this might also be a JSON resource in the future), and in a complementary step either includes an index (to store which nodes have been changed; not implemented) or uses a subsequent ID-based fast diff algorithm (optionally using hashes) to compare any version with any other version. However, I currently have no idea how to incorporate a diff algorithm into a query.
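The general idea of an ID-based diff with hashes can be sketched as follows in Python. This is a flat illustration, not the algorithm of the system described above. It assumes every node keeps a stable ID across versions and that each version is available as a mapping from node ID to serialized content; `node_hash` and `diff_by_id` are hypothetical names.

```python
import hashlib


def node_hash(content):
    """Hash of a node's serialized content."""
    return hashlib.sha256(content.encode("utf-8")).hexdigest()


def diff_by_id(old_nodes, new_nodes):
    """Compare two versions given as dicts of node ID -> serialized content.

    Nodes are matched by ID; a node counts as updated when its hash differs.
    """
    old_hashes = {nid: node_hash(c) for nid, c in old_nodes.items()}
    new_hashes = {nid: node_hash(c) for nid, c in new_nodes.items()}
    shared = old_hashes.keys() & new_hashes.keys()
    return {
        "inserted": sorted(new_hashes.keys() - old_hashes.keys()),
        "deleted": sorted(old_hashes.keys() - new_hashes.keys()),
        "updated": sorted(n for n in shared if old_hashes[n] != new_hashes[n]),
    }
```

In a tree-structured store, hashes would more likely cover whole subtrees, so that an unchanged subtree can be skipped after a single comparison. The flat version here only shows the matching by ID.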

The GUI is far from perfect, but ideally you are able to visually compare two or more tree structures (or versions thereof). However, as I've encountered several issues with the Processing library (AWT inside Swing: heavyweight vs. lightweight issues), it would be best to reimplement the views themselves in JavaFX and use Open Dolphin [1] for client/server communication, or to find a way to resize the PApplet (AWT) component appropriately when the window size changes.

But well, it's the BaseX mailing list and I've already written way too much ;-)

kind regards Johannes

[1] https://github.com/canoo/open-dolphin


basex-talk@mailman.uni-konstanz.de
