baseX vs ExistDB

Feargal Hogan

18 Apr 2018

9:39 a.m.

Is anyone aware of any comparisons between BaseX and eXist? I have some familiarity with eXist and I’d like to understand what the benefits of each are.

Thanks

Feargal

Alexander Holupirek

18 Apr 2018

10:34 a.m.

On 18. Apr 2018, at 15:39, Feargal Hogan feargal.hogan@gmail.com wrote:

Hi

Is anyone aware of any comparisons between BaseX and eXist? I have some familiarity with eXist and I’d like to understand what the benefits of each are.

Thanks

Feargal

Both are, of course, excellent systems. Do you have something special in mind that you would like to compare? Besides, I'm not aware of a general feature comparison site or something like that.

Cheers, Alex

Ben Engbers

12:20 p.m.

Hi,

Look at http://vschart.com/compare/basex/vs/exist-db

If you want, you can add other comparisons

Cheers, Ben

Liam R. E. Quin

4:12 p.m.

On Wed, 2018-04-18 at 14:39 +0100, Feargal Hogan wrote:

Hi

Is anyone aware of any comparisons between BaseX and eXist? I have some familiarity with eXist and I’d like to understand what the benefits of each are.

I don't know of any recent ones that are in-depth, and both products have changed (eXist especially, I think, has matured, but you'll be aware of that end :) ), so look carefully at the date on any you find.

What really matters is suitability to task, though, and that will depend on what you're trying to do. And part of suitability to task is the support network: are other people doing similar things using eXist-db or BaseX?

Liam

-- Liam Quin, W3C, http://www.w3.org/People/Quin/ Staff contact for Verifiable Claims WG, SVG WG, XQuery WG Improving Web Advertising: https://www.w3.org/community/web-adv/ Personal: awesome vintage art: http://www.fromoldbooks.org/

Feargal Hogan

19 Apr 2018

11:26 a.m.

On 18 Apr 2018, at 21:12, Liam R. E. Quin liam@w3.org wrote:

On Wed, 2018-04-18 at 14:39 +0100, Feargal Hogan wrote:

Hi

Is anyone aware of any comparisons between BaseX and eXist? I have some familiarity with eXist and I’d like to understand what the benefits of each are.

What really matters is suitability to task, though, and that will depend on what you're trying to do. And part of suitability to task is the support network: are other people doing similar things using eXist-db or BaseX?

Liam

Hmmm, I haven't seen anyone doing what I am looking to do.

Initially, I want to replace filesystem storage for about 12k XML files with queryable storage.

As we progress, I may want to batch update multiple records contextually and/or enhance the XML based on regex patterns.

From the comparison chart that Ben referenced earlier I noticed that baseX doesn’t seem to actually load xml files into an xml database, is that right? So what does it do then? It creates a queryable indexed representation of the files? Is that right?

And what happens when a file is edited/updated?

Does BaseX need to be 'told' that it has been updated, in order to add the new data to its indices? Or does it know there has been an update and automatically reindex?

Thanks

Feargal

Christian Grün

5:06 p.m.

Hi Feargal,

I noticed that baseX doesn’t seem to actually load xml files into an xml database, is that right?

If you have a larger number of XML documents, and if the documents need to be processed multiple times, you will usually store them in a database. But it’s generally possible with BaseX to process files without storing them in a database. But I would assume that this is possible with eXist-db as well.

I don’t know who is maintaining the vschart.com web site, but I was wondering which information was misleading?

And what happens when a file is edited/updated?

Do you refer to the original file or a document in the database? If the original file is updated, it will need to be re-added to your database.
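As a minimal sketch of that re-adding step: the query below replaces one stored document with the current version of its source file. It assumes a database named 'records' already exists and that both paths are hypothetical placeholders. It also assumes a BaseX release from around the time of this thread, where the Database Module offers `db:replace`; later major versions have renamed some of these functions, so check the documentation for your version.

```xquery
(: Assumptions: the database 'records' exists; the database path and the
   file system path are hypothetical placeholders. :)
db:replace('records', 'letters/0042.xml', '/data/xml/letters/0042.xml')
```

The first path identifies the document inside the database, the second one points to the edited file on disk. BaseX does not watch the file system, so a call like this has to be triggered by whatever process edits the files.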

Does BaseX need to be 'told' that it has been updated, in order to add the new data to its indices? Or does it know there has been an update and automatically reindex?

For more information on indexes in BaseX, I invite you to visit the corresponding article in our documentation [1], in particular the section on updates.

Thanks

Welcome, Christian

[1] http://docs.basex.org/wiki/Indexes

Kirsten, Dirk

20 Apr 2018

3:30 a.m.

Hi Feargal,

Just my two cents, but to stress what Christian is saying: BaseX is an XML database (albeit the clever marketing guys at BaseX now branded it as "BaseX Framework" with the new webpage ;-) ), so of course it actually loads XML files into the database itself.

I am wondering why you want this evaluation: 12k documents sounds like... not much. Are these documents particularly large? Otherwise I would simply start with BaseX, put them all into the database, and query the data. If your documents are not particularly huge, that should be reasonably fast, and you can basically evaluate this in ten minutes for yourself.
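A quick way to try this out could look as follows. This is only a sketch: the database name 'records' and the directory path are hypothetical, and it uses `db:create` and `db:open` from the Database Module as it existed in BaseX releases around the time of this thread. The first query creates a database from a directory of XML files:

```xquery
(: Query 1. Assumptions: 'records' and the directory path are hypothetical. :)
db:create('records', '/path/to/xml/')
```

Because `db:create` is an updating function, its result only becomes visible once the query has finished, so the check is run as a separate second query:

```xquery
(: Query 2, run separately after query 1 has finished. :)
let $docs := db:open('records')
return <summary documents="{ count($docs) }" elements="{ count($docs//*) }"/>
```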

Also, I would like to add that BaseX (hence: a framework) is also a powerful XQuery processor. So if you want to "enhance the XML with regex patterns", it sounds technically inferior, and it also makes sad pandas cry :( Why should you not use regex to parse XML, you ask? I kindly refer you to this excellent SO answer: https://stackoverflow.com/a/1732454/1451599

Cheers Dirk

Feargal Hogan

7:11 a.m.

Hi Dirk, thanks for this. I primarily use XSLT for transformations, and the regexes are all inside the XSLT files. So really the regex processing is being used to parse highly regular PCDATA instances into XML tags.

For instance, there are/were lots of textual instances of geolocation text such as “Lat. 48º 51’ N, Long. 034º 54’ E” and regex is perfect for converting those to geoxml tags.

I would never try to parse XML with regex.
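The same kind of text-to-markup conversion can also be sketched in XQuery with the standard `analyze-string` function, which applies a regular expression to a string (not to markup) and returns the matches and their groups as elements. The output vocabulary below (`geo`, `lat`, `long` and their attributes) is purely hypothetical, and the pattern only covers the exact notation quoted above.

```xquery
(: Hypothetical element and attribute names; adapt them to your own schema.
   The pattern only handles the "Lat. D M H, Long. D M H" notation shown above. :)
declare function local:geo($text as xs:string) as element(geo)* {
  let $pattern := "Lat\. (\d+)º (\d+)’ ([NS]), Long\. (\d+)º (\d+)’ ([EW])"
  for $m in analyze-string($text, $pattern)/fn:match
  return
    <geo>
      <lat deg="{ $m/fn:group[@nr = 1] }" min="{ $m/fn:group[@nr = 2] }" hemisphere="{ $m/fn:group[@nr = 3] }"/>
      <long deg="{ $m/fn:group[@nr = 4] }" min="{ $m/fn:group[@nr = 5] }" hemisphere="{ $m/fn:group[@nr = 6] }"/>
    </geo>
};

local:geo("Position at noon: Lat. 48º 51’ N, Long. 034º 54’ E")
```

For the sample string, this should yield one `geo` element with latitude 48/51/N and longitude 034/54/E. Text that does not match the pattern produces no output, so unmatched notations are easy to find and handle separately.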

It's interesting to hear you say that 12k docs isn’t a lot of data, and in byte terms it is not.

But I want to ensure I get it into the database in a meaningful structure.

I have had a couple of false starts with ExistDB, particularly in relation to RESTful interfaces, so I am just a little cautious.

I will do some testing now as it seems clearer to me what the product is about.

Thanks

Feargal

Liam R. E. Quin

4:03 a.m.

On Thu, 2018-04-19 at 16:26 +0100, Feargal Hogan wrote:

From the comparison chart that Ben referenced earlier I noticed that baseX doesn’t seem to actually load xml files into an xml database, is that right?

No. Yes. Maybe.

BaseX does load the documents into a database. It stores them in an internal data structure, not as textual XML.

It creates a queryable indexed representation of the files? Is that right?

Yes.

And what happens when a file is edited/updated?

A file outside the database? Nothing.

Depending on database options, though, if you update a document in the db, the index is updated.

Does BaseX need to be 'told' that it has been updated, in order to add the new data to its indices? Or does it know there has been an update and automatically reindex?

This isn't a meaningful question.

If you load a CSV file into a database such as Oracle, what happens if the CSV file changes on disk outside Oracle? And why do you care? You would normally edit the data at that point in the database using a SQL application.

BaseX doesn't need to consult the external XML files once the database is built (although yes, you _can_ keep files on disk and refer to them if you want, but then you're somewhat fighting the system and will have to go through some hoops to have super-fast queries).

As Christian and Dirk said, go give BaseX a try as many of your questions will be answered in some number of nanoseconds :) In particular, you can create a database from the GUI -- 12,000 files may take a few seconds to index, depending on how large they are -- and run queries directly.

One note on BaseX with documents: it has an option to delete whitespace nodes on import, which, inappropriately for documents, is enabled by default. You'll find it in the Options tab when you make a database from the GUI, for example.
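To see why this matters for document-oriented (mixed-content) XML, here is a small standalone illustration in plain XQuery. It does not call the BaseX import option itself (as far as I know it was named CHOP in releases of that time); it only imitates its effect by dropping whitespace-only text nodes from a hypothetical paragraph.

```xquery
(: Keep the whitespace between </b> and <i> when constructing the sample. :)
declare boundary-space preserve;

let $p := <p><b>very</b> <i>important</i></p>
(: Imitate whitespace chopping: drop text nodes that contain only whitespace. :)
let $stripped := <p>{ $p/node()[not(self::text()[normalize-space() = ''])] }</p>
return (string($p), string($stripped))
(: Result: "very important", "veryimportant" :)
```

The space between the two inline elements is a whitespace-only text node, so removing it glues the words together. For data-oriented XML, where such nodes are only indentation, the option is harmless and saves space.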

Liam

Omar Siam

7:59 a.m.

Hi all!

I have been dealing with 730 XML files, about 2.5 GB in total size, in BaseX for some months now. I'm happy to share the knowledge I gathered. We also tried exist-db on the same set of XML data and couldn't do any updates anymore in a reasonable amount of time.

The most positive aspects of BaseX in my scenario are

* It is easy to understand what BaseX is doing and when

* If you like you can manage your updates in a very granular way in parallel using jobs. This can speed up things quite a lot.

* You may be able to divide your XML into multiple BaseX databases in one instance and then access and update them without having locking problems and with speed.

* You decide if and when you recreate indices after updates.
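The last point, combined with the batch updates Feargal asked about, could be sketched like this. Everything specific here is an assumption: a database named 'records' whose documents have a root element `record`, and a hypothetical `reviewed` attribute that is added where it is missing. The query applies all node updates and then calls `db:optimize`, which rebuilds the index structures once at the end instead of after every change.

```xquery
(: Assumptions: the database 'records' exists; the element <record> and the
   attribute @reviewed are hypothetical. :)
(
  for $record in db:open('records')/record[not(@reviewed)]
  return insert node attribute reviewed { 'false' } into $record
),
db:optimize('records')
```

All updates of one query are collected and applied together when the query ends, and database-level operations such as `db:optimize` should be applied after the node updates. If indexes should instead be maintained incrementally, the documentation on indexes referenced earlier in the thread describes the relevant database options.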

The downsides are

* If you start doing things in parallel you can run into all sorts of locking and memory management problems. Memory can also be an issue if you do updates all over the place in a single run because then the update log can get really big. Also of course you can make your development system stall because you use up all the CPU ;-)

* BaseX in comparison to exist-db turned out to be particularly bad at hosting multiple XQuery based applications, such as RESTXQ endpoints, in one instance. It is really easy to have a global (write) lock. Then things get stuck.

* BaseX is not as smart at recognizing when indices can be used in longer XQuery code. exist-db is definitely better at that.

If one keeps it simple and one project per BaseX instance then it is much easier to know what actually happens compared to exist-db and that is a big asset for me.

Best regards Omar Siam

ACDH-OeAW

Christian Grün

9:49 a.m.

Hi Omar,

Thank you (and everyone else) for sharing your experiences.

BaseX in comparison to exist-db turned out to be particularly bad at hosting multiple XQuery based applications like RestXQ endpoint in one instance.

Definitely true; BaseX was not built for that. If you want to run multiple applications with a single web server, the recommended approach is to use the WAR distributions of BaseX and deploy each application as a separate servlet.

BaseX is not as smart on recognizing when indices can be used in longer XQuery code.

This one is interesting to hear, because we observed that users chose BaseX in the past exactly because of the index rewritings. Did you encounter these restrictions when working with multiple databases, or also with single instances?

Cheers, Christian

Marco Lettere

9:57 a.m.

On 20/04/2018 15:49, Christian Grün wrote:

Hi Omar,

Thank you (and everyone else) for sharing your experiences.

BaseX in comparison to exist-db turned out to be particularly bad at hosting multiple XQuery based applications like RestXQ endpoint in one instance.

Definitely true; BaseX was not built for that. If you want to run multiple applications with a single web server, the recommended approach is to use the WAR distributions of BaseX and deploy each application as a separate servlet.

Or use it as we do here. A new BaseX process for every application. A sort of very small application container or nano-service with the optimization that they all share the BaseX code and just use different restxq and possibly data folders.

;-)

Omar Siam

10:02 a.m.

I use BaseX's jobs for this. It works, but you have to be careful, because you give up all the protections against deadlocking and against the server becoming unresponsive: if no new jobs can be scheduled anymore, all HTTP communication stops, including database administration, for example.

Best regards

Omar

Omar Siam

9:58 a.m.

Hi Christian!

Am 20.04.2018 um 15:49 schrieb Christian Grün:

BaseX is not as smart on recognizing when indices can be used in longer XQuery code.

This one is interesting to hear, because we observed that users chose BaseX in the past exactly because of the index rewritings. Did you encounter these restrictions when working with multiple databases, or also with single instances?

This is in comparison to exist-db, which finds indexes in whatever collection they are configured for, even deep down in several hundred lines of XQuery module code where the collection name is calculated somewhere else. You know from a previous exchange we had that this is not the strong suit of BaseX, but it is a feature where exist-db is exceptionally good. When I wrote code structured the way that worked well for exist-db, I didn't get any rewrites and so no acceleration from BaseX indexes at all. When the queries are rewritten to contain the database name as a string literal, it is a different story. This is also needed to get out of global locking.

Best regards

Omar

ACDH-OeAW

Christian Grün

11:06 a.m.

Hi Omar,

Right, we had some discussion on this. If I remember correctly, you had been dynamically addressing databases in BaseX, whereas in eXist-db (as far as I remember), all queries and rewritten path expressions will refer to one static and globally opened database instance, right?

I guess your setup doesn’t allow this, but in use cases where the names of the databases are statically known, a popular approach is to reference the databases in global variables and address them inside the query. Here is a simple example for a nested query that will take advantage of the index:

  declare variable $DB := db:open('factbook');
  declare function local:city($city) {
    local:countries()//city[name = $city]
  };
  declare function local:countries() {
    $DB//country
  };
  local:city('Rome')

You can also open a global database (e.g. via basexhttp and the -c option [1]) and enable DEFAULTDB [2], but in this case, you’ll always need to use collection() or doc() before each path expression:

  declare function local:city($city) {
    collection()//city[name = $city]
  };
  declare function local:countries() {
    collection()//country
  };
  local:city('Rome')

Hope this helps, Christian

[1] http://docs.basex.org/wiki/Command-Line_Options#HTTP_Server
[2] http://docs.basex.org/wiki/Options#DEFAULTDB

basex-talk@mailman.uni-konstanz.de

participants (8)

Alexander Holupirek
Ben Engbers
Christian Grün
Feargal Hogan
Kirsten, Dirk
Liam R. E. Quin
Marco Lettere
Omar Siam