Hi Feargal,
Just my two cents, but to underline what Christian is saying: BaseX is an XML database (albeit the clever marketing guys at BaseX have now branded it as the "BaseX Framework" on the new webpage ;-) ), so of course it actually loads XML files into the database itself.
I am wondering why you want this evaluation: 12k documents sounds like... not much. Are these documents particularly large? Otherwise I would simply start with BaseX, put them all into the database and query the data. If your documents are not particularly huge, that should be reasonably fast, and you can basically evaluate this yourself in ten minutes.
Also, I would like to add that BaseX (hence: a framework) is also a powerful XQuery processor. So wanting to "enhance the XML with regex patterns" sounds technically inferior, and it also makes sad pandas cry :( Why should you not use regex to parse XML, you ask? I kindly refer you to this excellent SO answer: https://stackoverflow.com/a/1732454/1451599
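To make the regex point a bit more concrete, here is a minimal sketch in Python (not XQuery, and nothing BaseX-specific) that compares a naive regex with a real XML parser. The sample document, its element names and both function names are made up for illustration.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical sample: a commented-out title, a title with an attribute,
# and a title wrapped in a CDATA section.
SAMPLE = """<letters>
  <!-- <title>Old draft</title> -->
  <letter><title lang="en">Hello</title></letter>
  <letter><title><![CDATA[Prices < 10]]></title></letter>
</letters>"""


def titles_with_regex(xml_text):
    """Naive approach: grab whatever sits between <title> and </title>."""
    return re.findall(r"<title>(.*?)</title>", xml_text)


def titles_with_parser(xml_text):
    """Parse the document and read the text of every title element."""
    root = ET.fromstring(xml_text)
    return ["".join(t.itertext()) for t in root.iter("title")]


if __name__ == "__main__":
    print("regex: ", titles_with_regex(SAMPLE))
    print("parser:", titles_with_parser(SAMPLE))
```

The regex picks up the title inside the comment, misses the one that carries an attribute, and returns the raw CDATA markup, while the parser returns the two real titles. An XQuery processor works on the parsed tree in the same way the parser variant does.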
Cheers Dirk
On 19. Apr 2018, at 23:06, Christian Grün christian.gruen@gmail.com wrote:
Hi Feargal,
I noticed that BaseX doesn't seem to actually load XML files into an XML database, is that right?
If you have a larger number of XML documents, and if the documents need to be processed multiple times, you will usually store them in a database. That said, it's generally possible with BaseX to process files without storing them in a database, and I would assume that this is possible with eXist-db as well.
I don't know who is maintaining the vschart.com web site, but I was wondering which information was misleading?
And what happens when a file is edited/updated?
Do you refer to the original file or a document in the database? If the original file is updated, it will need to be re-added to your database.
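As a rough illustration of the "original file was updated" case, the following Python sketch lists the XML files in a directory that were modified after the last import, so that only those need to be re-added. This is not a BaseX feature: the function name, the directory layout and the idea of remembering the last import time are all assumptions, and the actual re-adding step is left out.

```python
from pathlib import Path


def changed_since(directory, last_import_time):
    """Return the XML files under `directory` whose modification time
    is later than `last_import_time` (seconds since the epoch)."""
    return sorted(
        path
        for path in Path(directory).rglob("*.xml")
        if path.stat().st_mtime > last_import_time
    )
```

Each returned path would then be handed to whatever mechanism you use to replace the corresponding document in the database.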
Does BaseX need to be 'told' that it has been updated, in order to add the new data to its indices? Or does it know there has been an update and automatically reindex?
For more information on indexes in BaseX, I invite you to visit the corresponding article in our documentation [1], in particular the section on updates.
Thanks
Welcome, Christian