Hello everyone
I am trying to load BaseX with a large number of XML files (~500), each a few hundred MB in size. BaseX fails with a message along the lines of "This is too big for one database".
Can I please ask:
1) Are there any logs, beyond the DB logs? If yes, where can I find them?
a. The reason I ask is that once basexgui shows the message, there is no further indication of what caused the error. Ideally, I would like to know whether this is a limit on the amount of memory or on the number of items (?).
2) The parser options include reading XML files from archives, which is very convenient, but once a file has been parsed, does BaseX still need the "originals" for queries / returning results?
3) Is it possible to do federation with BaseX? In other words, let's say I split a database into two large parts (as per #1): is it possible to launch two BaseX servers and have them talk to each other, so that ultimately I query just one of them and get back unified results?
All the best
I am sorry, it turns out the error is probably due to malformed input in one of the files (which I will have to look into), not to BaseX. I would, however, still appreciate some pointers regarding the rest of the questions.
All the best
From: Anastasiou A. Sent: 12 September 2017 09:54 To: basex-talk@mailman.uni-konstanz.de Subject: A few general questions about BaseX
Hi Anastasiou,
When adding many big documents, I usually set the ADDCACHE option [1] and add the files sequentially (for example, in a BaseX command script). That way, when I hit the database size limit, no work is lost, and I can continue adding the remaining files to a new database.
Best regards, Fabrice Etanchaud
[1] http://docs.basex.org/wiki/Options#ADDCACHE
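Fabrice's sequential approach can be sketched as a small Python helper that writes a BaseX command script: it sets ADDCACHE once, then adds the files one by one and creates a new database whenever a running size budget would be exceeded. This is a minimal sketch under assumptions: the byte budget `max_bytes_per_db`, the database names `part1`, `part2`, etc., and the helper names are hypothetical, file paths are assumed to contain no spaces, and BaseX's real limits (see the Statistics page) are not purely byte-based, so the budget is only a rough proxy.

```python
from pathlib import Path


def scan_xml_files(directory):
    """Return (path, size in bytes) pairs for all *.xml files, sorted by name."""
    return [(str(p), p.stat().st_size) for p in sorted(Path(directory).glob("*.xml"))]


def build_command_script(files, max_bytes_per_db, db_prefix="part"):
    """Turn (path, size) pairs into BaseX commands, one ADD per file.

    A new database is created whenever adding the next file would push the
    current database past max_bytes_per_db (a hypothetical, user-chosen
    budget). A single file larger than the budget ends up in its own database.
    """
    lines = ["SET ADDCACHE true"]
    db_count = 0
    used = None  # bytes added to the current database; None = none created yet
    for path, size in files:
        if used is None or (used > 0 and used + size > max_bytes_per_db):
            db_count += 1
            lines.append(f"CREATE DB {db_prefix}{db_count}")
            used = 0
        lines.append(f"ADD {path}")
        used += size
    return lines


if __name__ == "__main__":
    import sys

    # Usage: python make_script.py <xml-directory> <max-bytes-per-db> > load.bxs
    commands = build_command_script(scan_xml_files(sys.argv[1]), int(sys.argv[2]))
    print("\n".join(commands))
```

Because each file gets its own ADD command, the generated script mirrors the one-file-at-a-time loading described above.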
From: basex-talk-bounces@mailman.uni-konstanz.de [mailto:basex-talk-bounces@mailman.uni-konstanz.de] On behalf of Anastasiou A. Sent: Tuesday, 12 September 2017 11:01 To: 'basex-talk@mailman.uni-konstanz.de' Subject: [basex-talk] FW: A few general questions about BaseX
Hi Anastasiou,
:-)
On 12.09.2017 at 10:54, Anastasiou A. a.anastasiou@swansea.ac.uk wrote:
Hello everyone
I am trying to load BaseX with a large number of XML files (~500), each a few hundred MB in size. BaseX fails with a message along the lines of “This is too big for one database”.
=> You might have hit one of our database limits. See the first line of this page for the general limits: http://docs.basex.org/wiki/Statistics
Can I please ask:
Are there any logs, beyond the DB logs? If yes, where can I find them?
a. The reason I ask is that once basexgui shows the message, there is no further indication of what caused the error. Ideally, I would like to know whether this is a limit on the amount of memory or on the number of items (?).
The database logs are all there is. You can use the Admin Module to add your own messages to these logs: http://docs.basex.org/wiki/Admin_Module#admin:write-log
So, for example, in a try/catch block you could log more explicitly what's happening :)
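Michael's suggestion refers to XQuery running inside BaseX (try/catch plus admin:write-log). Since the original error turned out to be malformed input, a different, complementary technique is to check well-formedness outside BaseX before loading and to log which file fails. This Python sketch uses the standard library's streaming SAX parser, so files of a few hundred MB are not held in memory. It is not part of BaseX, and the function and logger names are hypothetical.

```python
import logging
import xml.sax

logging.basicConfig(level=logging.INFO, format="%(levelname)s %(message)s")
log = logging.getLogger("xml-precheck")  # hypothetical logger name


def find_malformed(paths):
    """Stream-parse each file and return (path, error message) for every file
    that is not well-formed XML. Well-formed files produce no entries."""
    bad = []
    for path in paths:
        try:
            # A plain ContentHandler ignores all parse events; only errors matter here.
            xml.sax.parse(str(path), xml.sax.ContentHandler())
        except xml.sax.SAXParseException as err:
            log.warning(
                "%s: line %d, column %d: %s",
                path, err.getLineNumber(), err.getColumnNumber(), err.getMessage(),
            )
            bad.append((str(path), err.getMessage()))
    return bad


if __name__ == "__main__":
    import sys

    # Usage: python precheck.py file1.xml file2.xml ...
    failures = find_malformed(sys.argv[1:])
    log.info("%d of %d files are malformed", len(failures), len(sys.argv) - 1)
```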
The parser options include reading XML files from archives, which is very convenient, but once a file has been parsed, does BaseX still need the “originals” for queries / returning results?
If you created a database from your archive, e.g. db:create('foo.zip', …), the original files are no longer needed: http://docs.basex.org/wiki/Database_Module#db:create
Is it possible to do federation with BaseX? In other words, let’s say I split a database into two large parts (as per #1): is it possible to launch two BaseX servers and have them talk to each other, so that ultimately I query just one of them and get back unified results?
Yes, it is possible, but only if you are willing to suffer a little :-) You can use the Client Module (http://docs.basex.org/wiki/Client_Module) to connect to a running BaseX server, and you could use xquery:fork-join (http://docs.basex.org/wiki/XQuery_Module#xquery:fork-join) to query multiple servers in parallel, pose a query and return the results.
But expect to do some things by hand, e.g.:
- maintaining a list of the servers you currently have
- handling storing, deleting and updating documents
- handling document retrieval, …
So a possibly large amount of manual work is absolutely required, but depending on what you need to do with your documents, it might work "well enough" :-)
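The bookkeeping Michael describes (a list of servers, the same query sent to each, results merged) is essentially a scatter-gather pattern. In BaseX itself this would be written in XQuery with the Client Module and xquery:fork-join. The Python sketch below only illustrates the shape of the pattern: `run_query` is a hypothetical placeholder for whatever client call talks to a single BaseX server, and the server list is supplied by the caller.

```python
from concurrent.futures import ThreadPoolExecutor


def federated_query(servers, query, run_query):
    """Send the same query to every server in parallel and concatenate the
    results in server order.

    run_query(server, query) is a hypothetical callable standing in for a real
    client; it must return a list of results for one server.
    """
    with ThreadPoolExecutor(max_workers=max(1, len(servers))) as pool:
        # Scatter: one task per server.
        futures = [pool.submit(run_query, server, query) for server in servers]
        # Gather: keep the order of the server list; any exception is re-raised here.
        results = []
        for future in futures:
            results.extend(future.result())
    return results
```

Results are concatenated in server order, not deduplicated or re-sorted, and an error from any server propagates to the caller. A real setup would have to decide how to handle ordering, duplicates and partial failures, which is part of the manual work listed above.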
If you need more information let me know!
Best Michael
Hi Anastasiou,
Hopefully some of these answers are somewhat helpful.
On Tue, Sep 12, 2017 at 4:54 AM, Anastasiou A. a.anastasiou@swansea.ac.uk wrote:
Hello everyone
I am trying to load BaseX with a large number of XML files (~500), each a few hundred MB in size. BaseX fails with a message along the lines of “This is too big for one database”.
Can I please ask:
Are there any logs, beyond the DB logs? If yes, where can I find them?
a. The reason I ask is that once basexgui shows the message, there is no further indication of what caused the error. Ideally, I would like to know whether this is a limit on the amount of memory or on the number of items (?).
I'm not sure how to enable more verbose logging with the GUI -- hopefully one of the devs or power users can weigh in on this.
The parser options include reading XML files from archives, which is very convenient, but once a file has been parsed, does BaseX still need the “originals” for queries / returning results?
AFAIK, no, it does not. BaseX will query and return results from the internal database(s).
Is it possible to do federation with BaseX? In other words, let’s say I split a database into two large parts (as per #1): is it possible to launch two BaseX servers and have them talk to each other, so that ultimately I query just one of them and get back unified results?
AFAIK, the preferred method is to split your files across many databases, then query multiple databases from a single expression[1]. Others will be able to speak to this better, but I don't think there's a straightforward way to run multiple BaseX servers in a single JVM.
Best, Bridger
Hi Anastasiou (and thanks Bridger),
BaseX fails with a message along the lines of “This is too big for one database”.
Are there any logs, beyond the DB logs? If yes, where can I find them?
I'm not sure how to enable more verbose logging with the GUI -- hopefully one of the devs or power users can weigh in on this.
You can enable the debugging mode, e.g. by entering "set debug true" in the command input panel at the top of the GUI. If the returned feedback does not help, I would be interested in the exact error message you get (because there may be several reasons why the input is too large for a single database).
The parser options include reading XML files from archives, which is very convenient, but once a file has been parsed, does BaseX still need the “originals” for queries / returning results?
AFAIK, no, it does not. BaseX will query and return results from the internal database(s).
Exactly!
Is it possible to do federation with BaseX? In other words, let’s say I split a database into two large parts (as per #1): is it possible to launch two BaseX servers and have them talk to each other, so that ultimately I query just one of them and get back unified results?
AFAIK, the preferred method is to split your files across many databases, then query multiple databases from a single expression[1]. Others will be able to speak to this better, but I don't think there's a straightforward way to run multiple BaseX servers in a single JVM.
Exactly!
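The alternative Bridger suggests and Christian confirms (several databases on one server, queried from a single expression) could look like the following. This Python helper only builds the XQuery string, for example to send through any BaseX client. The database names and the `//record` path are hypothetical, and db:open is assumed as the function that opens a database by name.

```python
def multi_db_query(db_names, path):
    """Build one XQuery expression that applies `path` to several databases.

    Each name becomes a db:open("...") call. Double quotes inside a name are
    escaped by doubling them, as XQuery string literals require.
    """
    opens = ", ".join('db:open("{}")'.format(name.replace('"', '""')) for name in db_names)
    return "({}){}".format(opens, path)


if __name__ == "__main__":
    # Hypothetical database names and element name.
    print(multi_db_query(["part1", "part2"], "//record"))
```

Because the databases are combined into one sequence before the path step is applied, the result comes back as a single sequence from one query, with no server-to-server coordination needed.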