Hello Hans-Juergen,
So my understanding is that the messages are inserted as child elements into this root element - and the end result is one document with one root element and millions of child elements representing the individual messages, yes?
Yes, that is correct: I have one root element at the beginning and insert the incoming items as child nodes of the root.
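For illustration, a single such insert could look like this in XQuery Update (the database name "tweets" and the variable $tweet are just examples, not the exact code I use):

```xquery
(: append one incoming message as a child of the single <tweets/> root;
   $tweet is assumed to hold the parsed message element :)
insert node $tweet into db:open("tweets")/tweets
```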
Therefore you do not have to come up with URIs, as there is only one single document. A monster document, but I conclude from your approach that this is no problem, and not worse (or even better) than having a million individual, small documents. Is that correct - would you recommend storing the messages in a single document?
In my use case, tweets have unique id attributes, so I don't need any URIs to identify them. It would probably help if you described your further querying process, so it is easier to understand what you want to do.
If the loading process cannot concur with queries - would there be any way to periodically "shift" packages of messages into a "read only" database? Or perhaps better the other way around: let the server periodically interrupt its loading activity, close the database, rename it, open and initialize a new database and then continue to load? Or is there presently simply no solution available?
That's exactly what I do after each hour: I rename the current database with the current date_hour and create a new database for the next incoming items. Shifting is not really an alternative, because it would probably take too long to insert the items into a second database and delete them from the "main" database.
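If I write the commands from memory, such an hourly rotation looks roughly like this (the database names are just examples):

```
ALTER DB tweets tweets_2012-07-03_15
CREATE DB tweets
```

ALTER DB renames the filled database out of the way, and CREATE DB starts an empty one for the next hour's items.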
Kind regards, Andreas
On 03.07.2012 at 23:58, Hans-Juergen Rennau wrote:
Hello Andreas,
thank you very much for this information! Indeed, the use cases are similar.
I am trying to understand how exactly you store the messages. The Wiki says: "the initial database just contained a root node <tweets/>". So my understanding is that the messages are inserted as child elements into this root element - and the end result is one document with one root element and millions of child elements representing the individual messages, yes? Therefore you do not have to come up with URIs, as there is only one single document. A monster document, but I conclude from your approach that this is no problem, and not worse (or even better) than having a million individual, small documents. Is that correct - would you recommend storing the messages in a single document?
If the loading process cannot concur with queries - would there be any way to periodically "shift" packages of messages into a "read only" database? Or perhaps better the other way around: let the server periodically interrupt its loading activity, close the database, rename it, open and initialize a new database and then continue to load? Or is there presently simply no solution available?
Kind regards, Hans-Juergen
From: Andreas Weiler andreas.weiler@uni-konstanz.de To: Hans-Juergen Rennau hrennau@yahoo.de CC: "basex-talk@mailman.uni-konstanz.de" basex-talk@mailman.uni-konstanz.de Sent: Tuesday, 3 July 2012, 15:51 Subject: Re: [basex-talk] BaseX as a log msg store?
Hello Hans-Juergen,
here are some details about my use case, which is similar to yours. I'm using BaseX to insert the live public Twitter Stream into databases (see Wiki Entry [1]).
One Twitter message is around 4 KB in size, and I'm able to insert about 2000 of them per second using single XQuery Update inserts. So that would probably work out for you, too. If you use bulk inserts (caching the items in an item list and running one XQuery Update for all of them), the insert rate would increase further.
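As a sketch, the bulk variant runs one updating query over the whole cached sequence (again, the variable and database names are just examples):

```xquery
(: $cached is assumed to hold the buffered message elements;
   all inserts are applied in a single update operation :)
for $tweet in $cached
return insert node $tweet into db:open("tweets")/tweets
```

The point is that the per-query overhead is paid once per batch instead of once per message.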
thus made available for querying
this could be a bigger problem, because as long as you are writing items into the database (which will never stop in your use case), the readers are blocked. And while one of your readers is running, the writers are blocked.
Hope this helps, Andreas