Hello all,
We have a performance issue with BaseX.
We are using BaseX to manage some user-generated content, which receives frequent updates. Each user has his own "home" within one large XML document, and users are able to update content within their own home.
Our problem is that as our number of users increased, we started to notice major slowdowns on our server. A small update to a user's content (one or two XML nodes) sometimes results in our [flush-...] process writing about 30 MB of data. I hope this makes sense.
I haven't actually looked at the BaseX source code, so I don't know what happens during these updates, but it seems to me that the amount of data written to disk per update is proportional to our database (document) size, even though the amount of data being updated remains constant. Perhaps an index update is happening? Sorry if I'm talking nonsense, because I don't really understand how BaseX works.
So basically, I have several questions:
1) Is it a bad idea to have one single large document that receives many frequent updates?
2) Would it be better to have a document for each user? This would result in about 1000 documents being created per day. Does BaseX index documents within a database for quick retrieval, and is there a limit on the number of documents you can have in a single database?
3) If we switch to one document per user, would updating these documents somehow still result in large amounts of data being written per update, even if the users' documents are very small? Or would this be a possible solution to our problem?
Thank you, and I hope this made sense,
- Adrian
Adrian,
thanks for your mail. It is difficult to give general advice on how you should structure your data for the best performance, as there are many factors, such as the types of queries or the heterogeneity of the data, that influence the decision. But it is a fact that updates may take longer on larger database instances due to the document order of XML documents (a property that is irrelevant in relational databases), as the underlying storage needs to take care of updating the child/parent relationships. In general, it is cheaper to change data at the end of a document/database; e.g., an "insert before" statement is more expensive than an "insert into" statement. Regarding the maximum number of documents per database, you will rather hit a limit if too many XML nodes are stored; you'll find some examples in our Wiki [1].
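For illustration, here is a minimal XQuery sketch of the difference (the database name "users", its document structure, and the paths are made up for this example; each snippet is a separate query):

  (: Cheap case: appending at the end of the document. :)
  insert node <home id="u1001"/> into db:open("users")/users

  (: Expensive case: inserting before existing content, which
     affects the positions of all following nodes. :)
  insert node <home id="u0000"/> before db:open("users")/users/home[1]

  (: One-document-per-user variant: each user's content is stored
     as a separate document in the same database. :)
  db:add("users", <home id="u1001"/>, "homes/u1001.xml")

The second query is the expensive pattern described above; the last one shows how per-user documents could be added to a single database instead.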
I hope my answer wasn't too fuzzy; we'd probably have to spend more time investigating your particular use case to give concrete help.
Christian
[1] http://docs.basex.org/wiki/Statistics