All: I'm running a long "xquery delete node" on one of my databases, and this pretty much seems to make my server unavailable for other operations. I expect this query to run for a few hours (affecting a few million nodes; the database is around 800 MB). I'm running this on an Amazon EC2 small instance (1 CPU, 1.7 GB RAM) and the CPU has been stuck at 100%. If I attempt to query any database on the server (not only the one being updated), I basically don't get anything back. So do heavy XQuery update requests basically lock down the server? Is anyone else having a similar experience? Feedback/suggestions? best *P
Hi Pascal,
this behavior is part of the BaseX transaction management. As XQuery Update allows a single query/transaction to alter multiple databases, updating statements are executed in isolation, which means that no other transactions are allowed while an updating transaction is being evaluated.
You can find more about this here [1]. Depending on your use case, though, there might be a workaround (e.g. performing the updates on a copy).
Cheers, Lukas
[1] http://docs.basex.org/wiki/Transaction_Management
On Fri, Nov 4, 2011 at 3:39 AM, Pascal Heus pascal.heus@gmail.com wrote:
[...]
_______________________________________________
BaseX-Talk mailing list
BaseX-Talk@mailman.uni-konstanz.de
https://mailman.uni-konstanz.de/mailman/listinfo/basex-talk
Lukas: Thanks for the feedback. This is pretty nasty though, as it makes bulk updates pretty much impractical in a multi-user environment.
My current query was something like:
delete node /codeBook/dataDscr/var/catstat
which basically deletes all the catstat elements in the database at once.
Would this be one transaction or many transactions?
for $var in /codeBook/dataDscr/var return delete node $var/catstat
It may still push my CPU to 100% though. Would there be a way to set a priority for queries? For example, I may want my regular search/retrieval queries to run at a higher priority than the above update, so users still get served in a reasonable amount of time.
best *P
On 11/4/11 7:25 AM, Lukas Kircher wrote:
[...]
Hi Pascal,
> Would this be one transaction or many transactions?
> for $var in /codeBook/dataDscr/var return delete node $var/catstat
We treat each query as an individual transaction, so this is still a single transaction and won't make a difference.
> Would there be a way to set a priority for queries?
There's no customizable prioritization mechanism. At the moment, queries are processed on a first-come, first-served basis to avoid starvation.
As XQuery allows for a wide range of operations on multiple databases within the same query, we haven't yet come up with a solution that allows true concurrent access and at the same time doesn't interfere with the efficient and compact nature of the BaseX backend.
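To illustrate the effect of first-come, first-served processing, here is a toy model in Python. It is not how BaseX is implemented: it assumes every query runs strictly one after another (ignoring that readers may run concurrently with each other), and all job names and run times are made up. It shows that a read queued behind one long update waits for the whole update, while the same amount of update work submitted as several separate queries lets the read in earlier.

```python
from typing import List, Tuple

Job = Tuple[str, int]  # (label, run time in ms); all values are made up


def fifo_finish_times(jobs: List[Job]) -> List[Tuple[str, int]]:
    """Return the finish time of each job when jobs run strictly in order."""
    clock = 0
    finished = []
    for label, duration_ms in jobs:
        clock += duration_ms
        finished.append((label, clock))
    return finished


# One long update queued ahead of a short read.
single = fifo_finish_times([("update", 4000), ("read", 50)])

# The same update work split into four queries; the read arrives
# while the first one is running and is queued right behind it.
split = fifo_finish_times([
    ("update-1", 1000), ("read", 50), ("update-2", 1000),
    ("update-3", 1000), ("update-4", 1000),
])

print(dict(single)["read"])  # 4050
print(dict(split)["read"])   # 1050
```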
Cheers, Lukas
Hi Pascal,
I can give you some promising prospects: yesterday, we did some brainstorming on how best to extend the BaseX architecture to support concurrent readers and writers (possibly by switching to MVCC) and concurrent write operations on multiple databases. Work is in progress, but I'm positive that we'll soon be able to offer a solution that gives you (and everyone) more freedom for many updating use cases. In the short run, one feasible approach is to perform the updates in a separate BaseX instance on a copy of the database, and then replace the obsolete database with the updated one.
We'll keep you updated, Christian
On Fri, Nov 4, 2011 at 3:16 PM, Pascal Heus pascal.heus@gmail.com wrote:
[...]
Hi Pascal,
regarding your problem: I'm not sure if it would help (as I do not know your use case), but maybe you could run many queries that each delete only a few elements:
delete node (/codeBook/dataDscr/var/catstat)[position() <= 10]
If you do not need the database to be in a consistent state for the duration of the whole delete (which is then no longer a single transaction), your read queries will be able to run in between.
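A minimal sketch of how such a series of small queries could be driven from a client script, written in Python. The `execute` argument is a hypothetical callable that sends one query string to the server and returns the result as text; it is an assumption of this sketch, not part of any BaseX API. Each call is a separate query and therefore a separate transaction, so waiting read queries can run between batches.

```python
from typing import Callable


def delete_in_batches(execute: Callable[[str], str], path: str,
                      batch_size: int = 10) -> int:
    """Delete all nodes matching `path`, at most `batch_size` per query.

    `execute` is a hypothetical helper (an assumption of this sketch): it
    sends one query to the server and returns the result as a string.
    Returns the number of delete queries that were issued.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    batches = 0
    # Every execute() call is its own query, so readers can run in between.
    while int(execute(f"count({path})")) > 0:
        execute(f"delete node ({path})[position() <= {batch_size}]")
        batches += 1
    return batches
```

Note that the extra `count(...)` query before each batch adds its own overhead, and the batch size is a trade-off: smaller batches give readers more chances to run but increase the total time.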
Deleting only 10 `//city` elements in factbook took me 20 ms; deleting all of them took 4000 ms. As there are 3147 elements in total, deleting them in batches of 10 increases the total time needed by a factor of about 1.5.
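The estimate can be reproduced with a few lines of Python. The figures are the ones quoted in this message; the assumption that every batch takes the same 20 ms as the first one is a simplification made for this sketch.

```python
import math

TOTAL_ELEMENTS = 3147   # //city elements in factbook
BATCH_SIZE = 10         # elements deleted per query
MS_PER_BATCH = 20       # measured time for deleting 10 elements
MS_ALL_AT_ONCE = 4000   # measured time for deleting all elements at once

# Number of queries needed; the last batch is only partly filled.
batches = math.ceil(TOTAL_ELEMENTS / BATCH_SIZE)

# Assumes every batch costs the same as the measured one.
batched_total_ms = batches * MS_PER_BATCH

slowdown = batched_total_ms / MS_ALL_AT_ONCE

print(batches, batched_total_ms, slowdown)
```

This gives 315 batches and roughly 6300 ms in total, which is where the factor of about 1.5 comes from.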
Kind regards from Lake Constance, Jens