Hello again,
I implemented this and it looks like it works nicely (to be confirmed soon - I started a run on a 600k records collection).
This runs nicely, in that the machine doesn't run out of memory anymore. There is one thing I noticed, however (and that I had already noticed earlier when a big collection was being processed): any attempt to talk to the server seems not to work. Even when I connect via the command-line basexadmin and run a command such as "list" or "open db foo", I do not get a reply. I can see the commands in the log, though:
17:28:06.532 [127.0.0.1:33112] LOGIN admin OK
17:28:08.158 [127.0.0.1:33112] LIST
17:28:21.288 [127.0.0.1:33114] LOGIN admin OK
17:28:25.602 [127.0.0.1:33114] LIST
17:28:52.676 [127.0.0.1:33116] LOGIN admin OK
Could it be that the long session is blocking the output stream coming from the server?
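To make the suspicion concrete: a minimal, hypothetical model (this is not BaseX code, and I'm only assuming the server serializes commands on some shared lock) of how a long-running session could make other clients' commands show up in the log yet receive no reply until the session finishes:

```python
import threading
import time

# Hypothetical model, not BaseX's implementation: one shared lock
# that every command must acquire before it can answer the client.
server_lock = threading.Lock()
replies = []

def long_running_session():
    with server_lock:      # the streaming query holds the lock...
        time.sleep(0.5)    # ...for the whole duration of the iteration

def admin_command(name):
    # The command arrives (so it would appear in the server log)...
    with server_lock:      # ...but its reply waits for the lock
        replies.append(name)

t1 = threading.Thread(target=long_running_session)
t2 = threading.Thread(target=admin_command, args=("LIST",))
t1.start()
time.sleep(0.1)
t2.start()
time.sleep(0.2)
print(replies)             # still empty: LIST is queued behind the session
t1.join()
t2.join()
print(replies)             # ['LIST'] only after the session releases the lock
```

If something like this is what's happening, the "list" command isn't lost, just queued behind the long session's lock.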
Thanks,
Manuel
On Mon, May 21, 2012 at 4:40 PM, Manuel Bernhardt bernhardt.manuel@gmail.com wrote:
Hi Christian,
as you have already seen, all results are first cached by the client if they are requested via the iterative query protocol. In earlier versions of BaseX, results were returned in a purely iterative manner -- which was more convenient and flexible from a user's point of view, but led to numerous deadlocks if reading and writing queries were mixed.
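The difference between the two behaviours can be sketched in a few lines (a conceptual model only, not BaseX's actual code; `transform` stands in for whatever the server does per result):

```python
# Conceptual sketch of client-side caching vs. iterative results.

def transform(i):
    # Placeholder for the per-result work the server performs.
    return i * 2

def query_cached(items):
    # Current behaviour: materialize all results first, then hand the
    # finished list over. Any read lock can be released as soon as
    # this function returns.
    return [transform(i) for i in items]

def query_iterative(items):
    # Old behaviour: yield results one by one. The query (and any lock
    # it holds) stays open for as long as the client keeps iterating,
    # which is why mixing this with writing queries risked deadlocks.
    for i in items:
        yield transform(i)

print(query_cached([1, 2, 3]))              # [2, 4, 6]
print(list(query_iterative([1, 2, 3])))     # [2, 4, 6]
```

Same results either way; the difference is only how long the server-side query stays open.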
If you only need parts of the requested results, I would recommend to limit the number of results via XQuery, e.g. as follows:
(for $i in /record[@version = 0]
 order by $i/system/index
 return $i)[position() = 1 to 1000]
I had considered this, but haven't used that approach yet, mainly because I wanted to try the streaming approach first. So far our system has only used MongoDB, and we are used to working with cursors as query results, so I'm trying to keep things aligned with that where possible.
Next, it is important to note that the "order by" clause can get very expensive, as all results have to be cached anyway before they can be returned. Our top-k functions will probably give you better results if it's possible in your use case to limit the number of results [1].
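If I read the documentation right, the same limit could be expressed with hof:top-k-by along these lines (an untested sketch; the predicate, key and count are just placeholders, and the key is negated on the assumption that hof:top-k-by returns the items with the *largest* keys, whereas an ascending order by takes the smallest):

(: untested sketch: 1000 records with the smallest system/index :)
hof:top-k-by(
  /record[@version = 0],
  function($r) { -xs:integer($r/system/index) },
  1000
)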
Ok, thanks. If this becomes a problem, I'll consider using this. By the way, is the reported query time of 0.06 ms the actual time the query takes to run? If so, I'm not too worried about query performance :) In general, the bottleneck in our system is not so much the querying but rather the processing of the records - I started rewriting that part to run concurrently using Akka, but am now stuck with a classloader deadlock (no pun intended). It will likely take quite some effort for the processing to be faster than the query iteration.
A popular alternative to client-side caching (well, you mentioned that already) is to override the code of the query client and process the returned results directly. Note, however, that you need to loop through all results, even if you only need part of them.
I implemented this and it looks like it works nicely (to be confirmed soon - I started a run on a 600k records collection).
Thanks for your time!
Manuel
Hope this helps,
Christian
[1] http://docs.basex.org/wiki/Higher-Order_Functions_Module#hof:top-k-by