Hello,
we're getting an apparent deadlock (followed by a "GC overhead limit exceeded" error) on one machine when starting some processing on a collection of over 800 000 records. Going after it with YourKit yields the following:
application-akka.actor.default-dispatcher-110 <--- Frozen for at least 3m 7s
org.basex.server.Query.cache(InputStream)
org.basex.server.ClientQuery.cache()
org.basex.server.Query.more()
eu.delving.basex.client.Implicits$RichClientQuery.hasNext()
scala.collection.Iterator$$anon$19.hasNext()
scala.collection.Iterator$$anon$29.hasNext()
scala.collection.Iterator$class.foreach(Iterator, Function1)
scala.collection.Iterator$$anon$29.foreach(Function1)
core.processing.CollectionProcessor$$anonfun$process$2.apply(ClientSession)
core.processing.CollectionProcessor$$anonfun$process$2.apply(Object)
core.storage.BaseXStorage$$anonfun$withSession$1.apply(ClientSession)
core.storage.BaseXStorage$$anonfun$withSession$1.apply(Object)
eu.delving.basex.client.BaseX$$anonfun$withSession$1.apply(ClientSession)
eu.delving.basex.client.BaseX$$anonfun$withSession$1.apply(Object)
eu.delving.basex.client.BaseX.withSession(Function1)
eu.delving.basex.client.BaseX.withSession(String, Function1)
core.storage.BaseXStorage$.withSession(Collection, Function1)
core.processing.CollectionProcessor.process(Function0, Function1, Function1, Function3)
core.processing.DataSetCollectionProcessor$.process(DataSet)
actors.Processor$$anonfun$receive$1.apply(Object) <2 recursive calls>
akka.actor.Actor$class.apply(Actor, Object)
actors.Processor.apply(Object)
akka.actor.ActorCell.invoke(Envelope)
akka.dispatch.Mailbox.processMailbox(int, long)
akka.dispatch.Mailbox.run()
akka.dispatch.ForkJoinExecutorConfigurator$MailboxExecutionTask.exec()
akka.jsr166y.ForkJoinTask.doExec()
akka.jsr166y.ForkJoinPool$WorkQueue.runTask(ForkJoinTask)
akka.jsr166y.ForkJoinPool.runWorker(ForkJoinPool$WorkQueue)
akka.jsr166y.ForkJoinWorkerThread.run()
In the server logs I can observe:
09:10:45.129 [192.168.1.214:47530]: dimcon____geheugen-van-nederland QUERY(3) for $i in /record[@version = 0] order by $i/system/index return $i OK 0.06 ms
09:10:45.129 [192.168.1.214:47530]: dimcon____geheugen-van-nederland QUERY(3) OK 0.03 ms
09:13:23.155 [192.168.1.214:47530]: dimcon____geheugen-van-nederland ITER(3) Error: Connection reset
09:13:23.155 [192.168.1.214:47530]: dimcon____geheugen-van-nederland LOGOUT admin OK
I looked at the code, and it looks as though the whole query result (?) is cached in memory upon retrieval. Given that the database is over 1.2 GB in size, our client server has a hard time keeping up (it only has 1.5 GB of Xmx).
Is there any preferred way of dealing with this?
What I think I am going to do for the moment is to override or intercept the creation of the ClientQuery and provide an implementation with a different caching strategy. Another approach might be to limit the query output and implement some custom iteration behaviour, e.g. fetching the records in pages (see the sketch below); if that can be handled directly at the query level, I think it would make things easier.
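To illustrate the second approach, here is a rough, untested sketch of what I have in mind, assuming the plain ClientSession/ClientQuery API; processInPages, handle and pageSize are placeholder names, not anything from our codebase. The idea is to wrap the ordered query in subsequence() so that each ClientQuery only ever has to cache one page of results:

import org.basex.server.ClientSession

// Untested sketch: fetch the ordered records in fixed-size pages so that
// each ClientQuery result stays small enough to be cached client-side.
def processInPages(session: ClientSession, pageSize: Int)(handle: String => Unit) {
  var offset = 0
  var done = false
  while (!done) {
    // XQuery positions are 1-based, hence offset + 1
    val xq =
      "subsequence(" +
      "for $i in /record[@version = 0] " +
      "order by $i/system/index " +
      "return $i, " + (offset + 1) + ", " + pageSize + ")"
    val query = session.query(xq)
    try {
      var seen = 0
      while (query.more()) { handle(query.next()); seen += 1 }
      done = seen < pageSize  // a short page means we reached the end
      offset += pageSize
    } finally query.close()
  }
}

This re-evaluates the order by for every page, so it will not be cheap on the server side, but at least the client never holds more than one page in memory at a time.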
Manuel