Hi,
I am having difficulties populating a BaseX database. I have a large number of XML files (about half a million, ranging in size from several kilobytes to hundreds of kilobytes).
I use the BaseX Java API and ultimately call org.basex.core.cmd.Add for each file.
The files conform to 22 different types (22 XSD definitions), so I am importing them into 22 separate databases on a single BaseX server.
I have plenty of RAM and CPU power, and I monitor both processes (the BaseX server and my client program) with JVisualVM. The JVM reaches its CPU limits, but RAM is never exhausted.
Before importing, I need to enrich the XML data with additional information taken from an SQL database.
I have written a multithreaded Groovy program that uses the BaseX Java API and makes heavy use of the GPars library. Simply put, the program:
1. has several producer threads -- each producer reads a given portion of the SQL database and provides the additional information
2. has several consumer threads -- each consumer takes an original file, wraps it with the additional information, and finally calls the org.basex.core.cmd.Add command.
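The producer/consumer layout described above can be sketched roughly as follows. This is only a minimal illustration in plain Java, not the actual Groovy/GPars program: the class name `PipelineSketch`, the `enrich` helper, the `STOP` marker, and the queue size are all assumptions made for the example, and the `sink` callback stands in for whatever finally issues the Add command.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

class PipelineSketch {
    // Hypothetical end-of-work marker; each consumer stops when it takes one.
    private static final String STOP = "\u0000STOP";

    // Hypothetical enrichment step; the real program would query the SQL database here.
    static String enrich(String id) {
        return "<record id=\"" + id + "\"><extra>from SQL</extra></record>";
    }

    // Producers enrich disjoint slices of ids; consumers pass each result to sink.
    static void run(List<String> ids, int producers, int consumers, Consumer<String> sink) {
        // Bounded queue: producers block instead of piling up work when consumers lag.
        BlockingQueue<String> queue = new ArrayBlockingQueue<>(100);
        ExecutorService producerPool = Executors.newFixedThreadPool(producers);
        ExecutorService consumerPool = Executors.newFixedThreadPool(consumers);
        try {
            for (int c = 0; c < consumers; c++) {
                consumerPool.submit(() -> {
                    try {
                        for (String doc = queue.take(); !doc.equals(STOP); doc = queue.take()) {
                            try {
                                sink.accept(doc);
                            } catch (RuntimeException e) {
                                // Log and continue so one failed document does not stall the queue.
                                System.err.println("unable to consume: " + e);
                            }
                        }
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                });
            }
            for (int p = 0; p < producers; p++) {
                final int slice = p;
                producerPool.submit(() -> {
                    try {
                        for (int i = slice; i < ids.size(); i += producers) {
                            queue.put(enrich(ids.get(i)));
                        }
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                });
            }
            producerPool.shutdown();
            producerPool.awaitTermination(1, TimeUnit.HOURS);
            for (int c = 0; c < consumers; c++) {
                queue.put(STOP); // one marker per consumer
            }
            consumerPool.shutdown();
            consumerPool.awaitTermination(1, TimeUnit.HOURS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("import interrupted", e);
        }
    }

    public static void main(String[] args) {
        run(Arrays.asList("ISPOP_1", "ISPOP_2", "ISPOP_3"), 2, 2,
            doc -> System.out.println("adding " + doc));
    }
}
```

The bounded queue is a deliberate choice in this sketch: it keeps fast producers from buffering an unbounded number of enriched documents in memory while the consumers wait on the database.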
Various tests with less data (up to several thousand files) give good results -- no data is lost, and both the BaseX server and my client program behave as they should.
Unfortunately, when I try to import all of the files, the program starts fine, but once it gets "warm" I get SIGPIPE errors in the log from time to time (as I said, there is plenty of RAM and CPU available); please see the attachment.
Comments on the picture:
1. I am adding the document with ID ISPOP_166007 -- this ID is indeed missing in the final database
2. just a simple call to Add:
   Closure add = { session ->
       def cmd = new org.basex.core.cmd.Add(dsn, enhancedXml)
       session.execute(cmd)
   }
3. I am reusing the session; the session is bound to the current thread and is never closed until the thread (consumer) finishes
There is nothing wrong in the BaseX server log; other documents are added just fine, and there is no trace of document ISPOP_166007.
Just for reference, the complete stack trace follows ("Roura přerušena" is the Czech localization of "Broken pipe"):
- - - -
ERROR basex.support.AddResourcesSupport - unable to consume ISPOP_166007
java.net.SocketException: Roura přerušena (SIGPIPE)
    at java.net.SocketOutputStream.socketWrite0(Native Method) ~[na:1.8.0_45]
    at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:109) ~[na:1.8.0_45]
    at java.net.SocketOutputStream.write(SocketOutputStream.java:153) ~[na:1.8.0_45]
    at org.basex.io.out.BufferOutput.flush(BufferOutput.java:60) ~[basex-8.2.jar!/:8.2]
    at org.basex.io.out.BufferOutput.write(BufferOutput.java:54) ~[basex-8.2.jar!/:8.2]
    at org.basex.io.out.PrintOutput.write(PrintOutput.java:66) ~[basex-8.2.jar!/:8.2]
    at java.io.OutputStream.write(OutputStream.java:116) ~[na:1.8.0_45]
    at java.io.OutputStream.write(OutputStream.java:75) ~[na:1.8.0_45]
    at org.basex.api.client.ClientSession.send(ClientSession.java:238) ~[basex-8.2.jar!/:8.2]
    at org.basex.api.client.ClientSession.execute(ClientSession.java:160) ~[basex-8.2.jar!/:8.2]
    at org.basex.api.client.ClientSession.execute(ClientSession.java:167) ~[basex-8.2.jar!/:8.2]
    at org.basex.api.client.Session.execute(Session.java:36) ~[basex-8.2.jar!/:8.2]
    at org.basex.api.client.Session$execute.call(Unknown Source) ~[na:na]
    at basex.support.AddResourcesSupport$_consume_closure9$_closure17.doCall(AddResourcesSupport.groovy:255) ~[basex-1.0.jar!/:na]
    at sun.reflect.GeneratedMethodAccessor368.invoke(Unknown Source) ~[na:na]
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[na:1.8.0_45]
    at java.lang.reflect.Method.invoke(Method.java:497) ~[na:1.8.0_45]
    at org.codehaus.groovy.reflection.CachedMethod.invoke(CachedMethod.java:93) [groovy-2.4.4.jar!/:2.4.4]
    at groovy.lang.MetaMethod.doMethodInvoke(MetaMethod.java:325) [groovy-2.4.4.jar!/:2.4.4]
    at org.codehaus.groovy.runtime.metaclass.ClosureMetaClass.invokeMethod(ClosureMetaClass.java:294) [groovy-2.4.4.jar!/:2.4.4]
    at groovy.lang.MetaClassImpl.invokeMethod(MetaClassImpl.java:1019) [groovy-2.4.4.jar!/:2.4.4]
    at org.codehaus.groovy.runtime.callsite.PogoMetaClassSite.call(PogoMetaClassSite.java:42) [groovy-2.4.4.jar!/:2.4.4]
    at org.codehaus.groovy.runtime.callsite.AbstractCallSite.call(AbstractCallSite.java:125) [groovy-2.4.4.jar!/:2.4.4]
    at basex.BasexSessionRegistry.withThreadBoundSession(BasexSessionRegistry.groovy:79) ~[basex-1.0.jar!/:na]
    at basex.BasexSessionRegistry$withThreadBoundSession$0.call(Unknown Source) ~[na:na]
    at basex.support.AddResourcesSupport$_consume_closure9.doCall(AddResourcesSupport.groovy:257) ~[basex-1.0.jar!/:na]
    at sun.reflect.GeneratedMethodAccessor327.invoke(Unknown Source) ~[na:na]
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[na:1.8.0_45]
    at java.lang.reflect.Method.invoke(Method.java:497) ~[na:1.8.0_45]
    at org.codehaus.groovy.reflection.CachedMethod.invoke(CachedMethod.java:93) [groovy-2.4.4.jar!/:2.4.4]
    at groovy.lang.MetaMethod.doMethodInvoke(MetaMethod.java:325) [groovy-2.4.4.jar!/:2.4.4]
    at org.codehaus.groovy.runtime.metaclass.ClosureMetaClass.invokeMethod(ClosureMetaClass.java:294) [groovy-2.4.4.jar!/:2.4.4]
    at groovy.lang.MetaClassImpl.invokeMethod(MetaClassImpl.java:1019) [groovy-2.4.4.jar!/:2.4.4]
    at org.codehaus.groovy.runtime.callsite.PogoMetaClassSite.call(PogoMetaClassSite.java:42) [groovy-2.4.4.jar!/:2.4.4]
    at org.codehaus.groovy.runtime.callsite.BooleanReturningMethodInvoker.invoke(BooleanReturningMethodInvoker.java:51) ~[groovy-2.4.4.jar!/:2.4.4]
    at org.codehaus.groovy.runtime.callsite.BooleanClosureWrapper.call(BooleanClosureWrapper.java:53) ~[groovy-2.4.4.jar!/:2.4.4]
    at org.codehaus.groovy.runtime.DefaultGroovyMethods.find(DefaultGroovyMethods.java:3908) ~[groovy-2.4.4.jar!/:2.4.4]
    at org.codehaus.groovy.runtime.dgm$191.invoke(Unknown Source) ~[groovy-2.4.4.jar!/:2.4.4]
    at org.codehaus.groovy.runtime.callsite.PojoMetaMethodSite$PojoMetaMethodSiteNoUnwrapNoCoerce.invoke(PojoMetaMethodSite.java:274) ~[groovy-2.4.4.jar!/:2.4.4]
    at org.codehaus.groovy.runtime.callsite.PojoMetaMethodSite.call(PojoMetaMethodSite.java:56) ~[groovy-2.4.4.jar!/:2.4.4]
    at org.codehaus.groovy.runtime.callsite.AbstractCallSite.call(AbstractCallSite.java:125) [groovy-2.4.4.jar!/:2.4.4]
    at basex.support.AddResourcesSupport.consume(AddResourcesSupport.groovy:251) ~[basex-1.0.jar!/:na]
- - - -
Best Regards, Martin
Hi Martin,
I must confess I didn't check all the details in your mail, and I haven't come across SIGPIPE errors before, but I would be interested to hear whether the problem also occurs...
a) with a single thread, or b) if you don't reuse existing sessions?
Thanks in advance, Christian
Hi,
I will indeed try to investigate it a little more.
The reason for the multithreaded producer-consumer design is simple -- a single-threaded solution is _very_ slow; even now it takes hours to complete.
Now I am not reusing sessions -- each org.basex.core.cmd.Add command opens and closes its own session, still running multithreaded. So far it seems to behave better (still running).
According to JVM monitoring, the thread count of the BaseX server is steady at about 60 live threads. In the previous scenario (multithreaded + reusing thread-bound sessions) it peaked at about 600 or more live threads -- so reusing sessions is probably a bad idea.
Is it a bug in BaseX (not releasing resources properly?), or is reusing sessions (saved in a thread-local context) a bad idea that should be discouraged?
m.
> Reason for being multithreaded producer-consumer is simple -- single thread solution is _very_ slow, even now it takes hours to complete.
I see. Just in case, it would be interesting to know whether the problems also occur with a single thread.
> Is it a bug in BASEX (not releasing resources properly?) or is reusing sessions (saved in thread local context) a bad idea and should be discouraged?
I'm not aware of a particular bug in BaseX related to session management, but usually there is no need to reuse existing sessions, as the creation of sessions is pretty lightweight.
Hi,
I'm not really familiar with the wording of the XQuery standard. Should code like the following, which depends on short-circuit evaluation of the and operator, evaluate correctly?
declare variable $version := 1;
declare variable $outputdb := 'test';
declare variable $outputpath := 'test';
db:exists($outputdb, $outputpath) and doc(concat($outputdb, '/', $outputpath))/node()/@version = $version
If the file test/test doesn't exist, I receive an error, even though the condition could be evaluated without it. If this is normal, is there a way to write this kind of code without using if?
Pablo Strasser
Hi Pablo,
> Should code of this type depending on the short-circuit evaluation of the and operator evaluate correctly?
Yes, it does. Here is a mini example (it would raise an error otherwise):
1 or error()
Hope this helps, Christian
Hi Christian,
Thanks for the answer.
It seems that if there is a call to the doc function, it doesn't short-circuit.
Your example works, however this one does not: 1 or doc('non_existent')
It returns: [FODC0002] Resource '/home/pablo/nonex' does not exist.
Given your answer, I believe this behaviour is a bug.
Pablo
On 05/08/15 19:59, Strasser Pablo wrote:
> Your example works, however this one does not: 1 or doc('non_existent')
However, this one works:
declare %basex:lazy variable $doc := doc('non_existent');
1 or $doc
Pablo
Hi Pablo,
You are right; your expression raised an error, because the existence of documents is checked at compile-time (to allow for subsequent index rewritings, non-existing element names, etc.).
I realized that the behavior you observed was compliant with the spec, which does not actually dictate an evaluation order [1]. However, as we are already replacing compile-time errors with an fn:error() function in other expressions (such as if/then/else), and further operands are skipped once the final result is known, it was just consistent to do the same with and/or.
I have revised the code and uploaded a new snapshot [2]. Thanks for your report, Christian
[1] http://www.w3.org/TR/xquery-31/#id-logical-expressions
[2] http://files.basex.org/releases/latest/
Reusing sessions within the same thread resulted in many live threads on the BaseX server -- and probably in the SIGPIPE errors after a while.
Using a new session for every Add command solved the problem. The cost of opening and closing a session (as measured) is an acceptable performance penalty.
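The session-per-command approach can be sketched as below. This is a minimal illustration under stated assumptions, not the actual client code: `DbSession` and `CountingSessions` are hypothetical stand-ins so that the example runs without a server. In the real program the factory would presumably create an org.basex.api.client.ClientSession, and `add` would execute an org.basex.core.cmd.Add command as in the closure from the first mail.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

class SessionPerAddSketch {
    // Hypothetical stand-in for a database client session; this is not the BaseX API.
    interface DbSession extends AutoCloseable {
        void add(String path, String xml);

        @Override
        void close(); // narrowed to throw no checked exception, to keep the sketch short
    }

    // Opens a fresh session for one add and closes it even if the add fails.
    static void addWithFreshSession(Supplier<DbSession> open, String path, String xml) {
        try (DbSession session = open.get()) {
            session.add(path, xml);
        }
    }

    // In-memory fake that only counts calls, so the sketch runs without a server.
    static final class CountingSessions implements Supplier<DbSession> {
        final AtomicInteger opened = new AtomicInteger();
        final AtomicInteger closed = new AtomicInteger();
        final AtomicInteger added = new AtomicInteger();

        @Override
        public DbSession get() {
            opened.incrementAndGet();
            return new DbSession() {
                @Override
                public void add(String path, String xml) {
                    if (xml.isEmpty()) {
                        throw new IllegalArgumentException("empty document: " + path);
                    }
                    added.incrementAndGet();
                }

                @Override
                public void close() {
                    closed.incrementAndGet();
                }
            };
        }
    }

    public static void main(String[] args) {
        CountingSessions sessions = new CountingSessions();
        for (int i = 0; i < 3; i++) {
            addWithFreshSession(sessions, "doc" + i + ".xml", "<doc/>");
        }
        System.out.println(sessions.opened + " opened, " + sessions.closed + " closed");
    }
}
```

The try-with-resources block is the point of the sketch: every session that is opened is closed again right after its single command, whether or not the command succeeds, so no session outlives the document it was opened for.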
Regards, Martin
basex-talk@mailman.uni-konstanz.de