I have a highly multithreaded application that uses BaseX as a translation/transformation engine. It has been functioning in production for almost a year. From time to time I see the following in the nightly report the application generates which is sent from the server hosting the application. It requires a restart of JBoss, used as the application server. Any ideas?
Name: Embedded EIP BaseX Server Thread : blocked
Blocked on: java.net.SocksSocketImpl@5915a689
Blocked by: 1428 (Thread-1368)
Blocked Count: 1
Lock Info: java.net.SocksSocketImpl@5915a689
Stack Trace:
    java.net.PlainSocketImpl.accept(PlainSocketImpl.java:194)
    java.net.ServerSocket.implAccept(ServerSocket.java:545)
    java.net.ServerSocket.accept(ServerSocket.java:513)
    org.basex.BaseXServer.run(BaseXServer.java:142)
    com.pilotfish.eip.basex.BaseXServerRunner$1.run(BaseXServerRunner.java:93)
    java.lang.Thread.run(Thread.java:745)
Name: C3P0PooledConnectionPoolManager[identityToken->2szqjh9e17wbsc31kua7k9|58166f21]-HelperThread-#2 : timed_waiting
Thank you
Carl R Bondeson
Systems Developer
Department of Public Health
Data Processing
410 Capitol Ave
Hartford, CT 06134
Phone: 860-509-7434
carl.bondeson@ct.gov
Difficult to tell. Do you always close server instances? – Any reproducible use case would be appreciated.
I have upwards of 1200 threads running at the point in time when this occurs. I make a point of closing the server instances in each XSLT transform. This system utilizes Java callouts from XSLT to perform many of the processes that XSLT is not capable of doing. The process that is causing the blocking utilizes a custom processor which merges common message types into single objects to lessen the amount of data the surveillance systems need to import. In this case I processed a year’s worth of data, in a single pass. This might be considered excessive but I like to stress systems in order to ferret out these types of issues. I just restarted JBoss and the same outcome was seen. This could be beneficial in finding out where the actual problem lies. Is the SocksSocketImpl just the socket management API that BaseX uses to communicate between client and server?
> From: Christian Grün <christian.gruen@gmail.com>
> Sent: Tuesday, February 02, 2016 7:32 AM
> To: Bondeson, Carl <Carl.Bondeson@ct.gov>
> Cc: basex-talk@mailman.uni-konstanz.de
> Subject: Re: [basex-talk] Multithread lock
>
> Difficult to tell. Do you always close server instances? – Any reproducible use case would be appreciated.
One other piece of information. There are 2 applications using BaseX on this system. They are using 7.6, while I am using 7.9. I am using alternate ports for the server and events.
> Is the SocksSocketImpl just the socket management API that BaseX uses to communicate between client and server?
Yes. BaseX uses plain and simple Java sockets for client/server communication. Find out more here:
https://github.com/BaseXdb/basex/blob/master/basex-core/src/main/java/org/ba...
Maybe you could do some general research on the limits of Java socket connections (it may also relate to your OS and Java version).
> One other piece of information. There are 2 applications using BaseX on this system. They are using 7.6, while I am using 7.9. I am using alternate ports for the server and events.
Feedback on the latest version of BaseX is always welcome, as 7.6 is pretty old now.
Hi,
Given this thesaurus entry
    <thesaurus xmlns="http://www.w3.org/2007/xqftts/thesaurus">
      <entry>
        <term>&amp;</term>
        <synonym>
          <term>and</term>
          <relationship>USE</relationship>
        </synonym>
      </entry>
    </thesaurus>
I was expecting the following query to return true (file path omitted for clarity)
    'Frontier Science and Technology Research Foundation, Inc.' contains text 'Frontier Science &amp; Technology Research Foundation, Inc.' using thesaurus at "thesaurus.xml"
but it returns false. Switching the order of the term and synonym makes no difference.
I tried getting around this using a stop word file (which includes 'and', '&', and '&amp;', just in case) but it does not work either.
Am I missing something?
Thanks, Ron
Hi Ron,
I’m pretty sure that the default tokenizer discards the ampersand and doesn’t pass it on as a token at all.
Hope this helps (…at least for understanding the query result), Christian
Thanks, Christian. You are right about the tokenization of ampersands. However, I still see unexpected behavior with the built-in stop words.
1. This works (using your clever stop word workaround, slightly modified with string-join):
    let $sw := map:merge(
      for $sw in file:read-text-lines('stopwords.txt')
      return map { $sw : true() }
    )
    let $t1 := 'Frontier Science &amp; Technology Research Foundation, Inc.'
    let $t2 := 'Frontier Science and Technology Research Foundation, Inc.'
    let $q1 := string-join(ft:tokenize($t1)[not($sw(.))], ' ')
    let $q2 := string-join(ft:tokenize($t2)[not($sw(.))], ' ')
    where $q1 contains text { $q2 }
    return <r> { <q1> { $q1 } </q1>, <q2> { $q2 } </q2> } </r>
2. This fails:
    let $t1 := 'Frontier Science &amp; Technology Research Foundation, Inc.'
    let $t2 := 'Frontier Science and Technology Research Foundation, Inc.'
    where $t1 contains text { $t2 } using stop words at 'stopwords.txt'
       or $t2 contains text { $t1 } using stop words at 'stopwords.txt'
    return <r> { <q1> { $t1 } </q1>, <q2> { $t2 } </q2> } </r>
Any idea why?
Thanks, Ron
> Any idea why?
Yes – See one of my previous replies ;) In a nutshell: In the first query, stopwords will be dropped. In the second one, they will only be ignored (“Tokens matched by stop words retain their position numbers […]” [1]):
    "A B C" contains text "A C" using stop words ("B") → false
    "A B C" contains text "A B C" using stop words ("B") → true
It may not be the most intuitive decision that has been taken back then by the designers of the spec, but… Les jeux sont faits.
In some projects, we’ve decided to work with custom index structures [2]. It’s some more work, but it will give you complete freedom on what tokens you want to store.
Hope this helps, Christian
[1] https://www.w3.org/TR/xpath-full-text-10/#ftstopwordoption
[2] http://docs.basex.org/wiki/Indexes#Custom_Index_Structures
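Editorial illustration: the difference between "dropped" and "ignored" stop words can be modeled outside BaseX. The Java sketch below is a simplified model of the two behaviors described in this reply, not BaseX's or the spec's actual algorithm. The class StopWordDemo, its method names, and the whitespace tokenizer are all hypothetical and chosen only for illustration; matching is case-sensitive and limited to plain phrase matching.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class StopWordDemo {
    /** Splits on whitespace; a stand-in for a real full-text tokenizer. */
    static List<String> tokens(String text) {
        return Arrays.asList(text.trim().split("\\s+"));
    }

    /**
     * "Ignored" model: a stop word in the query keeps its position
     * and is allowed to match any single token of the source.
     */
    static boolean containsIgnoring(String source, String query, Set<String> stop) {
        List<String> s = tokens(source);
        List<String> q = tokens(query);
        for (int start = 0; start + q.size() <= s.size(); start++) {
            boolean match = true;
            for (int i = 0; i < q.size() && match; i++) {
                String qt = q.get(i);
                match = stop.contains(qt) || qt.equals(s.get(start + i));
            }
            if (match) {
                return true;
            }
        }
        return false;
    }

    /** Removes every stop word from the text and re-joins the remaining tokens. */
    static String drop(String text, Set<String> stop) {
        List<String> kept = new ArrayList<String>();
        for (String t : tokens(text)) {
            if (!stop.contains(t)) {
                kept.add(t);
            }
        }
        return String.join(" ", kept);
    }

    /** "Dropped" model: stop words are removed from both sides before matching. */
    static boolean containsDropping(String source, String query, Set<String> stop) {
        return containsIgnoring(drop(source, stop), drop(query, stop), new HashSet<String>());
    }

    public static void main(String[] args) {
        Set<String> stop = new HashSet<String>(Arrays.asList("B"));
        System.out.println(containsIgnoring("A B C", "A C", stop));   // false
        System.out.println(containsIgnoring("A B C", "A B C", stop)); // true
        System.out.println(containsDropping("A B C", "A C", stop));   // true
    }
}
```

Under this model, a query phrase "science and technology" with stop word "and" needs three consecutive source tokens, so it cannot match a source that only contains "science technology"; removing the stop words from both strings first makes the two phrases identical.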
Christian, I will second your description of this logic as “nonintuitive”. It seems to be driven more by efficiency concerns than usability (on the part of the W3C). Would it be possible to create a custom index structure in BaseX that would get around this limitation? If yes, as you seem to suggest below, can this be done dynamically? I had difficulty following the example in [2].
Thanks, Ron
> Would it be possible to create a custom index structure in BaseX that would get around this limitation? If yes, as you seem to suggest below, can this be done dynamically? I had difficulty following the example in [2].
Could you please give me more hints on what you don’t understand?
In the end this turned out to be caused by a method that didn't have the synchronized keyword and therefore was not handling the calls from multiple threads correctly. This has been addressed and the problem resolved.
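Editorial illustration: the message above does not say which method lacked the synchronized keyword or what it did, so the following Java sketch only shows the general pattern. MergeCounter and SynchronizedDemo are hypothetical names, and the counter merely stands in for whatever shared state the real method touched. Without synchronized on recordMerge(), the increment (read, add, write) could interleave across threads and lose updates.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical shared helper that many transform threads call concurrently. */
class MergeCounter {
    private int merged = 0;

    // "synchronized" lets only one thread at a time run this method on a
    // given instance, so the read-modify-write of merged++ is not interleaved.
    synchronized void recordMerge() {
        merged++;
    }

    synchronized int merged() {
        return merged;
    }
}

class SynchronizedDemo {
    /** Starts the given number of threads; each calls recordMerge() callsPerThread times. */
    static int run(int threads, final int callsPerThread) {
        final MergeCounter counter = new MergeCounter();
        List<Thread> workers = new ArrayList<Thread>();
        for (int t = 0; t < threads; t++) {
            Thread worker = new Thread(new Runnable() {
                @Override
                public void run() {
                    for (int i = 0; i < callsPerThread; i++) {
                        counter.recordMerge();
                    }
                }
            });
            workers.add(worker);
            worker.start();
        }
        for (Thread worker : workers) {
            try {
                worker.join(); // wait until every worker has finished
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting for workers", e);
            }
        }
        return counter.merged();
    }

    public static void main(String[] args) {
        System.out.println(run(8, 10000)); // 80000
    }
}
```

Removing synchronized from recordMerge() typically makes the printed total fall short of 80000 on a multi-core machine, although the exact value varies from run to run.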