Hello,
I have a full-text index defined on a database with 21,936,670 entries. I'm hitting a stack overflow error when I try to query the following (normalized) string with a fuzzy query:
"sajawandi, siraj al-din muhammad ibn muhammad, active 12th century. faraʼid al-sirajiyah"
The index entries look like this:
<d a="http://id.loc.gov/authorities/names/n2008037847">
  <n>Sajāwandī, Sirāj al-Dīn Muḥammad ibn Muḥammad, active 12th century. Farāʼiḍ al-Sirājīyah</n>
  <d>12th cent.</d>
</d>
If I create a new DB ("ftindex-test") with only that one entry and try the following lookup, it works:
ft:search( "ftindex-test", "sajawandi, siraj al-din muhammad ibn muhammad, active 12th century. faraʼid al-sirajiyah" , map {"mode": "phrase", "content": "entire", "fuzzy": true()} )/..
However, if I try the same query against the full index (with ~22 million entries), it fails with a stack overflow error:
[qtp1546693040-47] WARN org.eclipse.jetty.server.HttpChannel - /dba/query-eval
java.lang.StackOverflowError
  at org.basex.index.query.FTIndexIterator$2.pos(FTIndexIterator.java:73)
  at org.basex.index.query.FTIndexIterator$2.pos(FTIndexIterator.java:74)
  ...
If I remove the "fuzzy" parameter, the query works against the full index. Is this a bug, a known limitation, or something that I'm missing?
Thanks in advance, Tim
-- Tim A. Thompson Metadata Librarian Yale University Library
Hi Tim,
[qtp1546693040-47] WARN org.eclipse.jetty.server.HttpChannel - /dba/query-eval
java.lang.StackOverflowError
  at org.basex.index.query.FTIndexIterator$2.pos(FTIndexIterator.java:73)
  at org.basex.index.query.FTIndexIterator$2.pos(FTIndexIterator.java:74)
Could you share some more lines of the stack trace with us?
Thanks in advance Christian
Thanks, Christian. The rest of the stack trace was just many lines of "at org.basex.index.query.FTIndexIterator$2.pos(FTIndexIterator.java:74)" until the end:
[qtp1546693040-47] WARN org.eclipse.jetty.server.handler.ErrorHandler - Error page too large: 500 java.lang.StackOverflowError Request(POST //10.5.157.229:10214/dba/query-eval)@7dcadc62
[qtp1546693040-47] INFO org.eclipse.jetty.server.handler.ErrorHandler - Disabling showsStacks for ErrorPageErrorHandler@53032c30{STARTED}
Best regards, Tim
-- Tim A. Thompson Metadata Librarian Yale University Library
Hi Tim,
I’m still trying to reproduce this.
The error is caused by an excessive number of recursive function calls. Out of interest, could you try to increase the Java stack size and see if the error persists? This can be achieved by passing a large value to the JVM via the -Xss flag [1]. You could e.g. add -Xss64m to the BASEX_JVM variable in the BaseX start scripts [2].
Internal notes: if the error persists, we might be dealing with an infinite loop. If it does not, we should try to rewrite the recursive index lookup as an iterative one.
If your data is not confidential, feel free to provide me with a download link.
Cheers, Christian
[1] https://stackoverflow.com/questions/3700459/how-to-increase-the-java-stack-s...
[2] https://docs.basex.org/wiki/Start_Scripts
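As a rough illustration of the stack-size experiment suggested above, the following sketch measures how many nested calls fit on a thread before a StackOverflowError is thrown. It uses the standard Thread(ThreadGroup, Runnable, String, long stackSize) constructor as a per-thread stand-in for the JVM-wide -Xss flag. The names StackDepthSketch, maxDepth and descend are made up for this example and are not part of BaseX. The JVM treats the requested stack size only as a hint, so the measured depths will differ between platforms.

```java
/** Hypothetical sketch, unrelated to the BaseX code base. */
final class StackDepthSketch {
  /** Recurses without a base case; the counter records the depth reached. */
  private static void descend(int[] counter) {
    counter[0]++;
    descend(counter);
  }

  /**
   * Returns the number of nested calls that fit on a thread created with the
   * requested stack size in bytes (a hint that the JVM may round or ignore).
   */
  static int maxDepth(long stackBytes) {
    int[] counter = { 0 };
    Thread probe = new Thread(null, () -> {
      try {
        descend(counter);
      } catch (StackOverflowError e) {
        // Expected: the stack of this thread is exhausted.
      }
    }, "stack-probe", stackBytes);
    probe.start();
    try {
      probe.join();
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException("interrupted while probing", e);
    }
    return counter[0];
  }

  public static void main(String[] args) {
    // 256 KB versus 64 MB, the latter matching the -Xss64m suggestion.
    System.out.println("depth with 256 KB: " + maxDepth(256L * 1024));
    System.out.println("depth with 64 MB:  " + maxDepth(64L * 1024 * 1024));
  }
}
```

If the recursion depth grows with the amount of data being processed, a larger stack raises the ceiling without removing it, which is one reason an iterative rewrite may be the more robust fix.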
I think I managed to construct a test case that reproduces the bug and should resemble your scenario [1]: if thousands of similar terms are found, their results can no longer be joined recursively.
[1] https://github.com/BaseXdb/basex/issues/2014
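To make the failure mode more concrete, here is a minimal sketch of how joining many result lists pairwise can nest calls as deeply as there are lists, and how a flat merge avoids that. This is not the BaseX implementation: JoinSketch, IdIter, union, mergeNested and mergeFlat are hypothetical names, the ids are assumed to be non-negative and sorted in ascending order, and -1 marks an exhausted iterator.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.PriorityQueue;

/** Hypothetical sketch; none of these names come from BaseX. */
final class JoinSketch {
  /** Iterator over ascending, non-negative ids; returns -1 when exhausted. */
  interface IdIter { int next(); }

  static IdIter of(int... ids) {
    return new IdIter() {
      int i;
      public int next() { return i < ids.length ? ids[i++] : -1; }
    };
  }

  /** Pairwise union: every call may delegate to both inputs. */
  static IdIter union(IdIter a, IdIter b) {
    return new IdIter() {
      boolean primed;
      int x, y;
      public int next() {
        if (!primed) { x = a.next(); y = b.next(); primed = true; }
        if (x == -1 && y == -1) return -1;
        int r = (y == -1 || (x != -1 && x <= y)) ? x : y;
        if (x == r) x = a.next(); // advance the inputs that supplied r
        if (y == r) y = b.next();
        return r;
      }
    };
  }

  /** Chains n unions; one next() call then descends through n nested iterators. */
  static List<Integer> mergeNested(List<int[]> lists) {
    IdIter acc = of();
    for (int[] l : lists) acc = union(acc, of(l));
    List<Integer> out = new ArrayList<>();
    for (int id; (id = acc.next()) != -1;) out.add(id);
    return out;
  }

  /** Flat k-way merge with a heap; call depth is constant for any number of lists. */
  static List<Integer> mergeFlat(List<int[]> lists) {
    // Heap entries: { value, list index, offset within that list }.
    PriorityQueue<int[]> pq = new PriorityQueue<>((p, q) -> Integer.compare(p[0], q[0]));
    for (int l = 0; l < lists.size(); l++) {
      if (lists.get(l).length > 0) pq.add(new int[] { lists.get(l)[0], l, 0 });
    }
    List<Integer> out = new ArrayList<>();
    while (!pq.isEmpty()) {
      int[] e = pq.poll();
      if (out.isEmpty() || out.get(out.size() - 1) != e[0]) out.add(e[0]); // skip duplicates
      int[] src = lists.get(e[1]);
      if (e[2] + 1 < src.length) pq.add(new int[] { src[e[2] + 1], e[1], e[2] + 1 });
    }
    return out;
  }

  public static void main(String[] args) {
    List<int[]> few = Arrays.asList(new int[] { 1, 5, 8 }, new int[] { 2, 5 }, new int[] { 3 });
    System.out.println(mergeNested(few)); // [1, 2, 3, 5, 8]
    System.out.println(mergeFlat(few));   // [1, 2, 3, 5, 8]

    List<int[]> many = new ArrayList<>();
    for (int i = 0; i < 200_000; i++) many.add(new int[] { i });
    System.out.println("flat merge size: " + mergeFlat(many).size());
    try {
      System.out.println("nested merge size: " + mergeNested(many).size());
    } catch (StackOverflowError e) {
      System.out.println("nested merge exhausted the stack");
    }
  }
}
```

Whether the nested variant actually overflows for a given number of lists depends on the JVM and its stack size; the flat variant does not depend on either.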
basex-talk@mailman.uni-konstanz.de