Hello,
I have run into what seems to be a bug in the use of the full-text index (BaseX 11.9).
I created a database with FTINDEX enabled (plus CASESENS and DIACRITICS).
When I call the function in the XQuery below once, the result differs from the one I get when I call it twice.
With two calls, the whole database seems to be searched, which is not what I expect.
The XQuery is:
declare function local:getHit($documents, $word) {
  (# db:enforceindex #) {
    $documents//*[text() contains text {$word}]
  }
};
let $docs := collection()
let $docsSmall :=
  for $doc in $docs
  where $doc/book/@category='COOKING'
  return $doc
let $a := trace("res 1 : " || count(local:getHit($docsSmall, '2005')))
(: uncomment the line below to change getHit result :)
(: let $a := trace("res 2 : " || count(local:getHit($docsSmall, '2005'))) :)
return ''
The result is:
"res 1 : 1"
After uncommenting the second call, the result is:
"res 1 : 2"
"res 2 : 2"
The same happens with this simpler XQuery:
declare function local:getHit($documents, $word) {
  (# db:enforceindex #) {
    $documents//*[text() contains text {$word}]
  }
};
let $docs := collection()
let $docsSmall := $docs[1]
let $a := trace("res 1 : " || count(local:getHit($docsSmall, '2005')))
return ''
The result is:
"res 1 : 2"
("res 1 : 1" expected)
The database contains these two documents:
<book category="COOKING">
  <title lang="en">Everyday Italian</title>
  <author>Giada De Laurentiis</author>
  <year>2005</year>
  <price>30.00</price>
</book>
and
<book category="CHILDREN">
  <title lang="en">Harry Potter</title>
  <author>J K. Rowling</author>
  <year>2005</year>
  <price>29.99</price>
</book>
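For anyone who wants to reproduce this, the setup could probably be recreated with a query along the following lines. This is only a sketch: the database name 'books' and the document paths 'cooking.xml' and 'children.xml' are made up for illustration and do not come from the report above. The options mirror the FTINDEX, CASESENS and DIACRITICS settings mentioned earlier. Since db:create is an updating function, it has to be run as a query of its own, before the test queries.

```xquery
(: Sketch only: 'books', 'cooking.xml' and 'children.xml' are assumed names. :)
db:create(
  'books',
  (
    document {
      <book category="COOKING">
        <title lang="en">Everyday Italian</title>
        <author>Giada De Laurentiis</author>
        <year>2005</year>
        <price>30.00</price>
      </book>
    },
    document {
      <book category="CHILDREN">
        <title lang="en">Harry Potter</title>
        <author>J K. Rowling</author>
        <year>2005</year>
        <price>29.99</price>
      </book>
    }
  ),
  (: one target path per input document :)
  ('cooking.xml', 'children.xml'),
  (: full-text index with case and diacritics sensitivity, as in the report :)
  map { 'ftindex': true(), 'casesens': true(), 'diacritics': true() }
)
```

With a database created this way, collection() in the queries above would have to become collection('books') unless the database is already opened as context.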
Best,
Yann
Dear Yann,
Thanks for the observation, for which I have created a GitHub issue [1]. It is definitely the enforceindex option that causes the problem; it seemingly enforces more than expected.
Best, Christian
[1] https://github.com/BaseXdb/basex/issues/2442
On Wed, Jun 4, 2025 at 1:39 PM Yann de Thézy yann.dethezy@gmail.com wrote:
[...]
Hi Yann,
the bug has been fixed; a new snapshot is available [1].
Best, Christian
[1] https://files.basex.org/releases/latest/
Hi Christian, thanks for the quick fix! I can't get the GUI of the latest version to start for testing (nothing appears after launching). I will try again with a later build. Best, Yann
On Tue, Jun 10, 2025 at 1:57 PM Christian Grün christian.gruen@gmail.com wrote:
[...]
Java 17 is required for version 12; maybe that’s the problem?
Otherwise, you can try to start the GUI from the command line and check the error output.
On Tue, Jun 10, 2025 at 3:21 PM Y 2wt yann.dethezy@gmail.com wrote:
[...]
Yes, I missed that point, sorry. Nevertheless, I still encounter the problem with the XQuery and data described before. The title bar of the GUI window reads "basex 12.0 beta c94e36c". Best, Yann
On Tue, Jun 10, 2025 at 3:27 PM Christian Grün christian.gruen@gmail.com wrote:
[...]
I apologize as well; I failed to consider one more edge case. I hope the very latest snapshot does the job.
On Tue, Jun 10, 2025 at 4:32 PM Y 2wt yann.dethezy@gmail.com wrote:
[...]
Hi Christian, yes it does, thanks. For a bigger database (34386 documents), however, I see a performance drop in FTINDEX use: a few seconds in 11.9 versus 17 minutes with the latest release. Words with only a few hits respond quickly, but when the hits are numerous (here around 4000), it takes many minutes (here 17 minutes). Best, Yann
On Tue, Jun 10, 2025 at 4:52 PM Christian Grün christian.gruen@gmail.com wrote:
[...]
> For a bigger database (34386 documents), I encounter a performance drop in FTINDEX use: a few seconds in 11.9 against 17 minutes with the latest release. Words with a few hits respond quickly but when hits are numerous (here around 4000), it takes many minutes (here 17 minutes).
I guessed it would take longer… but probably not as much as in your case.
Basically, the enforceindex option is a lax convenience workaround for cases in which the optimizer is not smart enough to rewrite the expression for index access. However, to ensure that the results are correct, and match the input documents, several steps need to be performed that are otherwise done once at compile time. Specifically, an expression like…
$documents//text()[. contains text 'X']
…is rewritten to something like…
ft:search('db', 'X')[some $doc in $documents satisfies $doc is ancestor::document-node()]
In your case, it’s probably better to use ft:search directly. There are several ways to ensure that the results are contained in specific documents of a database. One approach is to match the database paths of the documents against those of the result nodes:
declare function local:getHit($documents, $word) {
  let $db := db:name(head($documents))
  let $paths := $documents ! db:path(.)
  return ft:search($db, $word)[db:path(.) = $paths]
};
Hope this helps, Christian
In fact, before using enforceindex I tried ft:search, but I could not find out how to set the case and diacritics options, which I sometimes need to enable and sometimes not. Is there a way to control them?
On Wed, Jun 11, 2025 at 10:41 AM Christian Grün christian.gruen@gmail.com wrote:
[...]
> In fact, before using enforceindex I tried ft:search, but I could not find out how to set the case and diacritics options, which I sometimes need to enable and sometimes not. Is there a way to control them?
Note that a 'contains text' query will only be rewritten for index access if the specified options (case, diacritics, others) match the options that you specified when creating the index. This means that…
//text()[. contains text 'X' using case sensitive]
…can get slow if case sensitivity was disabled in the index. If you want to support both alternatives, you can use liberal options when creating the index and filter your results with more specific options, for example like…
for $result in //text()[. contains text 'X']
where $result contains text 'X' using case sensitive
return $result
…or…
let $word := 'X'
for $result in ft:search('db', $word)
where ft:contains($result, $word, { 'case': true() })
return $result
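The two suggestions in this thread (restricting ft:search results to given documents, and filtering a liberal index lookup with stricter options) could be combined roughly as follows. This is a sketch under assumptions, not a tested solution: it assumes that all documents come from the same database, that the full-text index was created with liberal options (case-insensitive), and the $caseSensitive parameter and the database name 'books' in the usage part are hypothetical. The case filter is written with the "contains text ... using case sensitive" expression shown above.

```xquery
(: Sketch only: $caseSensitive and the database name 'books' are assumptions. :)
declare function local:getHit(
  $documents     as document-node()*,
  $word          as xs:string,
  $caseSensitive as xs:boolean
) as text()* {
  if (empty($documents)) then () else
  (: all documents are assumed to be stored in the same database :)
  let $db    := db:name(head($documents))
  let $paths := $documents ! db:path(.)
  (: index lookup first, restricted to the requested documents :)
  for $hit in ft:search($db, $word)[db:path(.) = $paths]
  (: optional stricter filter, applied to the index hits only :)
  where not($caseSensitive) or $hit contains text { $word } using case sensitive
  return $hit
};

let $docs := collection('books')[book/@category = 'COOKING']
return count(local:getHit($docs, '2005', false()))
```

The function returns the matching text nodes; appending /.. to each hit would give the parent elements, which is closer to what the original $documents//*[text() contains text {$word}] expression returned. A diacritics switch could presumably be added in the same way with "using diacritics sensitive".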
basex-talk@mailman.uni-konstanz.de