cc to the list (2)…
Christian Grün christian.gruen@gmail.com wrote on Mon, June 7, 2021, 15:47:
…and another snapshot for you: modifier letters [1] will now be regarded as diacritical characters, and queries such as…
tokenize('faraʼid faraid Birkan')[. contains text 'faraʼid' using fuzzy]
…will only regard the first two terms as similar. As a consequence, fuzzy queries for terms with modifiers should get a lot faster.
You’ll need to recreate your full-text index to take full advantage of the fix.
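A rough sketch of the idea in plain XQuery 3.1, not the actual BaseX implementation: if modifier letters are dropped the way diacritics are, the first two tokens of the example collapse to the same string, while the third is left untouched. The function name local:strip-modifiers and the hard-coded range U+02B0 to U+02FF (the Spacing Modifier Letters block from [1]) are assumptions made for illustration only.

```xquery
(: Hypothetical helper: removes characters from the Spacing Modifier Letters
   block (U+02B0 to U+02FF). This only approximates what the full-text
   tokenizer may do internally. :)
declare function local:strip-modifiers($term as xs:string) as xs:string {
  replace($term, '[&#x2B0;-&#x2FF;]', '')
};

(: Show each token next to its stripped form. :)
for $token in tokenize('faraʼid faraid Birkan')
return $token || ' -> ' || local:strip-modifiers($token)
```

Under these assumptions, 'faraʼid' and 'faraid' should both map to 'faraid', so a fuzzy comparison no longer has to spend an edit on the modifier letter.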
[1] https://de.wikipedia.org/wiki/Unicodeblock_Spacing_Modifier_Letters
On Mon, Jun 7, 2021 at 2:00 PM Christian Grün christian.gruen@gmail.com wrote:
Out of curiosity, is there a way to access those individual term similarity statistics via XQuery?
I assume there is no straightforward way, but you can check the first results of a full-text query:
ft:search('your-db', 'faraʼid', map { "fuzzy": true() })[position() < 10]
And I noticed that due to the modifier letter (ʼ), many non-similar results are returned as well. I’ve raised another issue to track this down [1].
basex-talk@mailman.uni-konstanz.de