…and another snapshot for you: modifier letters [1] will now be
regarded as diacritical characters, and queries such as…
tokenize('faraʼid faraid Birkan')[. contains text 'faraʼid' using fuzzy]
…will only regard the first two terms as similar. As a consequence,
fuzzy queries for terms that contain modifier letters should get a lot faster.
You’ll need to recreate your full-text index to take full advantage of the fix.
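A minimal sketch of how the index could be rebuilt from XQuery, assuming a
database named 'your-db' (a placeholder) and that a full optimization via
db:optimize suits your setup. If the index was created with further
full-text options, you may need to pass those along as well:

```xquery
(: 'your-db' is a placeholder database name :)
(: true() requests a full optimization; 'ftindex' asks for the full-text index to be built :)
db:optimize('your-db', true(), map { 'ftindex': true() })
```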

[1] https://de.wikipedia.org/wiki/Unicodeblock_Spacing_Modifier_Letters

On Mon, Jun 7, 2021 at 2:00 PM Christian Grün <christian.gruen@gmail.com> wrote:
>
> > Out of curiosity, is there a way to access those individual term similarity statistics via XQuery?
>
> I assume there is no straightforward way, but you can check the first
> results of a full-text query:
>
> ft:search('your-db', 'faraʼid', map { "fuzzy": true() })[position() < 10]
>
> And I noticed that due to the modifier letter (ʼ), many non-similar
> results are returned as well. I’ve raised another issue to track this
> down [1].
>
> [1] https://github.com/BaseXdb/basex/issues/2015