cc to the list (2)…

Christian Grün <christian.gruen@gmail.com> wrote on Mon, Jun 7, 2021, 15:47:
…and another snapshot for you: modifier letters [1] will now be
regarded as diacritical characters, and queries such as…

tokenize('faraʼid faraid Birkan')[. contains text 'faraʼid' using fuzzy]

…will only regard the first two terms as similar. As a consequence,
fuzzy queries for terms with modifiers should get a lot faster.
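
To illustrate the idea (not the actual BaseX implementation), here is a minimal sketch in plain XQuery. It assumes that "regarded as diacritical characters" roughly means that such characters are dropped before terms are compared. The function name local:strip-marks is hypothetical, and the range U+02B0 to U+02FF is the Spacing Modifier Letters block referenced in [1]:

```xquery
(: Hypothetical helper, for illustration only: decompose the string, then
   remove combining marks (\p{M}) and spacing modifier letters
   (U+02B0 to U+02FF). :)
declare function local:strip-marks($s as xs:string) as xs:string {
  replace(normalize-unicode($s, 'NFD'), '[\p{M}&#x2B0;-&#x2FF;]', '')
};

(: Apply the helper to the three sample terms from the query above. :)
for $term in tokenize('faraʼid faraid Birkan')
return local:strip-marks($term)
```

Under this assumption, the first two terms collapse to the same string, so the fuzzy comparison no longer has to count the modifier letter as an edit.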

You’ll need to recreate your full-text index to take full advantage of the fix.
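
One possible way to rebuild the index from XQuery is sketched below. It assumes a database named 'your-db' (a placeholder) and relies on db:optimize, which accepts indexing options when a full optimization is requested:

```xquery
(: Sketch: fully optimize the database and rebuild its full-text index.
   'your-db' is a placeholder name; add any further full-text options
   your index was originally created with. :)
db:optimize('your-db', true(), map { 'ftindex': true() })
```

Running the commands OPEN your-db and CREATE INDEX FULLTEXT should have a comparable effect.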

[1] https://de.wikipedia.org/wiki/Unicodeblock_Spacing_Modifier_Letters



On Mon, Jun 7, 2021 at 2:00 PM Christian Grün <christian.gruen@gmail.com> wrote:
>
> > Out of curiosity, is there a way to access those individual term similarity statistics via XQuery?
>
> I assume there is no straightforward way, but you can check the first
> results of a full-text query:
>
>   ft:search('your-db', 'faraʼid', map { "fuzzy": true() })[position() < 10]
>
> And I noticed that due to the modifier letter (ʼ), many non-similar
> results are returned as well. I’ve raised another issue to track this
> down [1].
>
> [1] https://github.com/BaseXdb/basex/issues/2015