Re: [basex-talk] Error with full-text and fuzzy query - BaseX-Talk - mailman.uni-konstanz.de

7 Jun 2021


      cc to the list (2)…
Christian Grün christian.gruen@gmail.com wrote on Mon, 7 June 2021,
15:47:
...
…and another snapshot for you: modifier letters [1] will now be
regarded as diacritical characters, and queries such as…
tokenize('faraʼid faraid Birkan')[. contains text 'faraʼid' using fuzzy]
…will only regard the first two terms as similar. As a consequence,
fuzzy queries for terms with modifiers should get a lot faster.
You’ll need to recreate your full-text index to take full advantage of the
fix.
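
For reference, a minimal sketch of how the index could be rebuilt from
XQuery. It assumes a database named 'your-db' (the placeholder used
elsewhere in this thread) and relies on db:optimize from the BaseX
Database Module; the name and the option shown are illustrative, not
taken from the original setup:

```xquery
(: Sketch: rebuild all index structures of an existing database.
   'your-db' is a placeholder name, not the real database.
   Passing true() as the second argument requests a full optimize;
   'ftindex': true() asks for the full-text index to be (re)created. :)
db:optimize('your-db', true(), map { 'ftindex': true() })
```

If the index was originally created with further full-text options
(language, stemming, and so on), it is probably safest to pass those
again in the same map.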
[1] https://de.wikipedia.org/wiki/Unicodeblock_Spacing_Modifier_Letters
On Mon, Jun 7, 2021 at 2:00 PM Christian Grün christian.gruen@gmail.com
wrote:
...
...
Out of curiosity, is there a way to access those individual term
similarity statistics via XQuery?
...
I assume there is no straightforward way, but you can check the first
results of a full-text query:
ft:search('your-db', 'faraʼid', map { "fuzzy": true() })[position() < 10]
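
This does not expose any similarity score, but as a rough sketch, a
per-term check can at least show which candidate terms the fuzzy
matcher accepts. The term list below is only an illustration, reusing
the strings from the example elsewhere in this thread:

```xquery
(: Sketch: test each candidate term against the fuzzy query on its own
   and print the term together with the boolean result.
   No database or full-text index is needed for this check. :)
for $term in tokenize('faraʼid faraid Birkan')
return $term || ': ' || ($term contains text 'faraʼid' using fuzzy)
```

Which terms report true will depend on the BaseX version in use.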
...
And I noticed that due to the modifier letter (ʼ), many non-similar
results are returned as well. I’ve raised another issue to track this
down [1].
[1] https://github.com/BaseXdb/basex/issues/2015