Re: [basex-talk] More Diacritic Questions

30 Nov 2014


Hi Graydon,
...
> So I would expect that, with a full text search that ignores
> diacritics, I'd get four hits.
By adding some collation hints to one of the standard string
functions, the comparison will succeed:
fn:compare('≮','&lt;','?lang=en;strength=primary')
In the example, I used the BaseX notation for collations (it is
similar to the notation in Saxon or eXist; in the future, more and
more people will probably switch to the newly introduced UCA collation).
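
To illustrate why a primary-strength comparison treats the two
characters as equal, here is a rough sketch in Python (not BaseX code).
It assumes that "ignoring diacritics" can be approximated by decomposing
a string to NFD and dropping the combining marks; the helper name
strip_marks is made up for this example, and a real UCA collation does
considerably more than this.

```python
import unicodedata

def strip_marks(text: str) -> str:
    """Rough stand-in for a primary-strength key: decompose, drop combining marks."""
    decomposed = unicodedata.normalize("NFD", text)
    # unicodedata.combining() returns a non-zero class for combining marks
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

# U+226E (NOT LESS-THAN) decomposes to '<' plus U+0338 (combining long solidus overlay)
print([hex(ord(ch)) for ch in unicodedata.normalize("NFD", "\u226e")])

# Once the combining mark is dropped, only '<' remains
print(strip_marks("\u226e") == "<")
```

Under that assumption the "not less-than" sign is just a less-than sign
carrying a combining mark, which is why a comparison that ignores
diacritics can report the two as equal.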
...
> I don't think it's clear that "text" in "full text" means "groups of
> letters".
I agree. Once again, the XQFT spec does not dictate what a "token" in
a full-text search is. Currently, we only have two tokenizers: one for
Western languages and another one for Japanese (which gets along
without whitespace). When we initially implemented the XQFT features
some years ago, our major use case was searching a library catalog
(comprising metadata on approx. 2 million titles).
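
As a rough illustration of how much depends on the tokenizer, here is a
small Python sketch. It is not the BaseX tokenizer; it simply assumes
that a token is a maximal run of letters or digits, and the names TOKEN
and tokenize are made up for this example.

```python
import re
import unicodedata

# Assumption: a token is a maximal run of letters or digits (no underscore)
TOKEN = re.compile(r"[^\W_]+")

def tokenize(text: str) -> list[str]:
    """Split text into lower-cased letter/digit runs; everything else is a separator."""
    # Compose first, so that accented letters stay inside their token
    composed = unicodedata.normalize("NFC", text)
    return [m.group().lower() for m in TOKEN.finditer(composed)]

# The math symbol is neither a letter nor a digit, so it never becomes a token
print(tokenize("a \u226e b"))
print(tokenize("Caf\u00e9 catalog 2014"))
```

With a tokenizer along these lines, a symbol such as the "not
less-than" sign is discarded before any diacritics option is applied,
so a full-text query has nothing to match it against.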
Best,
Christian
[1] http://docs.basex.org/wiki/Full-Text#Collations
