Hi Kristian,
I have slightly updated our Wiki section on language support in [1]. For more information, I invite you to have a look at the related Java classes (e.g. [2,3]) or ask some more questions.
Cheers, Christian
[1] http://docs.basex.org/wiki/Full-Text#Languages
[2] https://github.com/BaseXdb/basex/blob/master/basex-core/src/main/java/org/ba...
[3] https://github.com/BaseXdb/basex/blob/master/basex-core/src/main/java/org/ba...
On Wed, May 25, 2016 at 9:27 PM, Kristian Kankainen kristian@keeleleek.ee wrote:
Probably the list of available locales is not the same as the list of languages that can be stemmed. I understood the question was about tokenization and full-text indexing in particular and not locales in general.
Maybe I got it wrong, but I would still appreciate hints to technical docs about supported languages with stemming. What components are used for this?
Cheers Kristian K
On 25.05.2016 20:21, Christian Grün wrote:
Is it possible to add the list of supported values to the doc for LANGUAGE at http://docs.basex.org/wiki/Options#Indexing?
The list depends on your local Java environment. You can get a list via:
declare namespace locale = "java:java.util.Locale";
(locale:getAvailableLocales() ! locale:getLanguage(.))
  => distinct-values()
  => sort()
I have added this example to the documentation.
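For readers who prefer to check this outside of XQuery, the same lookup can be sketched in plain Java. This is only an illustration: the class name LocaleLanguages is made up for this example, and unlike the query above it drops the empty language code of the root locale. As noted elsewhere in this thread, the result is the list of locales known to the JVM, which is not necessarily the list of languages that can be stemmed.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.stream.Collectors;

// Hypothetical helper class, not part of BaseX.
final class LocaleLanguages {

    // Returns the distinct, sorted language codes known to the running JVM.
    static List<String> availableLanguages() {
        return Arrays.stream(Locale.getAvailableLocales())
                .map(Locale::getLanguage)
                // The root locale reports an empty language code; skip it.
                .filter(code -> !code.isEmpty())
                .distinct()
                .sorted()
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Prints one language code per line; the output varies between Java installations.
        availableLanguages().forEach(System.out::println);
    }
}
```

Running this on two different Java installations may give different lists, which is why a fixed list of values is hard to document.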
LANGUAGE
Signature: LANGUAGE [lang]
Default: en
Summary: The specified language influences how an input text is tokenized. This option is mainly important if tokens are to be stemmed, or if the tokenization of a language differs from that of Western languages. See Full-Text Index for more details.
Thanks!
--
France Baril
Architecte documentaire / Documentation architect
france.baril@architextus.com