I was also interested in stemming. Awesome. I assume the codes for the Lucene-supported languages are the standard two-letter codes for the listed languages?

On Wed, May 25, 2016 at 1:29 PM, Christian Grün <christian.gruen@gmail.com> wrote:
Hi Kristian,

I have slightly updated our Wiki section on language support in [1].
For more information, I invite you to have a look at the related Java
classes (e.g. [2,3]) or ask some more questions.

Cheers,
Christian

[1] http://docs.basex.org/wiki/Full-Text#Languages
[2] https://github.com/BaseXdb/basex/blob/master/basex-core/src/main/java/org/basex/util/ft/Language.java
[3] https://github.com/BaseXdb/basex/blob/master/basex-core/src/main/java/org/basex/util/ft/WesternTokenizer.java
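
For readers who want to check the locale-based list outside of BaseX: the XQuery snippet quoted further down only calls java.util.Locale, so roughly the same list can be produced in plain Java. This is a minimal sketch, not code from the BaseX sources; the class name AvailableLanguages and the method languageCodes() are made up for illustration. Unlike the XQuery version, it drops the empty language code of the root locale. Note that this lists the locales known to the JVM, which is not necessarily the set of languages that can be stemmed.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.stream.Collectors;

// Hypothetical helper class, not part of BaseX.
public class AvailableLanguages {

    // Returns the distinct, sorted ISO 639 language codes of all locales
    // installed in the local Java environment.
    static List<String> languageCodes() {
        return Arrays.stream(Locale.getAvailableLocales())
                .map(Locale::getLanguage)
                // The root locale reports an empty language code; skip it.
                .filter(code -> !code.isEmpty())
                .distinct()
                .sorted()
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Print one language code per line.
        languageCodes().forEach(System.out::println);
    }
}
```

The output depends on the installed JVM, so two machines may print different lists.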



On Wed, May 25, 2016 at 9:27 PM, Kristian Kankainen
<kristian@keeleleek.ee> wrote:
> Probably the list of available locales is not the same as the list of
> languages that can be stemmed. I understood the question was about
> tokenization and full-text indexing in particular and not locales in
> general.
>
> Maybe I got it wrong, but I would still appreciate hints to technical docs
> about supported languages with stemming. What components are used for this?
>
> Cheers
> Kristian K
>
>
> On 25.05.2016 at 20:21, Christian Grün wrote:
>>>
>>> Is it possible to add the list of supported values to the doc for
>>> LANGUAGE at http://docs.basex.org/wiki/Options#Indexing ?
>>
>> The list depends on your local Java environment. You can get a list via:
>>
>>    declare namespace locale = "java:java.util.Locale";
>>    (locale:getAvailableLocales() ! locale:getLanguage(.))
>>    => distinct-values()
>>    => sort()
>>
>> I have added this example to the documentation.
>>
>>
>>
>>> LANGUAGE
>>>
>>> Signature: LANGUAGE [lang]
>>> Default: en
>>> Summary: The specified language will influence how an input text
>>> will be tokenized. This option is mainly important if tokens are to be
>>> stemmed, or if the tokenization of a language differs from Western
>>> languages. See Full-Text Index for more details.
>>>
>>> Thanks!
>>>
>>> --
>>> France Baril
>>> Architecte documentaire / Documentation architect
>>> france.baril@architextus.com
>
>



--
France Baril
Architecte documentaire / Documentation architect
france.baril@architextus.com