Re: [basex-talk] full text search collation

22 Jun 2012


Hi Alex,
thanks for your mail.
...
> I am trying to use it with
> Greek texts and the default collation assumes that ά != α
As you correctly guessed, our tokenizer is not tailored (yet) for
Greek text corpora. To speed things up, we are using a simple static
Unicode mapping for character normalizations:
https://github.com/BaseXdb/basex/blob/master/src/main/java/org/basex/util/To...
If you could provide me with some appropriate tables for Greek
characters, I'll be glad to extend this mapping.
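
One possible way to produce such a table is sketched below. It is not taken from the BaseX sources; the class name GreekNormTable and its methods are hypothetical. It assumes that "normalization" here means mapping an accented letter to its base letter (e.g. ά to α), and it derives the pairs from the JDK's java.text.Normalizer by decomposing each letter (NFD) and dropping the combining marks. The output would still need a manual review before being merged into any static mapping.

```java
import java.text.Normalizer;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper, not part of BaseX: derives "accented -> base" pairs
// for Greek letters from the Unicode canonical decompositions.
final class GreekNormTable {
  /** Decomposes the input (NFD) and drops all combining marks. */
  static String stripMarks(final String input) {
    final String decomposed = Normalizer.normalize(input, Normalizer.Form.NFD);
    return decomposed.replaceAll("\\p{M}+", "");
  }

  /** Builds a code point mapping for all letters in the given range. */
  static Map<Integer, Integer> table(final int from, final int to) {
    final Map<Integer, Integer> map = new TreeMap<>();
    for(int cp = from; cp <= to; cp++) {
      if(!Character.isLetter(cp)) continue;
      final String src = new String(Character.toChars(cp));
      final String trg = stripMarks(src);
      // keep only entries that change and that reduce to a single code point
      if(!trg.equals(src) && trg.codePointCount(0, trg.length()) == 1) {
        map.put(cp, trg.codePointAt(0));
      }
    }
    return map;
  }

  public static void main(final String[] args) {
    final Map<Integer, Integer> map = new TreeMap<>();
    map.putAll(table(0x0370, 0x03FF)); // Greek and Coptic block
    map.putAll(table(0x1F00, 0x1FFF)); // Greek Extended block
    // print one "source -> target" pair per line, as hex code points
    for(final Map.Entry<Integer, Integer> e : map.entrySet()) {
      System.out.printf("%04X -> %04X%n", e.getKey(), e.getValue());
    }
  }
}
```

Generating the pairs once and storing them statically would keep the lookup as cheap as the existing approach, since no normalization has to run at tokenization time.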
...
> As a side question, I noticed that the stemmers used from Lucene are quite
> outdated. 3.6.0 also includes a Greek stemmer. I tried to include the 3.6.0
> stemmers instead, but language codes seem to be hardcoded in
> util/ft/Language.java
Do you have a direct reference to your preferred Greek stemmer class?
It will be easy for us to directly include it in our core package (the
main advantage will be improved performance).
Hope this helps,
Christian
