Dear all,
Following the documentation for full-text, I have the lucene-3.4.0 jar file in the lib directory of BaseX, and thus in its classpath (java -cp [...]/basex/BaseX.jar:[...]/basex/lib/custom/*:[...]/basex/lib/*: -Xmx2g org.basex.BaseXGUI). I was therefore expecting the following query to return true():
"cammina" contains text "camminare" any word using stemming using language "it" (: tried also with Italian:)
But I always get false() with any combination whereas it works neatly with embedded English and German.
How can I tell whether stemming is actually being used and, if so, with which language?
Thanks for your support,
Marco.
Hi Marco,
> "cammina" contains text "camminare" any word using stemming using language "it" (: tried also with Italian:)
> But I always get false() with any combination whereas it works neatly with embedded English and German.
For some reason, the Lucene stemmer transforms 'camminare' to 'camminar', which presumably no longer matches the stem produced for 'cammina'. You can observe this with ft:tokenize or ft:normalize:
ft:tokenize('camminare', { 'language': 'it', 'stemming': true() })
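To check whether the two words end up with the same stem, they could be tokenized side by side with identical options. This is only a minimal sketch that reuses the ft:tokenize call and the options shown above. Depending on the BaseX version, the options map may have to be written as map { ... } instead of { ... }.

```xquery
(: Stem both words with the same options and list the resulting tokens. :)
(: If the two lines show different stems, "contains text" cannot match. :)
let $options := { 'language': 'it', 'stemming': true() }
for $word in ('cammina', 'camminare')
return $word || ' -> ' || string-join(ft:tokenize($word, $options), ' ')
```

Setting 'stemming' to false() in the same query should show the unstemmed tokens, which helps to tell whether the stemmer is applied at all.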
BaseX comes with various built-in stemmers. If you have a suitable Italian stemmer, we could include it in our base distribution.
Hope this helps, Christian