[basex-talk] Stemming in BaseX Full-Text

13 Apr 2022


I'm currently involved in a project that uses MarkLogic, and I noticed
that its implementation of English-language stemming differs from that of
BaseX: in MarkLogic, for example, "mouse" and "mice" both stem to "mouse."
In BaseX, those words are stemmed separately. Is this a known limitation of
the internal English stemmer?
Example:
db:create("stem-test",
  <data>
    <x>mouse</x>
    <y>mice</y>
  </data>
  , "data", map {"ftindex": true(), "stemming": true(), "language": "en"}
)
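,
(: Note: db:create is an updating function, so the database is only created
   once the query has finished, while ft:search is evaluated before that.
   If "stem-test" does not exist yet, run the search as a second query. :)
update:output(
  ft:search("stem-test", "mice")
)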
Thanks,
Tim
-- 
Tim A. Thompson (he, him)
Librarian for Applied Metadata Research
Yale University Library
