I'm currently working on a project that uses MarkLogic, and I noticed that its English-language stemming differs from that of BaseX: in MarkLogic, for example, "mouse" and "mice" both stem to "mouse."

In BaseX, those two words are stemmed to different forms. Is this a known limitation of the internal English stemmer?

Example:

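(: Create a database with an English full-text index that has stemming enabled :)
db:create("stem-test",
  <data>
    <x>mouse</x>
    <y>mice</y>
  </data>,
  "data",
  map { "ftindex": true(), "stemming": true(), "language": "en" }
),
(: Search the index for "mice"; if both words shared a stem, the text of <x> would be returned too :)
update:output(
  ft:search("stem-test", "mice")
)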


Thanks,
Tim


--
Tim A. Thompson (he, him)
Librarian for Applied Metadata Research
Yale University Library