I'm currently working on a project that uses MarkLogic, and I noticed that its English-language stemming differs from BaseX's: in MarkLogic, "mouse" and "mice" both stem to "mouse."

In BaseX, the two words stem to different forms, so a full-text search for "mice" does not match "mouse." Is this a known limitation of BaseX's built-in English stemmer?

Example:

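(: Query 1: create the database with a stemmed full-text index.
   db:create is an updating function, so the database only exists
   after this query has finished. :)
db:create(
  "stem-test",
  <data>
    <x>mouse</x>
    <y>mice</y>
  </data>,
  "data",
  map { "ftindex": true(), "stemming": true(), "language": "en" }
)

(: Query 2: run separately, once the database has been created. :)
ft:search("stem-test", "mice")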
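The stemmer can probably also be checked without building a database. This is a sketch that assumes ft:normalize honors the same "stemming" and "language" options as the full-text index; the last line uses standard XQuery Full Text match options for the same comparison:

```xquery
(: Apply BaseX's full-text normalization, with English stemming,
   to each word and compare the resulting forms. :)
let $opts := map { "stemming": true(), "language": "en" }
return (
  ft:normalize("mouse", $opts),
  ft:normalize("mice", $opts),
  (: true only if both words reduce to the same stem :)
  "mice" contains text "mouse" using stemming using language "en"
)
```

If the two normalized strings differ, the index will treat "mouse" and "mice" as separate terms.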

Thanks,

Tim

--
Tim A. Thompson (he, him)
Librarian for Applied Metadata Research
Yale University Library