Thanks, Bridger, that's very helpful! I'm not sure exactly what MarkLogic is
using, but it seems fairly sophisticated: there's even an advanced option
for multiple stemming (e.g., "further" has "far," "farther," and "further"
as stems).
--
Tim A. Thompson (he, him)
Librarian for Applied Metadata Research
Yale University Library
On Wed, Apr 13, 2022 at 12:13 PM Bridger Dyson-Smith <bdysonsmith@gmail.com>
wrote:
> Hi Tim -
>
> On Wed, Apr 13, 2022 at 11:40 AM Tim Thompson <timathom@gmail.com> wrote:
>
>> I'm currently involved in a project that's using MarkLogic, and I noticed
>> that its implementation of English-language stemming differs from that of
>> BaseX: e.g., "mouse" and "mice" both stem to "mouse."
>>
>> In BaseX, those words are stemmed to different forms, so a search for
>> "mice" does not match "mouse." Is this a known limitation of the built-in
>> English stemmer?
>>
>
> It's my (admittedly, *VERY*) limited understanding that the BaseX
> stemmer, at least for English, is limited to the Porter Stemmer[1]. The
> Porter Stemmer only strips suffixes, so it doesn't map apophonic
> (vowel-change) plurals such as "mice" to their singular forms.
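One way to check which stems BaseX actually produces is `ft:tokenize`, which applies the full-text options given in its second argument. This is a minimal sketch, assuming `ft:tokenize` accepts the same `stemming` and `language` option names that Tim's `db:create` call uses; it has not been checked against a specific BaseX version.

```xquery
(: Print each word next to the token(s) BaseX produces for it.
   Assumption: ft:tokenize honors the "stemming" and "language" options. :)
for $word in ("mouse", "mice")
return $word || " -> " || string-join(
  ft:tokenize($word, map { "stemming": true(), "language": "en" }),
  " "
)
```

If the English stemmer is purely suffix-stripping, the two words should come back as different tokens, which would be consistent with the `ft:search` behavior Tim describes.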
>
> It'd be interesting to learn what stemmer(s) MarkLogic uses.
>
> And, while I'm not that familiar with it (and it would probably entail
> significant work to implement), the `ft:thesaurus()` function provides
> similar functionality by relating "mice" to "mouse" explicitly:
> ```
> ft:thesaurus(
>   <thesaurus>
>     <entry>
>       <term>mice</term>
>       <synonym>
>         <term>mouse</term>
>         <relationship>NT</relationship>
>       </synonym>
>       <synonym>
>         <term>rodent</term>
>         <relationship>BTG</relationship>
>       </synonym>
>     </entry>
>   </thesaurus>,
>   'mice'
> )
> ```
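A rough sketch of how a thesaurus lookup could be combined with the `stem-test` database from Tim's example in this thread: expand the query term with `ft:thesaurus`, then pass the original term plus its related terms to `ft:search`, whose default `any` mode should match text nodes containing any of them. The `xmlns` declaration is an assumption based on the W3C full-text thesaurus format, and the `USE` relationship value is illustrative only; check which thesaurus form your BaseX version expects.

```xquery
(: Hypothetical thesaurus mapping the irregular plural to its singular.
   The namespace URI is assumed from the W3C full-text thesaurus format. :)
let $thesaurus :=
  <thesaurus xmlns="http://www.w3.org/2007/xqftts/thesaurus">
    <entry>
      <term>mice</term>
      <synonym>
        <term>mouse</term>
        <relationship>USE</relationship>
      </synonym>
    </entry>
  </thesaurus>
(: Search for the original term plus any related terms from the thesaurus. :)
let $terms := ("mice", ft:thesaurus($thesaurus, "mice"))
return ft:search("stem-test", $terms)
```

Because the expansion happens at query time, the full-text index itself stays unchanged; the trade-off is that every irregular form has to be listed in the thesaurus by hand.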
>
>
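>> Example:
>>
>> (: Query 1: create the database with a stemmed full-text index.
>>    BaseX creates the database only after the query finishes, so it
>>    cannot be searched in this same query. :)
>> db:create("stem-test",
>>   <data>
>>     <x>mouse</x>
>>     <y>mice</y>
>>   </data>,
>>   "data",
>>   map { "ftindex": true(), "stemming": true(), "language": "en" }
>> )
>>
>> (: Query 2: run separately, once the database exists :)
>> ft:search("stem-test", "mice")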
>>
>>
>> Thanks,
>> Tim
>>
>>
>>
> Best,
> Bridger
>
> [1] https://github.com/BaseXdb/basex/blob/da1e55d0214e44c1532f121c282021db50a9aa...