On January 5, 2016 at 10:29:35 AM, Christian Grün (christian.gruen@gmail.com) wrote:
Phew… My guess is that no one has seriously looked at the interplay
between stop words and the thesaurus so far ;) Maybe (lower/upper)
case plays a role, too?
On Tue, Jan 5, 2016 at 4:26 PM, Ron Katriel <rkatriel@mdsol.com> wrote:
> Hi Christian,
>
> One follow-up question. I thought stop words work in concert with the
> thesaurus, but I came across a case where they do not seem to. The following
> query returns false:
>
> "Samsung" contains text "Samsung Bioepis Co., Ltd." using fuzzy using
> stop words ( "co", "ltd") using thesaurus at "thesaurus.xml"
>
> even though the thesaurus contains the following entry:
>
> <entry>
>   <term>Samsung Bioepis</term>
>   <synonym>
>     <term>Samsung</term>
>     <relationship>BT</relationship>
>   </synonym>
> </entry>
>
> When I add the following synonym to the entry:
>
> <synonym>
>   <term>Samsung Bioepis Co., Ltd.</term>
>   <relationship>USE</relationship>
> </synonym>
>
> the query matches. Am I missing something?
>
> Thanks,
> Ron
>
> On January 3, 2016 at 8:33:14 PM, Ron Katriel (rkatriel@mdsol.com) wrote:
>
> Thanks, Christian. I will look into the solution you suggested. I will need to
> cache the stop words to avoid repeatedly opening the file for reading.
>
> Ron
>
> On January 3, 2016 at 8:14:51 PM, Christian Grün (christian.gruen@gmail.com)
> wrote:
>
>> The behavior I am looking for is getting back false whenever the text
>> following 'contains text' is reduced to an empty string. Is there a simple
>> way of checking that?
>
> Hm, sounds easy, but I don’t have an easy answer to that. We should
> probably extend our ft:tokenize function to also take a stopword
> option.
>
> What you can always do is write some additional code:
>
> declare function local:sw($terms, $sw) {
>   (: true if every token of $terms is listed in the stop-word file $sw :)
>   let $sw := file:read-text-lines($sw)
>   return every $t in ft:tokenize($terms) satisfies $t = $sw
> };
> if(local:sw('query terms', 'sw.txt')) then
>   ...
>
>
>
>> On January 3, 2016 at 7:41:47 PM, Christian Grün
>> (christian.gruen@gmail.com)
>> wrote:
>>
>> Hi Ron,
>>
>>> "Superior Laboratories" contains text { "Medical Affairs" } using stop
>>> words ( "medical", "affairs” )
>>
>> I’m pretty sure that "true" is the right answer here. I must admit
>> that, due to the variety of options provided by the XQFT spec, it’s
>> often not too obvious what’s going on.
>>
>>> is there a way, without removing the stop words
>>> from the file, to override this behavior in XQuery so that the above
>>> match fails?
>>
>> Maybe an additional check could be used after the first 'contains
>> text' expression. In which particular cases would you like to get
>> 'false' as a result?
>>
>> Christian
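
A minimal sketch of the additional check discussed above, written as a single BaseX query. It is an illustration under assumptions, not a tested recipe: the helper name local:only-stopwords is hypothetical; the stop words are bound once to a global variable (an inline list here, although file:read-text-lines('sw.txt') could supply them and would then be read only once); and the list is kept in lower case because ft:tokenize applies the default full-text options, which lowercase tokens. Since the 'using stop words ( ... )' option accepts only string literals, the list has to be repeated there.

```xquery
(: Stop words bound once as a global variable, so they are built a single
   time per query. A file could be used instead, e.g.
   file:read-text-lines('sw.txt'), which would then also be read only once. :)
declare variable $stopwords as xs:string* := ('medical', 'affairs');

(: Hypothetical helper: true if every token of $terms is a stop word, i.e.
   nothing is left to search for once stop words are removed. ft:tokenize
   applies the default full-text options, which lowercase the tokens, so the
   stop words above are kept in lower case. An empty $terms also yields true. :)
declare function local:only-stopwords($terms as xs:string) as xs:boolean {
  every $token in ft:tokenize($terms) satisfies $token = $stopwords
};

(: Only run the full-text match if some query term survives stop-word
   removal. 'using stop words' accepts string literals only, so the list
   is repeated here. :)
let $query := 'Medical Affairs'
return not(local:only-stopwords($query))
  and 'Superior Laboratories' contains text { $query }
      using stop words ('medical', 'affairs')
```

With 'Medical Affairs', every token is a stop word, so the query returns false without evaluating the full-text expression. For a query such as 'Medical Laboratories', the helper returns false and the 'contains text' expression decides the result as before.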