On January 5, 2016 at 10:46:35 AM, Ron Katriel (rkatriel@mdsol.com) wrote:
Good catch. Case appears to also play a role. The following does not match

"samsung" contains text "samsung bioepis co., ltd." using fuzzy using stop words ( "co", "ltd") using thesaurus at "thesaurus.xml"

even when the thesaurus contains the synonym "Samsung Bioepis Co., Ltd."

I tried the other way around (thesaurus in lower case, query in mixed case) and it also fails to match.

Ron

On January 5, 2016 at 10:29:35 AM, Christian Grün (christian.gruen@gmail.com) wrote:
Phew… My guess is that no one has seriously looked at the interplay
between stop words and the thesaurus so far ;) Maybe (lower/upper)
case plays a role, too?
On Tue, Jan 5, 2016 at 4:26 PM, Ron Katriel <rkatriel@mdsol.com> wrote:
> Hi Christian,
>
> One follow-up question. I thought stop words work in concert with the
> thesaurus but I came across a case where they do not seem to. The following
> query returns false
>
> "Samsung" contains text "Samsung Bioepis Co., Ltd." using fuzzy using
> stop words ( "co", "ltd") using thesaurus at "thesaurus.xml"
>
> even though the thesaurus contains the following
>
> <entry>
>   <term>Samsung Bioepis</term>
>   <synonym>
>     <term>Samsung</term>
>     <relationship>BT</relationship>
>   </synonym>
> </entry>
>
> When I add the following synonym to the entry
>
> <synonym>
>   <term>Samsung Bioepis Co., Ltd.</term>
>   <relationship>USE</relationship>
> </synonym>
>
> the query matches. Am I missing something?
>
> Thanks,
> Ron
>
> On January 3, 2016 at 8:33:14 PM, Ron Katriel (rkatriel@mdsol.com) wrote:
>
> Thanks, Christian. I will look into the solution you suggested. Will need to
> cache the stop words to avoid repeatedly opening the file for reading.
>
> Ron
>
> On January 3, 2016 at 8:14:51 PM, Christian Grün (christian.gruen@gmail.com)
> wrote:
>
>> The behavior I am looking for is getting back false whenever the text
>> following 'contains text' is reduced to an empty string. Is there a simple
>> way of checking that?
>
> Hm, sounds easy, but I don’t have an easy answer to that. We should
> probably extend our ft:tokenize function to also take a stopword
> option.
>
> What you can always do is write some additional code:
>
> declare function local:sw($terms, $sw) {
>   let $sw := file:read-text-lines($sw)
>   return $terms contains text { $sw } all words
> };
> if(local:sw('query terms', 'sw.txt')) then
>   ...
>
>
>
>> On January 3, 2016 at 7:41:47 PM, Christian Grün
>> (christian.gruen@gmail.com)
>> wrote:
>>
>> Hi Ron,
>>
>>> "Superior Laboratories" contains text { "Medical Affairs" } using stop
>>> words ( "medical", "affairs" )
>>
>> I’m pretty sure that "true" is the right answer here. I must admit
>> that, due to the variety of options provided by the XQFT spec, it’s
>> often not too obvious what’s going on.
>>
>>> is there a way - without removing the stop words
>>> from the file - to override this behavior in XQuery so the above match
>>> will
>>> fail?
>>
>> Maybe an additional check could be used after the first 'contains
>> text' expression. In what particular cases would you like to get
>> 'false' as result?
>>
>> Christian