Hi,
I noticed unexpected behavior in full-text matching with stop words. The actual code is somewhat complex (it matches CT.gov trials with sponsor studies), but I was able to distill it to a simple expression:
"Superior Laboratories" contains text { "Medical Affairs" } using stop words ( "medical", "affairs" )
“Superior Laboratories” is the name of a (made-up) sponsor, and “Medical Affairs” is the value of an XML element (clinical_study/overall_official/affiliation) in an actual CT.gov trial (http://clinicaltrials.gov/search?term=NCT00775398&resultsxml=true).
This expression evaluates to true because “Superior Laboratories” vacuously contains the empty string (i.e., what is left after the stop words are removed from the official affiliation).
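To make my reasoning concrete, here is a rough Python model of what I think is happening. It is only a sketch under my own assumptions (whitespace tokenization, case-insensitive comparison, and "every remaining query token must occur in the source"). The function name contains_text is hypothetical, and this is not how any XQuery processor is actually implemented.

```python
def contains_text(source, query, stop_words=()):
    """Toy model of a full-text match with stop words (an assumption, not a real API).

    Every query token that survives stop-word removal must occur in the source.
    """
    stop = {w.lower() for w in stop_words}
    source_tokens = set(source.lower().split())
    # Drop the stop words from the query; this list can end up empty.
    query_tokens = [t for t in query.lower().split() if t not in stop]
    # all() over an empty list is True, which mirrors the vacuous match.
    return all(t in source_tokens for t in query_tokens)


# With both query words listed as stop words, nothing is left to match.
print(contains_text("Superior Laboratories", "Medical Affairs",
                    ("medical", "affairs")))  # True
# Without stop words, the same comparison fails as expected.
print(contains_text("Superior Laboratories", "Medical Affairs"))  # False
```

In this model the match succeeds only because the query token list is empty, which is the case I would like to be able to detect or reject.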
In the actual code, the stop words are loaded from a file containing over 400 words. The idea is to remove frequently occurring words from sponsor names (e.g., laboratories, limited, medical, pharmaceutical) to increase the chances of matching.
Is the above behavior intentional, or is it an artifact of the way the matching is implemented? If the former, is there a way (without removing the stop words from the file) to override this behavior in XQuery so that the above match fails?
Thanks, Ron