From: Eliot Kimber <eliot.kimber@servicenow.com>
Date: Monday, December 4, 2023 at 6:00 PM
To: basex-talk@mailman.uni-konstanz.de <basex-talk@mailman.uni-konstanz.de>
Subject: Using ft:search() across element boundaries: possible?
I’m searching for short phrases, sometimes respecting word order and sometimes not, where the phrase may cross element boundaries.
For example, given the phrase “Amazon Alexa Spoke”, I want to find any DITA topic whose title text includes “Amazon Alexa Spoke” in that order, or, depending on my search requirements, those three words in any order.
When I run this query against my database I find occurrences where all three words are in the same parent element, i.e.:
<title>Create a connection record for the <ph>Amazon Alexa spoke</ph>
</title>
<title>Create a credential record for the <ph>Amazon Alexa spoke</ph>
</title>
<title>Set up the <ph>Amazon Alexa spoke</ph>
</title>
But I do not find titles where one of the words is outside that parent element.
This title is *not* found (even though it is the one I actually want to find):
<title><ph id="alexa">Amazon Alexa</ph> Spoke</title>
Reading the docs on ft:search(), it is clear that it is searching on text nodes:
“Returns all text nodes from the full-text index…”
So I think the behavior here is as documented.
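To illustrate the distinction, here is a minimal Python sketch (not BaseX code, and not how the BaseX index is implemented) that compares matching against each text node separately with matching against the title's full string value. The helper names `text_nodes`, `matches_per_text_node`, and `matches_string_value` are hypothetical, and plain substring matching stands in for real full-text tokenization.

```python
import xml.etree.ElementTree as ET

# Two of the titles quoted above: one with the phrase inside a single <ph>,
# one with the phrase split across the <ph> boundary.
SAMPLES = [
    '<title>Set up the <ph>Amazon Alexa spoke</ph>\n</title>',
    '<title><ph id="alexa">Amazon Alexa</ph> Spoke</title>',
]

PHRASE = "amazon alexa spoke"


def normalize(text):
    """Lower-case and collapse runs of whitespace (assumed, simplified rule)."""
    return " ".join(text.lower().split())


def text_nodes(elem):
    """Yield every text node below elem as a separate string."""
    if elem.text:
        yield elem.text
    for child in elem:
        yield from text_nodes(child)
        if child.tail:
            yield child.tail


def matches_per_text_node(title):
    """True if one single text node contains the whole phrase."""
    return any(PHRASE in normalize(t) for t in text_nodes(title))


def matches_string_value(title):
    """True if the concatenated text of the title contains the phrase."""
    return PHRASE in normalize("".join(title.itertext()))


for xml in SAMPLES:
    title = ET.fromstring(xml)
    print(matches_per_text_node(title), matches_string_value(title))
```

In this sketch the split title fails the per-text-node check because "Amazon Alexa" and " Spoke" are separate text nodes, while the string-value check sees them as one run of text.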
Short of creating a separate database that removes the subelements within <title> elements, is there a way to use full text indexing to do the search I want? In particular, I want to be able to turn the ordered/unordered check on or off.
If I always wanted ordered matching I could just use a regular expression match. It wouldn’t be that efficient, but efficiency is not a concern in this particular case (though I can see that it would be in a more general search-support situation).
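As a rough illustration of the requirement (again in Python rather than XQuery, and without any index), the sketch below matches against the word tokens of a title's string value and lets the caller switch the order check on or off. The function names `tokens` and `title_matches` and the `ordered` parameter are hypothetical. Here "ordered" is assumed to mean the words appear adjacent and in the given order, and "unordered" to mean every word appears somewhere in the title.

```python
import re
import xml.etree.ElementTree as ET


def tokens(title_xml):
    """Lower-cased word tokens of the title's string value, markup ignored."""
    value = "".join(ET.fromstring(title_xml).itertext())
    return re.findall(r"\w+", value.lower())


def title_matches(title_xml, phrase, ordered=True):
    """Exact word match against the title, with an optional order check."""
    have = tokens(title_xml)
    want = re.findall(r"\w+", phrase.lower())
    if not ordered:
        # Unordered: every search word must occur somewhere in the title.
        return all(word in have for word in want)
    # Ordered: the search words must occur as one contiguous run.
    n = len(want)
    return any(have[i:i + n] == want for i in range(len(have) - n + 1))


split_title = '<title><ph id="alexa">Amazon Alexa</ph> Spoke</title>'
print(title_matches(split_title, "Amazon Alexa Spoke"))                 # ordered
print(title_matches(split_title, "Spoke Amazon Alexa", ordered=False))  # unordered
```

Because the tokens come from the concatenated text, the `<ph>` boundary has no effect on the match. One caveat of this simplification: words in adjacent elements with no whitespace between them would be merged into a single token.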
Or am I missing a more obvious solution to this requirement?
Note that in this case I don’t care about finding different word forms: for this particular search I only care about exact word matches.
Cheers,
E.
_____________________________________________
Eliot Kimber
Sr Staff Content Engineer