> When you say you can't reproduce it, do you mean you get 14 results from running this script?
Yes, that’s what I meant.
What follows is fairly technical and specific; feel free to skip ahead to the examples.
Your updated example was helpful, and I noticed that several issues combine to produce the unexpected results. The core challenge is that ft:mark and ft:extract only yield the expected results if the internally collected full-text metadata is not lost at some stage of internal processing, which can happen in many places that are hidden from the writer of the query.
In your specific example, the full-text information gets lost because the local:search function is too complex for the compiler to inline. Inlining is what enables the further optimizations that eventually allow the metadata to be propagated. You can tackle this by forcing the compiler to inline your function:
declare %basex:inline function local:search(...)
Using '(ethnicgroups, languages)' instead of 'name() = (...)' is another practical tip: it helps the optimizer detect at compile time that metadata will be available at runtime. Another solution is to use 'local-name()' instead of 'name()': local-name() does not depend on namespaces that might occur in a database, which also affects how full-text queries are evaluated.
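To make the three predicate forms concrete, here is a minimal sketch that places them side by side. The first form is only my assumption about what the original predicate looked like; the variable names $by-name, $by-path and $by-local are made up for illustration.

```xquery
let $query   := 'German'
let $country := ft:search('factbook', $query)/ancestor::country
let $search  := function($node) { $node/text() contains text { $query } }

(: 1. element names compared via name(): assumed shape of the original predicate :)
let $by-name  := $country[.//*[name() = ('ethnicgroups', 'languages')][$search(.)]]
(: 2. explicit path steps: the element names are visible to the optimizer :)
let $by-path  := $country[.//(ethnicgroups, languages)[$search(.)]]
(: 3. local-name(): independent of any namespaces in the database :)
let $by-local := $country[.//*[local-name() = ('ethnicgroups', 'languages')][$search(.)]]

return (count($by-name), count($by-path), count($by-local))
```

As long as the database contains no namespaces, all three forms should select the same countries. They differ only in what the optimizer can infer at compile time, so any difference would show up in the output of ft:mark rather than in these counts.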
Here’s a variant that should work:
declare function local:search(
  $database as xs:string,
  $query as xs:string
) {
  let $country := ft:search($database, $query)/ancestor::country
  let $search := function($node) { $node/text() contains text { $query } }
  return (
    ft:mark($country[.//name[$search(.)]]),
    ft:mark($country[.//city[$search(.)]]),
    ft:mark($country[.//(ethnicgroups, languages)[$search(.)]])
  )
};
local:search('factbook', 'German')
…or…
let $search := function($nodes) { $nodes[text() contains text { $query }] }
return (ft:mark($country[$search(.//name)]), ...
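Written out in full, the second variant might look like the following. This is an untested sketch: the two remaining ft:mark calls were elided above, and I am assuming they follow the same pattern as the first one.

```xquery
declare function local:search(
  $database as xs:string,
  $query as xs:string
) {
  let $country := ft:search($database, $query)/ancestor::country
  (: the filter takes a node sequence and returns only the matching nodes :)
  let $search := function($nodes) { $nodes[text() contains text { $query }] }
  return (
    ft:mark($country[$search(.//name)]),
    (: assumed by analogy with the first call :)
    ft:mark($country[$search(.//city)]),
    ft:mark($country[$search(.//(ethnicgroups, languages))])
  )
};
local:search('factbook', 'German')
```

Here a country is kept whenever the function returns a non-empty sequence, because a non-empty node sequence in a predicate evaluates to true.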
From today’s perspective, we would certainly design ft:mark and ft:extract so that the results are always correct. The consequence, however, would be a much more restricted syntax.