Hi Francis,
ft:mark() and ft:extract() cannot be used with any intermediate looping construct, at least in BaseX 7.3. [...]
Good point. I was surprised to see that this has not been covered yet in our documentation. I have updated the module page and hope it’s clearer now [1] (even if I stuck with black as the text color ;). The reason for this behavior is that position information can easily blow up main memory, and it’s a non-trivial optimization task to find out which position information will later be required by an expression like ft:mark() or ft:extract(). However, the behavior may change in future versions of BaseX.
The usual workaround is to use more than one full-text expression:

  let $term := 'welcome'
  for $ft in db:open('DB')//*[text() contains text { $term }]
  return element hit {
    ft:extract($ft[text() contains text { $term }])
  }
I agree that this creates redundant code and is not how it should ideally be, but at least it’s usually no performance bottleneck. In most of our production applications that use "contains text" or ft:search(), the overall query code is much more complex anyway (extending across several functions), so we are hardly ever confronted with this restriction. That is one of the reasons why we didn’t push the optimizations any further.
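For comparison, here is a minimal sketch of the two forms side by side. It assumes a database named 'DB' with a full-text index and the behavior described above for BaseX 7.3; the database name and the search term are placeholders only.

```xquery
let $term := 'welcome'
return (
  (: Direct form: the full-text expression is the argument of ft:extract(),
     so the position information is still available to the function. :)
  ft:extract(db:open('DB')//*[text() contains text { $term }]),

  (: Looped form: $ft is bound by an intermediate "for" clause, so the
     position information is no longer available at this point. The
     predicate has to be repeated, as in the workaround above. :)
  for $ft in db:open('DB')//*[text() contains text { $term }]
  return element hit {
    ft:extract($ft[text() contains text { $term }])
  }
)
```

If no per-hit wrapper element is needed, the direct form avoids the repeated predicate.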
Perhaps a better method is to have a function with a data structure that contains the text matched text node (as a reference, so that node references are retained) *and* matching substrings explicitly and separately. [...]
True; we could think about further splitting up the process and introducing more low-level functions that directly return position information. Our original plan was to focus on the XQuery Full Text specification, but it increasingly turns out that our users switch over to our BaseX-specific functions, as they are more straightforward to use.
Thanks for your remaining suggestions; they could be a useful resource for future extensions.
Christian