Re: [basex-talk] strange behavior of ft:mark() and ft:extract()

14 Dec 2012


Hi Francis,
...
> ft:mark() and ft:extract() cannot be used with any intermediate looping construct, at least in BaseX 7.3. [...]
Good point. I was surprised to see that this has not been covered yet
in our documentation. I have updated the module page and hope it’s
clearer now [1] (even if I stuck with black as the text color ;). The
reason for this behavior is that position information can easily blow
up main memory, and it’s a non-trivial optimization task to find out
which position information will later be required by an expression
like ft:mark() or ft:extract(). However, the behavior may change in
future versions of BaseX.
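To make the restriction concrete, the affected pattern looks roughly like
the following. This is only a sketch: the database name 'DB' and the search
term are assumptions, and the exact result may differ between versions.

```xquery
(: Sketch only: 'DB' and the term 'welcome' are illustrative assumptions. :)
(: The for loop sits between the full-text expression and ft:extract(),
   so the position information is no longer available inside the call. :)
for $ft in db:open('DB')//*[text() contains text 'welcome']
return element hit {
  ft:extract($ft)
}
```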
The usual workaround is to use more than one full-text expression:
  let $term := 'welcome'
  for $ft in db:open('DB')//*[text() contains text { $term }]
  return element hit {
    (: repeat the full-text expression inside the call to ft:extract() :)
    ft:extract($ft[text() contains text { $term }])
  }
I agree that this creates redundant code and is not how it should
ideally be, but at least it’s usually no bottleneck regarding
performance. In most of our production applications that use
"contains text" or ft:search(), the overall query code is much more
complex anyway (extending across several functions) such that we are
hardly confronted with this restriction, which is one of the reasons
why we didn’t push the optimizations any further.
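The same pattern should carry over to ft:mark() when the matches are to be
highlighted rather than extracted. A minimal sketch, assuming a database
named 'DB' and using the optional second argument of ft:mark() to name the
marker element; the name 'b' is chosen only for illustration:

```xquery
(: Sketch only: 'DB' and the marker name 'b' are illustrative assumptions. :)
let $term := 'welcome'
for $ft in db:open('DB')//*[text() contains text { $term }]
return element hit {
  (: the full-text expression is repeated in the argument, so that
     ft:mark() is applied directly to a full-text result :)
  ft:mark($ft[text() contains text { $term }], 'b')
}
```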
...
> Perhaps a better method is to have a function with a data structure that contains the text matched text node (as a reference, so that node references are retained) *and* matching substrings explicitly and separately. [...]
True; we could think about further splitting up the process, and
introduce more low-level functions that directly return position
information. Our original plan was to focus on the XQuery Full Text
specification, but it turns out more and more that our users switch
over to our BaseX-specific functions, as they are more straightforward
to use.
Thanks for your remaining suggestions; they could be a useful resource
for future extensions.
Christian
[1] http://docs.basex.org/wiki/Full-Text_Module#ft:mark
