Hi Christian,

Many thanks for your answer. I already use ft:mark to bold the terms, and it works great, but I need to sort the answers (“sentence” element) according to the distance between the match and the beginning of the sentence (or a specific word at some position). So if I search “DNA” in the following sentences:

1. <sentence id="1.1.122.1.122">The translated protein showed weak DNA binding with a specificity for the kappa B binding motif.</sentence>
2. <sentence id="54.1.5.1.698">Using this assay system, we have evaluated the contributions of ligand binding and heat activation to DNA binding by these glucocorticoid receptors.</sentence>
3. <sentence id="2.1.17.1.79">2.5 Mesocosm DNA extraction and purification</sentence>

I need the results to be ordered as 3, 1, 2. The sentence element always contains plain text only. I was going to implement a function to do something like:

for $sentence in //sentence
where $sentence[text() contains text 'DNA']
order by local:distance($sentence, 'DNA')
return $sentence

The distance function could also be called as local:distance($sentence, 'DNA', position_to_compare), with position_to_compare=1 by default. If there are several matches, I take the minimum distance.
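A minimal sketch of what such a function could look like, using only standard XQuery functions. It assumes a simple split on non-alphanumeric characters and case-insensitive matching, so its tokens will not generally agree with those of the BaseX full-text processor (no stemming, no stopword removal). The two-argument overload stands in for the default position_to_compare=1:

```xquery
(: Sketch only: tokenization here is an assumption, not BaseX's full-text tokenizer. :)
declare function local:distance(
  $sentence as xs:string,
  $term     as xs:string,
  $ref      as xs:integer
) as xs:integer? {
  (: split on anything that is not a letter or digit; drop empty tokens :)
  let $tokens := tokenize(lower-case($sentence), '[^\p{L}\p{N}]+')[. != '']
  (: 1-based positions of all tokens equal to the search term :)
  let $hits := index-of($tokens, lower-case($term))
  (: smallest distance to the reference position; empty if there is no match :)
  return min(for $p in $hits return abs($p - $ref))
};

(: two-argument form: compare against the first token of the sentence :)
declare function local:distance(
  $sentence as xs:string,
  $term     as xs:string
) as xs:integer? {
  local:distance($sentence, $term, 1)
};

for $sentence in //sentence
where $sentence[text() contains text 'DNA']
order by local:distance(string($sentence), 'DNA')
return $sentence
```

Under this tokenization, the three sentences above come out in the order 3, 1, 2.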

Do you have any idea if there is a possible approach to do this with BaseX?

Thank you again.

Best,

Javier

On 26/11/2014, at 14:02, Christian Grün <christian.gruen@gmail.com> wrote:

Hi Javier,

Thanks for your mail.

It's currently not possible to directly access the position information that is internally used for computing the results. The reasons are manifold:

* The positions no longer reflect the actual substrings. Instead, we enumerate all tokens that remain after normalizing the input (i.e., after stopword removal, stemming, etc.). So, in practice, it is difficult to map this positional information back to the original input.

* The positions can stretch over several elements (for example, the following query yields true: <x>X<y/>Z</x> contains text "XZ")

* The data structures containing the positions can potentially consume lots of space, so they are usually discarded after the result is returned.

What would you like to do with the information? Maybe you have seen the ft:mark and ft:extract functions [1]; are they of any help?
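For illustration, a short sketch of how ft:mark is typically called, assuming the sentences are stored in a BaseX database. The element name 'b' in the second call is just an example:

```xquery
(: wraps every matched token in a <mark/> element (the default name) :)
ft:mark(//sentence[text() contains text 'DNA']),

(: same, but with a custom wrapper element name :)
ft:mark(//sentence[text() contains text 'DNA'], 'b')
```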

Christian

[1] http://docs.basex.org/wiki/Full-Text_Module#ft:mark


On Wed, Nov 26, 2014 at 12:35 PM, Javier Couto
<javier.couto.fr@gmail.com> wrote:
Hi,

Sorry if this is too basic, but I’m trying to get the positions of the
matched tokens in a full-text query, and I can’t find the way to do it. I
imagine something like:

for $sentence in //sentence
where $sentence[text() contains text { 'DNA', 'oxidation' }]
return <positions>{
  ft:SOME-FUNCTION-FOR-TOKENS-POSITIONS($sentence[text() contains text { 'DNA', 'oxidation' }])
}</positions>

Is this possible?

Thank you in advance,

Javier