On Tue, Jan 25, 2011 at 3:36 PM, Christian Grün <christian.gruen@gmail.com> wrote:
> Do you have a reproducible example for this case?

I can't provide the full example, since the query runs against a multi-gigabyte database, but this is the query I'm experimenting with:

let $section := db:open('CIVWAR')//book[@id='116']//section[@id='31']
let $extracts := ft:extract($section/*[text() contains text "cincinnati"], 'mark', 80)
for $e in $extracts
return <frag id="{db:node-id($e)}">{$e}</frag>

This generates:

<frag id="0">
  <para role="or_body_normal">... see by a column from the <mark>Cincinnati</mark> Commercial what a wide feeling has been awa...</para>
</frag>
<frag id="0">
  <para role="or_body_loc_time">
    <mark>CINCINNATI</mark>, OHIO,
  </para>
</frag>

I now think the issue is that ft:extract loses the identity of the original node: both fragments above report an id of 0. If that is by design, I still think it would be more logical to retain the identity, even though the extracted result does not have the same value as the original node. The idea I wanted to explore was to return short snippets of context for search hits (which is indeed the purpose of ft:extract) while keeping an absolute node reference, so that I can quickly get back to that part of the document.
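
One possible workaround, as an untested sketch: read the id from each matching node before it is passed to ft:extract, so the id comes from the stored node rather than from the extracted copy. This assumes ft:extract accepts a single node filtered by the same full-text predicate; the full-text test is then evaluated twice per hit, which may cost some time on a large database.

```xquery
(: Sketch only: capture the id of the stored node first, then extract. :)
let $section := db:open('CIVWAR')//book[@id='116']//section[@id='31']
for $hit in $section/*[text() contains text "cincinnati"]
(: $hit is still the database node here, so db:node-id sees the stored node :)
let $id := db:node-id($hit)
return <frag id="{$id}">{
  (: repeat the full-text test so ft:extract has a match to mark :)
  ft:extract($hit[text() contains text "cincinnati"], 'mark', 80)
}</frag>
```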

In a separate experiment, I obtained a usable id with another query that does not use ft:extract, and then tried to retrieve the node with a predicate such as [db:node-id(.)=123456]. This was very slow: over 2 seconds. Then I found the specialized db:open-id() function, which exists for exactly this purpose and executes in milliseconds. Take that as a cautionary note if you are trying the same thing. If it is straightforward, I suggest optimizing the predicate form as well (for example with an appropriate index) so that such a lookup is fast either way, because I can see the predicate being the more convenient choice in some cases.
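
For reference, the two lookups look roughly like this. The id 123456 is only a placeholder, and the exact form of the slow query is my reconstruction of the predicate approach, not a copy of what I ran.

```xquery
(: Slow for me: the predicate is checked against every candidate node. :)
db:open('CIVWAR')//*[db:node-id(.) = 123456]
```

```xquery
(: Fast: resolves the node directly from its id. :)
db:open-id('CIVWAR', 123456)
```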