Unique node identifier?

NewIntellectual

25 Jan 2011

8:27 p.m.

Is there a function to retrieve a persistent unique node identifier from any given retrieved node? I thought it would be db:node-id($node) but that is returning '0'.

Christian Grün

25 Jan 2011

8:36 p.m.

On Tue, Jan 25, 2011 at 9:27 PM, NewIntellectual <newintellectual@gmail.com> wrote:

> Is there a function to retrieve a persistent unique node identifier from any given retrieved node? I thought it would be db:node-id($node)

Exactly, this is the function to be called. The returned id should be unique for all nodes of a document.

> but that is returning '0'.

Do you have a reproducible example for this case? The following BaseX call should return "2" for the text node:

basex -c "create db test <x>text</x>; xquery db:node-id(//text())"

Christian

NewIntellectual

9:02 p.m.

On Tue, Jan 25, 2011 at 3:36 PM, Christian Grün <christian.gruen@gmail.com> wrote:

> Do you have a reproducible example for this case?

It wouldn't be feasible to provide the actual full example since the query is against a multi-gigabyte database, but the query I'm experimenting with is:

let $section := db:open('CIVWAR')//book[@id='116']//section[@id='31']
let $extracts := ft:extract($section/*[text() contains text "cincinnati"], 'mark', 80)
return
  for $e in $extracts
  return <frag id="{db:node-id($e)}">{$e}</frag>

This generates:

<frag id="0">
  <para role="or_body_normal">... see by a column from the Cincinnati Commercial what a wide feeling has been awa...</para>
</frag>
<frag id="0">
  <para role="or_body_loc_time"> CINCINNATI, OHIO, </para>
</frag>

I now think the issue is that ft:extract loses the actual node identity. It would seem more logical to me for it to retain that identity, if the design allows it, even though the extracted result does not have the same value as the original node. One idea I wanted to explore was to return little snippets of context for search hits (which indeed is the purpose of ft:extract) and then retain an absolute node reference, so that I can rapidly work with that part of the document.

I did try a separate experiment to get a usable id with another query which does not use ft:extract, and then tried retrieving the node by using a predicate such as [db:node-id(.)=123456]. This was very slow, over 2 seconds. Then I saw the specialized db:open-id() function for just that purpose which executes in milliseconds - a cautionary note to anybody trying the same thing. If it's straightforward I suggest applying an appropriate index to make such a resolution speedy even with the predicate selection, because I could see that potentially being more desirable to use in some cases.
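For readers comparing the two lookups described above, here is a minimal sketch of both forms. It assumes the 'CIVWAR' database from the earlier query and reuses 123456 as a placeholder id. The `//*` path in the first expression is an assumption, since the message gives only the predicate. The function names are the ones used in this thread; other BaseX releases may name them differently.

```xquery
(: Sketch only. 'CIVWAR' is the database named in the query above;
   123456 is the placeholder id from the message, not a real value. :)

(: Slow form: the predicate is evaluated for every element that the
   path visits, so the lookup scans the database. The //* path is an
   assumption; the message shows only the predicate. :)
db:open('CIVWAR')//*[db:node-id(.) = 123456],

(: Fast form: db:open-id() takes the database name and the id and
   resolves the node directly. :)
db:open-id('CIVWAR', 123456)
```

Both expressions should return the same node when the id exists; only the access path differs.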

Christian Grün

9:28 p.m.

Phil, thanks for all the details. As you mentioned already, the ft:extract() function is responsible for the loss of the original id: it creates new XML fragments. Those are internally represented as a new (tiny) main-memory database instance, which uses its own numbering scheme. You could try to remember the node id before calling ft:extract, similar to the following example (haven't tried it live, so I hope the syntax is correct):

for $node in //*
for $hit in ft:extract($node[text() contains text "cincinnati"])
return <hit id="{ db:node-id($node) }">{ $hit }</hit>

Our new functions db:node-id() and db:open-id() are documented at:

http://docs.basex.org/wiki/Database_Functions

Your note is a good hint that the existing documentation is not verbose enough yet. We've just opened our Wiki for everyone, so everybody's input is welcome ;)

Christian
