Take a look at exist-stanford-nlp on my GitHub, in particular the code for named entity recognition:

https://github.com/lcahlander/exist-stanford-nlp/blob/master/src/main/xquery/ner-module.xqm


Loren Cahlander


On May 10, 2020, at 10:13 AM, Graydon <graydonish@gmail.com> wrote:

On Sun, May 10, 2020 at 03:35:45AM -0400, Liam R. E. Quin scripsit:
On Fri, 2020-05-08 at 14:52 -0400, Graydon Saunders wrote:
The idea would be to iterate through the list, marking up the node
with any matches.

Can you instead use standoff markup? E.g. store positions of start and
end as word counts, and then merge them later?

In principle, yes.  But then I would have to be smart and extract the
positions correctly somehow and then get all the positional arithmetic
correct.
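
To make the positional arithmetic concrete, here is a minimal sketch of the standoff idea, written in Python only so it can be run standalone. Everything in it (find_spans, merge_spans, the <term> element, the sample text) is hypothetical and not taken from any code in this thread. Positions are word indexes with an exclusive end, and splitting on whitespace is what makes line breaks and tabs irrelevant.

```python
from xml.sax.saxutils import escape

def find_spans(words, terms):
    """Return standoff records (start, end, term); start/end are word indexes, end exclusive."""
    spans = []
    for term in terms:
        needle = term.split()
        n = len(needle)
        for i in range(len(words) - n + 1):
            if words[i:i + n] == needle:
                spans.append((i, i + n, term))
    return sorted(spans)

def merge_spans(words, spans):
    """Rebuild the text, wrapping each non-overlapping span in a (hypothetical) <term> element."""
    out = []
    pos = 0
    for start, end, _term in spans:
        if start < pos:
            continue  # overlaps an earlier span; simply skipped in this sketch
        out.extend(escape(w) for w in words[pos:start])
        out.append("<term>" + escape(" ".join(words[start:end])) + "</term>")
        pos = end
    out.extend(escape(w) for w in words[pos:])
    return " ".join(out)

text = "the full-text\n\tindex is fast, but the full-text index is single-pass"
words = text.split()          # whitespace runs, tabs and newlines all collapse here
spans = find_spans(words, ["full-text index"])
marked = merge_spans(words, spans)
```

The sketch normalizes whitespace on output and does not strip punctuation, so a word like "fast," will not match the term "fast"; a real version would need a proper tokenizer.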

The attraction of the full-text index was a combination of speed and
being able to let some other smarter person handle the "does the match
still work if there's a line break? bunches of tabs?" issues.

I now think this just isn't a full-text use case. I was trying to think
of a way to use something optimized for single-pass search to support
recursion on the changed content, and that loses all the attractive
optimizations.  Nothing says I can't use analyze-string and recursion.
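
A rough sketch of that shape, in Python rather than XQuery so it can be run standalone: the analyze helper below is a hypothetical stand-in for analyze-string (it yields match and non-match pieces in document order), and mark_up recurses through the term list, only ever searching text that has not been marked yet. The segment representation, the names, and the sample text are assumptions for illustration.

```python
import re

def analyze(pattern, text):
    """Stand-in for analyze-string: yield (is_match, substring) pairs in order."""
    pos = 0
    for m in pattern.finditer(text):
        if m.start() > pos:
            yield (False, text[pos:m.start()])
        yield (True, m.group(0))
        pos = m.end()
    if pos < len(text):
        yield (False, text[pos:])

def mark_up(segments, terms):
    """Recursively apply each term to the still-unmarked segments.

    segments is a list of (label, text); label is None for unmarked text.
    """
    if not terms:
        return segments
    term, rest = terms[0], terms[1:]
    # Allow any run of whitespace (line breaks, tabs) between the term's words.
    pattern = re.compile(r"\s+".join(re.escape(w) for w in term.split()))
    result = []
    for label, text in segments:
        if label is not None:
            result.append((label, text))  # already marked; leave untouched
            continue
        for is_match, piece in analyze(pattern, text):
            result.append((term if is_match else None, piece))
    return mark_up(result, rest)

text = "line\n\tbreak, then a new line"
result = mark_up([(None, text)], ["line break", "line"])
```

Putting longer terms first in the list keeps "line" from claiming part of "line break", since marked segments are never searched again.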

Thanks!

-- Graydon