Hi Christian --

Thank you! That helps a lot. I can maintain that; or rather, I won't have to maintain it, since it's general enough to keep working.

Much appreciated!
Graydon

On Sun, Apr 26, 2020 at 4:56 AM Christian Grün <christian.gruen@gmail.com> wrote:
Hi Graydon,

It’s a good idea to use the window clause (as the number of mark
elements that need to be joined is not known in advance). You can use
ft:tokenize to include other delimiters:

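for $term in ('Diverse and various', 'words… some', 'glossary-terms')
for $ft in ft:mark(db:open('DB')//*[text() contains text { $term }])
(: rebuild the element, keeping its attributes :)
return element { name($ft) } {
  $ft/@*,
  (: every child node belongs to exactly one window :)
  for tumbling window $w in $ft/node()
  start when true()
  (: close the window before a non-mark node that contains tokens, or
     before a mark that follows a node containing tokens :)
  end $e next $enext when (
    $enext[not(self::mark)] and $enext[exists(ft:tokenize(.))] or
    $enext[self::mark] and $e[exists(ft:tokenize(.))]
  )
  (: windows holding a mark collapse into one mark; others pass through :)
  return if ($w[self::mark]) then <mark>{ string-join($w) }</mark> else $w
}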
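
To try the windowing logic without a database, here is a minimal sketch
that swaps the ft:mark result for a hand-written fragment. The <p>
element and its content are made up for illustration; the window clause
is the same as above. It relies on ft:tokenize returning no tokens for a
punctuation-only string such as '-', which is the assumption the end
condition is built on.

```xquery
(: hypothetical stand-in for one node returned by ft:mark :)
let $ft := <p>An <mark>A</mark>-<mark>List</mark> guest</p>
return element { name($ft) } {
  $ft/@*,
  for tumbling window $w in $ft/node()
  start when true()
  end $e next $enext when (
    $enext[not(self::mark)] and $enext[exists(ft:tokenize(.))] or
    $enext[self::mark] and $e[exists(ft:tokenize(.))]
  )
  (: a window that contains a mark is joined into a single mark :)
  return if ($w[self::mark]) then <mark>{ string-join($w) }</mark> else $w
}
```

If that assumption holds, the nodes A, - and List fall into one window
and are joined to <mark>A-List</mark>, while the surrounding text nodes
pass through unchanged.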

If you don’t want to rebuild your original node, you can also use the
'update' expression and modify your existing document. I have slightly
rewritten the original code, but the basic idea is the same:

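for $term in ('Diverse and various', 'words… some', 'glossary-terms')
for $ft in ft:mark(db:open('DB')//*[text() contains text { $term }])
return $ft update {
  (: a window opens at each mark; nodes outside a window stay untouched :)
  for tumbling window $w in node()
  start $s when $s/self::mark
  (: close the window when the current node has tokens and a mark follows,
     or when the next node is a non-mark node that contains tokens :)
  end $curr next $next when (
    exists(ft:tokenize($curr)) and exists($next/self::mark) or
    exists(ft:tokenize($next)) and empty($next/self::mark)
  )
  return (
    (: the first node becomes the merged mark; the rest is removed :)
    replace node head($w) with element mark { string-join($w) },
    delete nodes tail($w)
  )
}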
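
The same kind of dry run works for the 'update' variant. This is only a
sketch: $p and its content are hypothetical and stand in for one node
returned by ft:mark, and it again assumes that ft:tokenize yields no
tokens for the lone hyphen.

```xquery
(: hypothetical stand-in for one node returned by ft:mark :)
let $p := <p>Paid <mark>nine</mark>-<mark>pence</mark> today</p>
return $p update {
  (: only marks open a window, so 'Paid ' is never part of one :)
  for tumbling window $w in node()
  start $s when $s/self::mark
  end $curr next $next when (
    exists(ft:tokenize($curr)) and exists($next/self::mark) or
    exists(ft:tokenize($next)) and empty($next/self::mark)
  )
  return (
    replace node head($w) with element mark { string-join($w) },
    delete nodes tail($w)
  )
}
```

Here the window is expected to run from the first mark to the second,
so the hyphen between them ends up inside the merged mark element and
the text nodes around it are left in place.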

Hope this helps,
Christian

On Sun, Apr 26, 2020 at 6:04 AM Graydon <graydonish@gmail.com> wrote:
>
> On Sat, Apr 25, 2020 at 06:02:14PM -0400, Liam R. E. Quin scripsit:
> > On Sat, 2020-04-25 at 13:46 -0400, Graydon Saunders wrote:
> > > I think I have figured out a way to connect the adjacent marked
> > > words in the phrasal term into a single mark element. I cannot
> > > convince myself that this is the right way; is there a better
> > > approach than tumbling windows?
> >
> > I just search for the multi-word phrase and surround that. Enclosed is
> > a sample from a prototype for a keyword in context search index for
> > fromoldbooks.org (not yet live). Looking now I see it's not very neat
> > but maybe it'll give some ideas.
>
> It does, but alas I can't use string-join.  Some of the terms have
> hyphens, so I'm getting <mark>A</mark>-<mark>List</mark> coming out of
> the full text search, which must become <mark>A-List</mark>.  Plus some
> of the terms have the form "nine-pence and six-pence", so any solution
> has to be general for interstitial text nodes.
>
> (I can't rule out any punctuation.  I know there are hyphens, but don't
> know there are ONLY hyphens.)
>
> Thanks!
>
> Graydon