Hello Graydon,
These blog posts discuss various algorithms for finding near-duplicate documents, their performance, and XQuery (MarkLogic dialect) implementations:
https://stuartmyles.blogspot.com/2012/10/longest-common-substring-in-xquery-... https://stuartmyles.blogspot.com/2012/10/longest-common-substring-in-xquery-...
Depending on your constraints, maybe some of these ideas could help?
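As a rough illustration of the grouping idea in your question, here is a minimal sketch in XQuery 3.0. The function local:get-fuzzy-match-value is a hypothetical stand-in, not taken from the blog posts: it only normalizes case, punctuation, and whitespace, so paragraphs are grouped when their normalized text is identical. True fuzzy similarity is generally not transitive, so it does not reduce to a single grouping key without a canonicalizing step of this kind (or a fingerprint/hash scheme).

```xquery
xquery version "3.0";

(: Hypothetical key function (an assumption for illustration only):
   lower-cases the paragraph text, removes every character that is not
   a letter, digit, or whitespace, then collapses runs of whitespace. :)
declare function local:get-fuzzy-match-value($p as element()) as xs:string {
  normalize-space(
    replace(lower-case(string($p)), '[^\p{L}\p{N}\s]', '')
  )
};

(: Group paragraphs whose normalized text is the same.
   After "group by", $x is bound to all paragraphs sharing the key. :)
for $x in //p
let $key := local:get-fuzzy-match-value($x)
group by $key
return <similar-paragraphs>{$x}</similar-paragraphs>
```

With this key, "Hello, World!" and "hello   world" land in the same group, while paragraphs that differ by even one word do not; a coarser key (for example, one built from a sorted set of distinct words) would trade precision for recall.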
Victor
On 12/11/2020 at 00:00, Graydon Saunders wrote:
Hello --
Is there some way to assign the abstraction of a fuzzy match to a variable, so that something like
    for $x in //p
    let $key := get-fuzzy-match-value($x)
    group by $key
    return <similar-paragraphs>{$x}</similar-paragraphs>
would be possible?
I'm supposing this is one of those things that's either easy or impossible.
Thanks! Graydon