Useful keywords; thank you!

That is also more of a development effort than this project will support, alas. (Unless someone is willing to point me to a public release of such a solution that is free for commercial use? That doesn't seem a whole lot more likely than someone throwing a gold brick through my window.)

On Wed, Nov 11, 2020 at 6:42 PM Imsieke, Gerrit, le-tex <gerrit.imsieke@le-tex.de> wrote:
This is probably difficult, since in BaseX fuzzy matching is implemented
using the Levenshtein distance between two strings [1]. Similarity is
therefore a relation between pairs of paragraphs rather than an
intrinsic property of an individual paragraph, so there is no
per-paragraph value that could serve as a grouping key.

You should look for content fingerprinting/clustering techniques.
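
As a rough illustration of the fingerprinting idea, here is a minimal
sketch in plain XQuery 3.1. It is an assumption about what a simple
fingerprint could look like, not a BaseX feature: local:fingerprint is
a hypothetical helper that lower-cases the paragraph text, replaces
punctuation with spaces, and keeps the sorted set of distinct words.
Unlike a pairwise distance, the result is a property of a single
paragraph, so it can be used with "group by".

```xquery
xquery version "3.1";

(: Hypothetical helper, not a BaseX built-in: reduce a paragraph to a
   normalised "fingerprint" string that can serve as a grouping key. :)
declare function local:fingerprint($p as element()) as xs:string {
  let $text   := lower-case(string($p))
  (: replace everything except letters, digits and whitespace with a space :)
  let $clean  := replace($text, '[^\p{L}\p{N}\s]', ' ')
  let $tokens := tokenize(normalize-space($clean), ' ')
  (: sorted set of distinct words, joined back into one string :)
  return string-join(sort(distinct-values($tokens)), ' ')
};

for $x in //p
let $key := local:fingerprint($x)
group by $key
return <similar-paragraphs>{$x}</similar-paragraphs>
```

Note that this only groups paragraphs that become identical after
normalisation (differences in case, punctuation, word order, or
repeated words). It does not tolerate misspellings or changed words
the way Levenshtein-based matching does; for that, a clustering step
over pairwise similarities would probably still be needed.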

[1] https://docs.basex.org/wiki/Full-Text#Fuzzy_Querying


On 12.11.2020 00:00, Graydon Saunders wrote:
> Hello --
>
> Is there some way to assign the abstraction of a fuzzy match to a
> variable, so that something like
>
> for $x in //p
>    let $key := get-fuzzy-match-value($x)
>    group by $key
>    return <similar-paragraphs>{$x}</similar-paragraphs>
>
> would be possible?
>
> I'm supposing this is one of those things that's either easy or impossible.
>
> Thanks!
> Graydon