On Thu, Nov 12, 2020 at 11:58:29AM +0100, Christian Grün scripsit:
Gerrit has already mentioned fingerprinting techniques. If your time is limited, it may be sufficient to apply full-text tokenization and Soundex to your strings:

  let $get-fuzzy-match-value := function($x) {
    $x
    => ft:tokenize(map { 'stemming': true() })
    => distinct-values()
    => string-join()
    => strings:soundex()
  }
  for $x in //p
  group by $key := $get-fuzzy-match-value($x)
  return <similar-paragraphs key='{ $key }'>{ $x }</similar-paragraphs>

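For anyone who wants to try the query above without an existing database, here is a minimal sketch that runs it against inline data. The `$doc` element and its three paragraphs are made-up sample input, and the `count` attribute is an addition for illustration only; the fingerprint function itself is unchanged from the query above.

```xquery
(: Hypothetical sample data; replace $doc//p with //p to query an opened database. :)
let $doc :=
  <doc>
    <p>The cats are running quickly.</p>
    <p>The cat runs quickly</p>
    <p>Completely different content here.</p>
  </doc>

(: Fingerprint: stemmed tokens, de-duplicated, concatenated, then Soundex-encoded. :)
let $get-fuzzy-match-value := function($x) {
  $x
  => ft:tokenize(map { 'stemming': true() })
  => distinct-values()
  => string-join()
  => strings:soundex()
}

for $x in $doc//p
group by $key := $get-fuzzy-match-value($x)
(: After grouping, $x holds every paragraph that shares the same key. :)
return
  <similar-paragraphs key='{ $key }' count='{ count($x) }'>{ $x }</similar-paragraphs>
```

Assuming `strings:soundex` follows the classic algorithm (one letter plus three digits), the key reflects mainly the start of the concatenated tokens, so the resulting groups are best read as candidates for closer comparison.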
I shall certainly give this a try!
Thank you, Christian! I continue to be astonished by the power and utility of this tool you've built.
-- Graydon