Hi,
According to my copy of BaseX 10.7,
string:levenshtein('oil field', 'oilfield')
and
string:levenshtein('oil field', 'coalfield')
both return the same value, 0.7777777777777778.
My understanding is that the Damerau-Levenshtein distance between 'oil field' and 'oilfield' is 1 and between 'oil field' and 'coalfield' is 3, so, following the formula from https://docs.basex.org/wiki/String_Module#string:levenshtein
1.0 - distance / max(length of strings)
should give 1 - 1/9 = 0.888... and 1 - 3/9 = 0.666... respectively, since the longer string in each pair has 9 characters.
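As a cross-check of the arithmetic above, here is a minimal Python sketch. It is not BaseX's implementation. It computes the optimal-string-alignment (restricted) variant of Damerau-Levenshtein distance and then applies the normalisation formula quoted from the docs. The function names `osa_distance` and `similarity` are hypothetical. Returning 1.0 for two empty strings is also an assumption.

```python
def osa_distance(a: str, b: str) -> int:
    """Optimal string alignment (restricted Damerau-Levenshtein) distance.

    Insertions, deletions, substitutions and swaps of two adjacent
    characters each cost 1. This is an illustrative variant, not
    necessarily the one BaseX uses internally.
    """
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete every character of a[:i]
    for j in range(cols):
        d[0][j] = j  # insert every character of b[:j]
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution (free if characters match)
            )
            # Adjacent transposition, e.g. "ab" -> "ba"
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[-1][-1]


def similarity(a: str, b: str) -> float:
    """1.0 - distance / max(length of strings), as quoted from the BaseX docs."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0  # assumption: two empty strings are treated as identical
    return 1.0 - osa_distance(a, b) / longest
```

With this sketch, similarity('oil field', 'oilfield') and similarity('oil field', 'coalfield') come out at roughly 0.889 and 0.667. The transposition step never fires for these two pairs, so plain Levenshtein gives the same distances (1 and 3). The value BaseX 10.7 reported, 0.7777777777777778, equals 1 - 2/9, which suggests it computed a distance of 2 for both pairs.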
Am I off base here, or is there something awry with string:levenshtein?
Cheers,
Jack
Hi Jack,
That’s been helpful, thanks. We’ve aligned our Damerau/Levenshtein algorithms; the latest version should behave as expected [1, 2].
Best, Christian
[1] https://files.basex.org/releases/latest/
[2] https://github.com/BaseXdb/basex/commit/6889ac108c6b32d448d640d53ec098bbb893...
On Thu, May 9, 2024 at 8:29 AM Jack Steyn steynjack@gmail.com wrote:
Thanks, Christian.
When I download and unzip the latest version, start the HTTP server and navigate to localhost:8080, I'm given the following error:
Stopped at [...]/basex/webapp/dba/jobs/job-result.xqm, 34/49: [XPST0017] Unknown function: string:nl.
It's easy enough to work around by editing job-result.xqm (and jobs.xqm, in which string:nl also appears), but I wanted to bring it to your attention in case you weren't aware.
Cheers,
Jack
On Fri, 10 May 2024, 7:37 pm Christian Grün, christian.gruen@gmail.com wrote:
Thanks for the hint; the DBA code has been updated (string:nl is being replaced with the new fn:char('\n') function). A new snapshot is online.
On Mon, May 13, 2024 at 8:12 AM Jack Steyn steynjack@gmail.com wrote:
basex-talk@mailman.uni-konstanz.de