Hi,
I'd like to request a feature concerning the ft:search method: An `lserror` option that works exactly like the global option `LSERROR`. It would be nice if we could set the maximum Levenshtein distance specifically for each fuzzy search without having to adjust the global option.
Another question on ft:search: Is there a reason why it doesn't have a `case` option just like ft:contains has one?
Best regards, Sebastian
Hi,
since I haven't gotten a response to this inquiry yet, I just wanted to ask if this is a reasonable feature request.
Best, Sebastian
On 01.03.2019 at 14:43, Sebastian Zimmer wrote:
[...]
Hi Sebastian,
sorry for keeping you waiting (lots to do).
> I'd like to request a feature concerning the ft:search method: An `lserror` option that works exactly like the global option `LSERROR`. It would be nice if we could set the maximum Levenshtein distance specifically for each fuzzy search without having to adjust the global option.
Good idea. As fuzzy searching is a non-standard feature, there is currently no syntactical construct to define LSERROR via "contains text". I think it would make sense to first extend the standard syntax, and provide an ft:search option for lserror just after that.
We could extend the syntax as follows:
"A" contains text "B" using fuzzy 3 errors
I’ve added an issue for that [1].
> Another question on ft:search: Is there a reason why it doesn't have a `case` option just like ft:contains has one?
The reason is that all your data is indexed with static options for case, diacritics, etc. If the index was built case-sensitively and you search for "A", it won’t return hits for "a".
Things are different for ft:contains, as it is not based on the index, and will always tokenize the given input on the fly.
If you decide to ignore case in the index, you can post-process your index results as follows:
let $query := 'search-term'
for $result in ft:search('db', $query)
where ft:contains($result, $query, map { 'case': 'sensitive' })
return $result
As you can guess, this check might take some time, so just be careful if your query might generate lots of hits.
Best, Christian
Hi Sebastian,
It’s finally possible to specify user-defined Levenshtein error values via “contains text”, ft:search and ft:contains.
Hope this still helps (provided that you still remember your feature request), Christian
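For illustration, the three variants might look roughly like the sketch below. The "contains text" form follows the syntax proposed earlier in this thread. The `errors` option name for ft:contains and ft:search is an assumption and should be checked against the Full-Text Module documentation of your BaseX version. The database name 'db' and the search strings are placeholders.

```xquery
(: Sketch only. Assumes a database 'db' with a full-text index. :)
(: The 'errors' option key is an assumption, not confirmed in this thread. :)
let $query := 'Levenshtein'
return (
  (: 1. "contains text", using the syntax proposed in the thread :)
  'Levenstein' contains text 'Levenshtein' using fuzzy 3 errors,

  (: 2. ft:contains tokenizes its input on the fly, no index needed :)
  ft:contains('Levenstein', $query, map { 'fuzzy': true(), 'errors': 3 }),

  (: 3. ft:search runs against the full-text index of 'db' :)
  ft:search('db', $query, map { 'fuzzy': true(), 'errors': 3 })
)
```

The per-query value applies only to the expression it is attached to, so the global LSERROR option can stay at its default.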
[1] https://github.com/BaseXdb/basex/issues/1673
On Mon, Mar 25, 2019 at 1:33 PM Christian Grün <christian.gruen@gmail.com> wrote:
[...]