Dimitar:
Thanks for this extensive explanation, most interesting. Very much appreciated.
best
Pascal

On 11/28/11 11:47 AM, Dimitar Popov wrote:
Hi Pascal,

Dimitar:
Thanks for the clarification. Exact matches would solve some use cases but I would however welcome full text search on attributes as I often need to perform partial matches. I assume this is a fairly common need.

The XQuery Full-Text specification [1] was designed for searching keywords and phrases in large text corpora, not for substring or pattern matching. The central concept of full-text search is tokenization, i.e. splitting both the query and the searched text into tokens. This is why, although full-text search can be used to a certain extent for inexact string matching, the results may not be what one expects.

I know that inexact matching is relatively common, but I'm afraid I'm not aware of a DBMS with a general-purpose index structure that can speed up pattern matching, apart from the classical case of prefix matching (e.g. SQL queries with LIKE 'abc%' conditions).
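
To illustrate why prefix matching is the one case an ordinary ordered index handles well, here is a minimal Python sketch. It is not how any particular DBMS implements LIKE 'abc%'; the function name prefix_lookup and the sample values are made up for the example. The idea is only that all values sharing a prefix sit next to each other in sorted order, so one binary search finds where they start.

```python
from bisect import bisect_left

def prefix_lookup(sorted_values, prefix):
    """Return all values starting with `prefix` from an already sorted list."""
    # Binary search for the first value that is >= prefix.
    start = bisect_left(sorted_values, prefix)
    result = []
    # Values sharing the prefix are contiguous, so stop at the first miss.
    for value in sorted_values[start:]:
        if not value.startswith(prefix):
            break
        result.append(value)
    return result

# Hypothetical sample data standing in for an ordered index.
index = sorted(["abc", "abcd", "abd", "xabc", "abc1"])
hits = prefix_lookup(index, "abc")
print(hits)  # ['abc', 'abc1', 'abcd']
```

Note that "xabc" contains "abc" but is not found: a substring or general pattern match cannot be answered by a single range in the sorted order.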

Regarding your examples:

<Foo id="" version="1.0">
<Foo id="" version="1.1">
<Foo id="" version="2.0">
--> Search for all Foo under version 1.*

This will not work because of tokenization: a version such as "2.1.2" will also be matched by the full-text search, although that is not what you want.
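
The effect can be sketched in a few lines of Python. This is only a rough model, assuming a tokenizer that splits on every character that is not a letter or digit; a real full-text implementation may tokenize differently. The function names are made up for the example.

```python
import re

def tokenize(text):
    # Assumption: tokens are maximal runs of ASCII letters and digits.
    return re.findall(r"[A-Za-z0-9]+", text.lower())

def contains_token(text, term):
    # A token-based search only asks whether the term occurs as a token.
    return term.lower() in tokenize(text)

versions = ["1.0", "1.1", "2.0", "2.1.2"]

# Token search for "1": also hits "2.1.2", because "1" is one of its tokens.
token_hits = [v for v in versions if contains_token(v, "1")]

# What was actually wanted: versions whose string starts with "1.".
prefix_hits = [v for v in versions if v.startswith("1.")]

print(token_hits)   # ['1.0', '1.1', '2.1.2']
print(prefix_hits)  # ['1.0', '1.1']
```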

<Book author="John Doe">
<Book author="Jane Doe">
--> Search by author name

It's safe to use full-text search in this case.

<variable name="xyz_1">
<variable name="xyz_2">
--> Search all variables that start with "xyz"

Same as with the versions: e.g. "1_xyz_2" will be matched, and the specification provides no way to anchor a match to the beginning of the string (i.e. full-text search != regex matching).
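
Again as a rough Python sketch, under the same assumption that the tokenizer treats "_" as a separator (the names and the extra value "1_xyz_2" are only for illustration): a token search cannot tell where in the string the token occurred, whereas an anchored regular expression can.

```python
import re

def tokenize(text):
    # Assumption: "_" is not a token character, so it splits tokens.
    return re.findall(r"[A-Za-z0-9]+", text.lower())

names = ["xyz_1", "xyz_2", "1_xyz_2"]

# Token search: every name has "xyz" as a token, regardless of position.
token_hits = [n for n in names if "xyz" in tokenize(n)]

# Anchored match: re.match only matches at the beginning of the string.
anchored_hits = [n for n in names if re.match(r"xyz", n)]

print(token_hits)     # ['xyz_1', 'xyz_2', '1_xyz_2']
print(anchored_hits)  # ['xyz_1', 'xyz_2']
```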

<a href="http://www.example.org/home">
<a href="http://www.basex.org">
<a href="http://www.example.org/acbout">
--> Find all links pointing to example.org

You can't use full-text search in this case, because "example.org" will also match, for example, "example/org". Of course, if you are willing to accept the risk of false matches, you can.
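
A small Python sketch of this false match, with the same simplified tokenizer assumption as before. The third link is a made-up value added to show the problem, and the helper names are hypothetical. The phrase search compares token sequences, so the punctuation between "example" and "org" is lost; checking the parsed host name does not have this problem.

```python
import re
from urllib.parse import urlparse

def tokenize(text):
    # Assumption: tokens are maximal runs of ASCII letters and digits.
    return re.findall(r"[A-Za-z0-9]+", text.lower())

def contains_phrase(text, phrase):
    # True if the phrase's tokens occur consecutively among the text's tokens.
    tokens, wanted = tokenize(text), tokenize(phrase)
    n = len(wanted)
    return any(tokens[i:i + n] == wanted for i in range(len(tokens) - n + 1))

def points_to_host(url, host):
    # Exact alternative: compare the parsed host name, allowing subdomains.
    hostname = urlparse(url).hostname or ""
    return hostname == host or hostname.endswith("." + host)

links = [
    "http://www.example.org/home",
    "http://www.basex.org",
    "http://www.example/org",  # hypothetical link, not from the examples above
]

phrase_hits = [u for u in links if contains_phrase(u, "example.org")]
host_hits = [u for u in links if points_to_host(u, "example.org")]

print(phrase_hits)  # first and third link
print(host_hits)    # first link only
```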

I hope my comments will be useful and that I've convinced you that ft search is not what you need :)

Regards,
Dimitar

[1] http://www.w3.org/TR/xpath-full-text-10/