- I suspect that programmers in XQuery FT will not like to rewrite their
query until it works. ... I don't see a way to rewrite using text() so that it works in the general case.
Note that all XQuery Full Text queries "work" in BaseX, but not all of them take advantage of the optional full-text index. The reason is that we initially put most of our effort into 100% compliance with the XQFT specification, and we are only gradually increasing the number of XQuery expressions that are recognized by the query optimizer. To the best of our knowledge, we are still the only implementation that complies 100% with the specs (other implementations are coming closer, though).
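To make the distinction a bit more concrete, here is a rough sketch of two queries that are both valid XQuery Full Text. It assumes that a database with the Shakespeare plays is opened as context and that a full-text index was created; the element names are taken from the plays, the search term is arbitrary. Which of the two forms is actually rewritten for index access depends on the version and on the optimizer, so the comments describe an expectation, not a guarantee:

```xquery
(: Both queries are evaluated correctly; they differ only in whether
   the optimizer may be able to use the full-text index. :)

(: Form 1: the search addresses text nodes directly. This is the kind of
   expression the optimizer is more likely to recognize (assumption). :)
//LINE[text() contains text "young"],

(: Form 2: the search addresses a whole element with child elements.
   It returns correct results, but may be evaluated without the index. :)
//SPEECH[. contains text "young"]
```

The optimized query that is shown in the query info tells you whether an index access was chosen for a given expression.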
I have the feeling that currently, BaseX cannot match a FT query across several text() nodes, am I wrong?
...they won't utilize the index.
Sorry, I am confused. Why do you speak of 'atomization'? I really think that all implementations should recognize "romeo" and "juliet" as independent words in Shakespeare's plays...
By default, whitespace nodes are chopped by the BaseX XML parser; that's why snippets like...
<SPEAKER>ROMEO</SPEAKER><LINE>Is the day so young?</LINE>
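..are tokenized to "romeois", "the", "day", etc.: once the whitespace between the elements is gone, "ROMEO" and "Is" are directly adjacent and end up in a single token. This may look pretty weird, but it makes sense if you look at examples like..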
"<b>T</b>his is funny" contains text "This is funny"
..which will return "false" in some other implementations. Both approaches are correct, as the specification says that "Implementations are free to provide implementation-defined ways to differentiate between the markup's effect on token boundaries during tokenization" (http://www.w3.org/TR/2010/CR-xpath-full-text-10-20100128/#tq-ftsearch-xml).
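The two cases above can be tried out with a small self-contained query. This is only a sketch: the variable names are made up for illustration, element constructors are used so that the tags are real markup, and the results noted in the comments are what follows from the behavior described above, not something the specification guarantees:

```xquery
(: Case 1: adjacent elements without whitespace in between. :)
let $speech :=
  <SPEECH><SPEAKER>ROMEO</SPEAKER><LINE>Is the day so young?</LINE></SPEECH>

(: Case 2: inline markup inside a single word. :)
let $para := <p><b>T</b>his is funny</p>

return (
  (: Searching the whole element: the tokens start with "romeois" if the
     markup does not create a token boundary, so "romeo" is not expected
     to match in BaseX. :)
  $speech contains text "romeo",

  (: Searching the SPEAKER element only: "romeo" is a token of its own. :)
  $speech/SPEAKER contains text "romeo",

  (: Markup is ignored during tokenization in BaseX, so the phrase is
     expected to match; an implementation that treats tags as token
     boundaries may return false here. :)
  $para contains text "This is funny"
)
```

If you need "romeo" to be found as an independent word, addressing the SPEAKER element (or its text node) directly, as in the second expression, avoids the merged token.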
Feel free to ask for more, Christian