Re: [basex-talk] Full-text speed

11 Feb 2010

      ...

I suspect that programmers in XQuery FT will not like to rewrite their
query until it works.
...
I don't see a way to rewrite using text() so that it works in the general
case.
Note that all XQuery Full Text queries "work" in BaseX, but not all of
them take advantage of the optional full-text index. The reason is
that we initially put most of our effort into 100% compliance with the
XQFT specification. To the best of our knowledge, we are still the
only implementation that complies 100% with the specs (other
implementations are coming closer, though), and we are gradually
increasing the number of XQuery expressions that are recognized by the
query optimizer.
...
I have the feeling that currently, BaseX cannot match a FT query across
several text() nodes, am I wrong?
...they won't utilize the index.
...
Sorry, I am confused. Why do you speak of 'atomization' ?
I really think that all implementations should recognize "romeo" and
"juliet" as independent words in Shakespeare's plays...
By default, whitespace nodes are chopped by the BaseX XML parser;
that's why snippets like...
<SPEAKER>ROMEO</SPEAKER><LINE>Is the day so young?</LINE>
..are tokenized to "romeois", "the", "day", etc. This may look pretty
weird, but it makes sense if you look at examples like..
"<b>T</b>his is funny" contains text "This is funny"
..which will return "false" in some other implementations. Both
approaches are correct, as the specification says that
"Implementations are free to provide implementation-defined ways to
differentiate between the markup's effect on token boundaries during
tokenization" (http://www.w3.org/TR/2010/CR-xpath-full-text-10-20100128/#tq-ftsearch-xml).
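
To make the two behaviours concrete, here is a small Python sketch. It is
only an illustration, not how BaseX tokenizes internally: the function name
`tokens`, the flag `markup_is_boundary`, and the wrapper elements <SPEECH>
and <p> (added so the snippets are well-formed) are all made up for this
example. The inputs contain no whitespace between elements, which mirrors
the effect of chopping whitespace nodes.

```python
import re
import xml.etree.ElementTree as ET

def tokens(xml, markup_is_boundary):
    """Tokenize the text content of an XML snippet (illustrative only).

    markup_is_boundary=False: adjacent text nodes are concatenated before
    tokenization, so element tags do not separate tokens.
    markup_is_boundary=True: every tag acts as a token boundary.
    """
    root = ET.fromstring(xml)
    # itertext() yields all text nodes in document order
    parts = list(root.itertext())
    separator = " " if markup_is_boundary else ""
    text = separator.join(parts).lower()
    # Simplistic tokenizer: runs of word characters
    return re.findall(r"\w+", text)

# Hypothetical wrapper elements around the snippets from this mail
play = "<SPEECH><SPEAKER>ROMEO</SPEAKER><LINE>Is the day so young?</LINE></SPEECH>"
funny = "<p><b>T</b>his is funny</p>"

print(tokens(play, markup_is_boundary=False))   # first token is "romeois"
print(tokens(play, markup_is_boundary=True))    # "romeo" and "is" are separate
print(tokens(funny, markup_is_boundary=False))  # "this" stays one token
print(tokens(funny, markup_is_boundary=True))   # "t" and "his" are split
```

Whichever choice is made, one of the two examples behaves unexpectedly,
which is why the specification leaves this to the implementation.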
Feel free to ask for more,
Christian
