Dimitar: Thanks for the clarification. Exact matches would solve some use cases, but I would welcome full-text search on attributes, as I often need to perform partial matches. I assume this is a fairly common need. A few examples are illustrated below. Is this something that could potentially be supported (as an FT indexing option)?
best Pascal
<Foo id="" version="1.0"> <Foo id="" version="1.1"> <Foo id="" version="2.0"> --> Search for all Foo under version 1.*
<Book author="John Doe"> <Book author="Jane Doe"> --> Search by author name
<variable name="xyz_1"> <variable name="xyz_2"> --> Search all variables that start with "xyz"
<a href="http://www.example.org/home"> <a href="http://www.basex.org"> <a href="http://www.example.org/about"> --> Find all links pointing to example.org
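For reference, partial matches like these can already be written with the standard XQuery string functions. The following is only a sketch on inline sample data (the <root> wrapper is made up for illustration). Going by Dimitar's explanation quoted below, the attribute index only serves the "=" operator, so predicates of this kind should be expected to scan the attribute values rather than use an index.

```xquery
(: Hypothetical sample data; <root> is only a wrapper for this sketch. :)
let $doc :=
  <root>
    <Foo id="a" version="1.0"/>
    <Foo id="b" version="1.1"/>
    <Foo id="c" version="2.0"/>
    <Book author="John Doe"/>
    <Book author="Jane Doe"/>
    <variable name="xyz_1"/>
    <variable name="xyz_2"/>
    <a href="http://www.example.org/home"/>
    <a href="http://www.basex.org"/>
  </root>
return (
  (: all Foo under version 1.* :)
  $doc/Foo[starts-with(@version, '1.')],
  (: books by author surname :)
  $doc/Book[ends-with(@author, ' Doe')],
  (: variables whose name starts with "xyz" :)
  $doc/variable[starts-with(@name, 'xyz')],
  (: links pointing to example.org :)
  $doc/a[contains(@href, 'example.org')]
)
```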
On 11/28/11 9:26 AM, Dimitar Popov wrote:
Hi Pascal,
I meant that attribute values are not indexed by the full-text index. However, there is a separate (non-full-text) index which contains only the attribute values and can be used to speed up attribute value queries with the equality operator ("=" not "eq"!). Thus, the index will be used when searching for unique ids.
When searching using several attributes, I guess you'll use something like @id1 = 'x' and @id2 = 'y'. In this case, BaseX will use the index for only one of the attributes, and the other predicate will be evaluated iteratively. This is the common way in most database systems.
If you use "or", e.g. @id1 = 'x' or @id2 = 'y', then both predicates will be evaluated using the index.
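As a sketch of the two query shapes discussed above, here is Pascal's composite-key example as inline data (the <elements> wrapper and the second record are made up for illustration). On an inline fragment no index is involved; the index behaviour described above would apply when the same predicates run against a database.

```xquery
(: Hypothetical sample data for illustration only. :)
let $doc :=
  <elements>
    <element id="id1234" version="1.0.0" agency="myagency"/>
    <element id="id5678" version="1.0.0" agency="otheragency"/>
  </elements>
return (
  (: "and": general comparison "=" on both attributes of a composite key :)
  $doc/element[@id = 'id1234' and @agency = 'myagency'],
  (: "or": either attribute may match :)
  $doc/element[@id = 'id1234' or @agency = 'otheragency']
)
```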
For more details about different index types, please, check our wiki page [1].
Regards, Dimitar
[1] http://docs.basex.org/wiki/Indexes
On Nov 28, 2011, at 4:51 AM, Pascal Heus wrote:
Dimitar: I noticed you mentioned that BaseX does not full-text index attributes. Is this something that could be added as an indexing option? The two core metadata standards I work with store names and identification information in element attributes, and I was hoping to leverage FT search for quick lookup purposes. Otherwise, are attribute values always indexed? For example, if I need to look for a unique key like <element urn='some-unique-urn-string-value'>, would I get an instant match? What about composite keys like <element id='id1234' version='1.0.0' agency='myagency'>? best *P
On 11/27/11 2:49 PM, Dimitar Popov wrote:
Hi An,
thank you for the provided data and sample query. Please, check my comments, below.
Am Sonntag, 27. November 2011, 17:33:00 schrieb Truong An Nguyen:
declare default element namespace "http://iso.org/OTX";

for $pro in collection()/otx/procedures/procedure
return
  for $hd in $pro/realisation/flow//handler
  where exists($hd/@*[contains(data(.),"Variable1")])
     or exists($hd/realisation/catch/exception//@*[contains(data(.),"Variable1")])
     or $hd/specification contains text "Specification"
     (: or exists ($hd/specification[contains(data(.),"Specification")] ):)
  return concat(data($pro/../../@package),":",data($pro/../../@name),":",
                data($pro/@name),":","handler",":",$hd/@id)
The variant with "contains text" ran much slower than the variant with "contains".
Hm, on my computer the difference is not huge (1307.42 ms for fn:contains() vs. 1446.64 ms for "contains text"), but, yes, "slow" is a relative term :)
Anyway, the difference is due to the fact that fn:contains() does a simple substring search, while "contains text" offers more advanced options such as case insensitivity, stemming, stop words, etc. Thus, when the full-text index is not used, there is some more processing of both the query string and the matched string, which results in slower performance.
The indexes are used: path, text index, attribute index, full-text index (without any options)
With the provided query, the full-text index is not used. The reason is that BaseX does not index the string values of attributes in the full-text index, i.e. only text nodes are indexed.
I don't know what the query should do, but please note the different behavior of fn:contains() and contains text. Just a quick example:
fn:contains('GlobalDocumentVariable1_String', 'Variable1') -> true
'GlobalDocumentVariable1_String' contains text 'Variable1' -> false
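The difference comes from tokenization: fn:contains() looks for a substring, whereas "contains text" compares whole tokens. As a small sketch, using a made-up string in which the search term stands as a separate word, both expressions match:

```xquery
(: substring search: true, 'Variable1' occurs inside the string :)
fn:contains('Global Document Variable1 String', 'Variable1'),
(: token search: true as well, because 'Variable1' is a token of its own here :)
'Global Document Variable1 String' contains text 'Variable1'
```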
Further, one small optimization would be to remove the data() function call in the predicates, i.e.
$hd/realisation/catch/exception//@*[contains(.,"Variable1")]
is enough.
I hope this helps.
Greetings, Dimitar
BaseX-Talk mailing list BaseX-Talk@mailman.uni-konstanz.de https://mailman.uni-konstanz.de/mailman/listinfo/basex-talk