Dimitar:
I noticed you mentioned that BaseX does not full-text index attribute values. Is this something that could be added as an indexing option? The two core metadata standards I work with store names and identification information in element attributes, and I was hoping to leverage full-text search for quick lookups.
Otherwise, are attribute values always indexed? For example, if I need to look up a unique key like <element urn='some-unique-urn-string-value'>, would I get an instant match? What about composite keys like <element id='id1234' version='1.0.0' agency='myagency'>?
best
*P

On 11/27/11 2:49 PM, Dimitar Popov wrote:

Hi An,

 

Thank you for the provided data and sample query. Please see my comments below.

 

On Sunday, 27 November 2011, 17:33:00, Truong An Nguyen wrote:

> declare default element namespace "http://iso.org/OTX";

>

> for $pro in collection()/otx/procedures/procedure

> return for $hd in $pro/realisation/flow//handler

> where exists($hd/@*[contains(data(.),"Variable1")])

> or

> exists($hd/realisation/catch/exception//@*[contains(data(.),"Variable1")])

> or $hd/specification contains text "Specification"

> (: or exists ($hd/specification[contains(data(.),"Specification")] ):)

> return

> concat(data($pro/../../@package),":",data($pro/../../@name),":",data($pro/@name),":","handler",":",$hd/@id)

>

> The variant with "contains text" ran much slower than the variant with

> "contains".

 

Hm, on my computer the difference is not huge (1307.42 ms for fn:contains() vs. 1446.64 ms for "contains text"), but, yes, "slow" is a relative term :)

 

Anyway, the difference arises because fn:contains() does a simple substring search, while "contains text" offers more advanced options such as case insensitivity, stemming, and stop words. Thus, when the full-text index is not used, both the query string and the matched string need additional processing, which results in slower performance.
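
A small sketch of what that extra processing buys, using made-up strings rather than the OTX data. It relies on the XQuery Full Text default of case-insensitive matching; the stemming line additionally assumes that an English stemmer is available, so no result is claimed for it.

```xquery
(: fn:contains() is a plain, case-sensitive substring test :)
contains('Error in Specification', 'specification'),

(: "contains text" tokenizes both sides and, by default,
   compares the tokens case-insensitively :)
'Error in Specification' contains text 'specification',

(: Optional match options add further work per token;
   the outcome depends on the stemmer in use :)
'all handlers' contains text 'handler' using stemming using language 'en'
```

The first expression should yield false and the second true, which is exactly the additional normalization that costs time when no index can be used.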

 

> The indexes are used: path, text index, attribute index, full-text index

> (without any options)

 

With the provided query, the full-text index is not used. The reason is that the full-text index in BaseX does not cover the string values of attributes; only text nodes are indexed.
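
A minimal sketch of the two kinds of attribute lookups, assuming the same OTX collection as in the quoted query and a hypothetical id value 'h1'. The first expression applies "contains text" directly to attribute values, which is valid but runs without full-text index support. The second is an exact comparison, the kind of lookup the attribute index is meant for. Whether the optimizer actually rewrites it for index access depends on the database and the query, so the query info output is worth checking.

```xquery
declare default element namespace "http://iso.org/OTX";

(: Full-text match on attribute values; evaluated without the
   full-text index, so every handler element is visited :)
for $hd in collection()//handler
where $hd/@* contains text "Variable1"
return string($hd/@id),

(: Exact comparison on an attribute value; 'h1' is a hypothetical id :)
for $hd in collection()//handler[@id = 'h1']
return string($hd/@id)
```

Note that the first expression matches whole tokens, so it is not a drop-in replacement for the fn:contains() predicates in the original query.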

 

I don't know what the query is meant to do, but please note that fn:contains() and "contains text" behave differently. A quick example:

 

fn:contains('GlobalDocumentVariable1_String', 'Variable1') -> true

'GlobalDocumentVariable1_String' contains text 'Variable1' -> false
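
The second expression is false because "contains text" compares whole tokens rather than substrings. The sketch below uses a made-up string for the token case; exact token boundaries are implementation-dependent. The wildcard variant uses the "using wildcards" match option from the XQuery Full Text specification and is only one possible way to get substring-like behavior.

```xquery
(: Token match: 'Variable1' is a complete, whitespace-delimited token :)
'Global Document Variable1 String' contains text 'Variable1',

(: Substring-like match inside a longer token:
   ".*" stands for zero or more arbitrary characters :)
'GlobalDocumentVariable1_String' contains text '.*Variable1.*' using wildcards
```

Wildcard matching is usually more expensive than plain token matching, so it is worth measuring before relying on it.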

 

Further, one small optimization would be to remove the data() function call in the predicates, i.e.

 

$hd/realisation/catch/exception//@*[contains(.,"Variable1")]

 

is enough.
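
Applied to the whole query, the change could look like the sketch below. It assumes the same database and namespace as the original; the nested "return for" is written as two "for" clauses, which is equivalent. fn:contains() atomizes its arguments, so the explicit data() call in the predicates is redundant.

```xquery
declare default element namespace "http://iso.org/OTX";

for $pro in collection()/otx/procedures/procedure
for $hd in $pro/realisation/flow//handler
where exists($hd/@*[contains(., "Variable1")])
   or exists($hd/realisation/catch/exception//@*[contains(., "Variable1")])
   or $hd/specification contains text "Specification"
return concat(
  data($pro/../../@package), ":",
  data($pro/../../@name), ":",
  data($pro/@name), ":",
  "handler", ":", $hd/@id
)
```

The return clause is kept as in the original; only the predicates were changed.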

 

I hope this helps.

 

Greetings,

Dimitar



_______________________________________________
BaseX-Talk mailing list
BaseX-Talk@mailman.uni-konstanz.de
https://mailman.uni-konstanz.de/mailman/listinfo/basex-talk