Hi Eliot,
I'm afraid I agree: there is no straightforward way to speed up the lookup of single tokens in attributes. XQuery 3.1 provides a new string function, "contains-token" [1]...
//*[contains-token(@class, 'topic/topic')]
...but (up to now) it is not index-driven in BaseX.
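As a rough illustration of how contains-token compares with the space-padded contains() idiom, here is a small sketch that should run in any XQuery 3.1 processor. The sample @class values are made up for the example and are not taken from a real DITA document:

```xquery
(: Hypothetical @class values; the last one has no surrounding blanks. :)
for $class in ('- topic/topic concept/concept ', '- topic/p ', 'topic/topic')
return (
  '"' || $class || '"'
  (: substring test: only matches if the token is padded with blanks :)
  || ' contains: ' || contains($class, ' topic/topic ')
  (: token test: splits the value on whitespace and compares whole tokens :)
  || ' contains-token: ' || contains-token($class, 'topic/topic')
)
```

The two functions agree on the first two values, but only contains-token matches the unpadded third one, so it does not depend on the leading and trailing blanks that DITA puts into @class.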
Some users would love to see us extend our full-text index to attributes. This way, your queries could be sped up as follows:
//*[@class contains text 'topic/topic'][contains-token(@class, 'topic/topic')]
The second predicate is still required, as the full-text query would also potentially yield hits like "topic topic" or "ToPiC-!-tOpIc".
Currently, an efficient and (if you get used to it) rather simple way out is to create your own index...
let $index := <index>{
  for $element in db:open('db')//*[@class]
  let $id := db:node-id($element)
  for $token in $element/@class/tokenize(., '\s+')
  return <class token="{ $token }">{ $id }</class>
}</index>
return db:create('index', $index, 'index.xml')
...and access it in the next step:
for $id in db:open('index')//class[@token = 'topic/topic']
return db:open-id('db', $id)
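To try out the idea without creating any databases, the following is a minimal in-memory sketch of the same token index. It is only an approximation of the approach above: the sample document is invented, and the hypothetical @id attributes stand in for the values that db:node-id() would return in a real database.

```xquery
(: Hypothetical sample data; @id replaces db:node-id() in this sketch. :)
let $doc :=
  <dita>
    <concept id="c1" class="- topic/topic concept/concept "/>
    <task id="t1" class="- topic/topic task/task "/>
    <p id="p1" class="- topic/p "/>
  </dita>
(: One <class> entry per whitespace-separated token of each @class value. :)
let $index :=
  <index>{
    for $element in $doc//*[@class]
    for $token in tokenize($element/@class)
    return <class token="{ $token }">{ string($element/@id) }</class>
  }</index>
(: Lookup: find the entries for a token, then resolve them to elements. :)
for $entry in $index/class[@token = 'topic/topic']
return $doc//*[@id = $entry]
```

This should return the concept and task elements, but not the paragraph. The sketch uses the one-argument form of tokenize from XQuery 3.1, which ignores leading and trailing whitespace; with tokenize(., '\s+'), a trailing blank in @class produces an additional empty token, which is harmless for lookups but adds one extra entry per element. In the database variant, keep in mind that the separate index database is not updated automatically, so it needs to be rebuilt after the main database has changed.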
Hope this helps, Christian
[1] http://docs.basex.org/wiki/XQuery_3.1#fn:contains-token
On Mon, Apr 13, 2015 at 7:38 PM, Eliot Kimber ekimber@contrext.com wrote:
DITA defines the notion of a layered hierarchy of element types, where every DITA-defined element is either a base type or a "specialized" type derived from some base type. The type hierarchy of each element is specified by a @class attribute that lists the ancestry and leaf type of the element.
For example, the element type "concept" is a specialization of the base type "topic" and so has a @class value of "- topic/topic concept/concept ". Each blank-delimited term is a module name/element name pair.
Processing in DITA is "specialization aware" if selection of elements is in terms of a @class token rather than concrete element type. For example, you might apply processing to topics of any type by matching on "*[contains(@class, ' topic/topic ')]", which will match all DITA topics, regardless of their specialized type.
The challenge this presents in a database context is finding elements efficiently based on these @class values. For large repositories, an XQuery like "//*[contains(@class, ' topic/topic ')]" is going to be quite slow, as it requires a string comparison against every @class value. Even if there is an attribute value index, it will still be slow.
The obvious solution would be to index by @class token, e.g., an index where keys are "topic/topic", "topic/p", etc.
Is there a way to construct such an index in BaseX? Is there a better way to address this type of string-match-based lookup?
Thanks,
Eliot
————— Eliot Kimber, Owner Contrext, LLC http://contrext.com