In my content set (DITA maps and topics) I construct an index that maps each map or topic to the names of the root maps that ultimately use that topic. My index structure is:
<doc-to-bundle-index>
  <doc-to-bundle-index-entry key="product/customer-communities/reference/gamification-components-badges.dita">
    <filename>gamification-components-badges.dita</filename>
    <bundles>
      <bundle>No-bundle-found</bundle>
    </bundles>
  </doc-to-bundle-index-entry>
</doc-to-bundle-index>
I then want to get, for all the topics, the bundle names for each topic, grouped by bundle name (i.e., construct a map of bundle names to topics in that bundle). (This is in the service of a report that relates Oxygen map validation reports to the documents associated with the incidents in the report, grouped by bundle.)
I have 10K topics in my test set.
Getting the set of topic elements and the index keys for each topic is fast: about 0.1 seconds total.
However, using the keys to do a lookup of the bundles for each topic takes about 2 minutes, i.e.:
let $bundlesForDocs as xs:string* :=
  for $key in $keysForDocs
  return $dtbIndex/doc-to-bundle-index-entry[@key eq $key]/bundles/bundle ! string(.)
return $bundlesForDocs
(I would really be building a map of bundles-to-docs but I used this loop just to gather timing info and take map construction out of the equation, not that I would expect map construction itself to be slow.)
An obvious solution would be to capture the bundle-to-document mapping at the time I construct the index, which I will do.
But my larger question is:
Am I doing anything wrong or inefficient in this initial approach that is making this lookup of index entries by key slower than it should be? Or is this just an inherently slow operation that I should just not try to do if at all possible?
That is, is there a way to either construct the content of the index or configure BaseX that will make this kind of bulk lookup faster?
Or am I thinking about this particular use case all wrong?
Thanks,
Eliot _____________________________________________ Eliot Kimber Sr Staff Content Engineer O: 512 554 9368 M: 512 554 9368 servicenow.comhttps://www.servicenow.com LinkedInhttps://www.linkedin.com/company/servicenow | Twitterhttps://twitter.com/servicenow | YouTubehttps://www.youtube.com/user/servicenowinc | Facebookhttps://www.facebook.com/servicenow
> for $key in $keysForDocs
> return $dtbIndex/doc-to-bundle-index-entry[@key eq $key]/bundles/bundle ! string(.)
You can probably save time by omitting the loop:
$dtbIndex/doc-to-bundle-index-entry[@key = $keysForDocs]/bundles/bundle ! string(.)
Did you check if $dtbIndex is inlined at compile time?
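To make the difference between the two forms concrete, here is a minimal sketch that runs both against a tiny in-memory index. The index content, the keys and the bundle names are made up for illustration; only the element names come from the index structure shown in the original question. Note that the two forms are not strictly interchangeable: the loop returns results in the order of the keys (and repeats them if a key is repeated), while the single path expression returns each matching entry once, in document order.

```xquery
(: Hypothetical miniature index; the real one would come from the database. :)
let $dtbIndex :=
  <doc-to-bundle-index>
    <doc-to-bundle-index-entry key="a.dita">
      <bundles><bundle>Bundle-1</bundle></bundles>
    </doc-to-bundle-index-entry>
    <doc-to-bundle-index-entry key="b.dita">
      <bundles><bundle>Bundle-2</bundle></bundles>
    </doc-to-bundle-index-entry>
  </doc-to-bundle-index>
(: Hypothetical keys, deliberately not in document order. :)
let $keysForDocs := ('b.dita', 'a.dita')
return (
  (: Loop form: one filtered path per key; results follow the key order. :)
  for $key in $keysForDocs
  return $dtbIndex/doc-to-bundle-index-entry[@key eq $key]/bundles/bundle ! string(.),

  (: Single-path form: one general comparison against all keys;
     results follow document order. :)
  $dtbIndex/doc-to-bundle-index-entry[@key = $keysForDocs]/bundles/bundle ! string(.)
)
```

Whether the second form is actually faster on the real data depends on how the query is compiled, so it is worth comparing the optimized query plans for both.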
Follow up: I wrote a function to construct the bundle-to-docs index as an element. That function, operating over the previously constructed doc-to-bundle index document, takes 0.2 seconds to run!
So it seems like the answer is to build the index you need when you need it (and then persist or not depending on how dynamic your data is) rather than trying to do a relational-style lookup against the first document.
This is the function that builds the index. It doesn’t do any lookups, just iterates over the entries in the doc-to-bundle index, which is very fast:
declare function linkrk:constructBundleToDocsIndex($database as xs:string)
    as element(bundle-to-docs-index) {
  let $lrcDatabase := linkrk:getRecordKeepingDbName($database)
  let $dtbIndex := collection($lrcDatabase)/doc-to-bundle-index
  (: One pass over the entries: map each bundle name to the keys of its docs :)
  let $bundlesToIndexKey as map(*) := map:merge(
    for $entry in $dtbIndex/doc-to-bundle-index-entry
    let $bundles as xs:string* := $entry/bundles/bundle ! string(.)
    for $bundle in $bundles
    return map{ $bundle : string($entry/@key) },
    map{ 'duplicates' : 'combine' }
  )
  (: Turn the map into an element-based index :)
  let $index as element(bundle-to-docs-index) :=
    element {'bundle-to-docs-index'} {
      for $bundle in map:keys($bundlesToIndexKey)
      return element {'bundle-to-docs-index-entry'} {
        attribute {'bundle'} {$bundle},
        for $key in $bundlesToIndexKey($bundle)
        return element {'doc-key'} {$key}
      }
    }
  return $index
};
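The same one-pass idea might also work for the original direction (key to bundles). This is only a sketch, not something I have timed: it builds an in-memory map from the index once and then resolves each key with a map lookup instead of a filtered path. The sample index content and keys are hypothetical, and the variable name $bundlesByKey is mine.

```xquery
declare namespace map = "http://www.w3.org/2005/xpath-functions/map";

(: Hypothetical miniature index; the real one would come from the database. :)
let $dtbIndex :=
  <doc-to-bundle-index>
    <doc-to-bundle-index-entry key="a.dita">
      <bundles><bundle>Bundle-1</bundle><bundle>Bundle-2</bundle></bundles>
    </doc-to-bundle-index-entry>
    <doc-to-bundle-index-entry key="b.dita">
      <bundles><bundle>Bundle-2</bundle></bundles>
    </doc-to-bundle-index-entry>
  </doc-to-bundle-index>
let $keysForDocs := ('b.dita', 'a.dita', 'not-in-index.dita')

(: One pass over the entries: index key -> bundle names for that doc. :)
let $bundlesByKey as map(*) := map:merge(
  for $entry in $dtbIndex/doc-to-bundle-index-entry
  return map { string($entry/@key) : $entry/bundles/bundle ! string(.) }
)

(: Each lookup is a map access; a key that is not in the map
   contributes an empty sequence. :)
for $key in $keysForDocs
return $bundlesByKey($key)
```

This assumes the keys in the index are unique; with the default options, map:merge keeps the first entry when a key occurs more than once.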
From: Christian Grün christian.gruen@gmail.com Date: Thursday, February 3, 2022 at 8:11 AM To: Eliot Kimber eliot.kimber@servicenow.com Cc: basex-talk@mailman.uni-konstanz.de basex-talk@mailman.uni-konstanz.de Subject: Re: [basex-talk] Optimizing Lookup from Custom Indexes [External Email]