Hi Eliot,
In similar cases, I've found that building temporary maps is really fast.
So, instead of doing the retrieval and filtering in one step, I just construct a map with a convenient key.
In the example, I want a list of categories for articles that could exist in multiple sections (of a web site).
In a later step, I will just consult the map for the categories.
let $category-map := map:merge(
  for $a in $all-sections//ProductItem
  let $guid := $a/@Guid
  group by $guid
  return map:entry($guid,
    <categories>{
      let $cats :=
        for $s in $a/parent::*/parent::Section
        return $s/ShopCategoryId/text()
      for $cat in distinct-values($cats)
      return <_><id>{$cat}</id></_>
    }</categories>
  )
)
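For illustration, here is a minimal, self-contained sketch of the build-then-look-up pattern. The element and attribute names follow the query above. The sample data is made up, and for brevity the map stores plain category IDs instead of the <categories> element, which is a simplification and not the original code.

```xquery
(: Inline sample data: hypothetical values, for illustration only. :)
declare variable $all-sections :=
  <Sections>
    <Section>
      <ShopCategoryId>10</ShopCategoryId>
      <Items><ProductItem Guid="p1"/><ProductItem Guid="p2"/></Items>
    </Section>
    <Section>
      <ShopCategoryId>20</ShopCategoryId>
      <Items><ProductItem Guid="p1"/></Items>
    </Section>
  </Sections>;

(: Step 1: build the temporary map, one entry per distinct Guid. :)
let $category-map := map:merge(
  for $a in $all-sections//ProductItem
  let $guid := $a/@Guid
  group by $guid
  (: after "group by", $a holds every ProductItem sharing this Guid :)
  return map:entry(
    $guid,
    distinct-values($a/parent::*/parent::Section/ShopCategoryId)
  )
)
(: Step 2: the later step is a plain key lookup, not another scan. :)
for $item in $all-sections//ProductItem
return string($item/@Guid) || ': ' ||
       string-join($category-map(string($item/@Guid)), ',')
```

The point of the design is that the expensive traversal happens once, while the map is built. Every later lookup is a key access.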
Best, Max
On Fri, 14 Jan 2022 at 16:41, Eliot Kimber eliot.kimber@servicenow.com wrote:
In the context of my 40K topic DITA corpus, I’m trying to build a “where used” report that finds, for each topic, the other topics that directly refer to the topic. I can do this by looking for the target topic’s filename in the values of @href attributes in other topics (I’m taking advantage of a local rule we have where all topic filenames should be unique).
My current naive approach is simply:
$topics//*[tokenize(@href, '/') = $filename]
Where $topics is the 40K topics.
Based on profiling, tokenize() is slightly faster than either matches() or contains(), but all three forms take about 0.5 seconds per target topic, which is far too slow to be practical.
So I’m trying to work out what my performance optimization strategies are in BaseX.
In MarkLogic I would set up an index so I could do fast lookup of tokens in @href values or something similar (it’s been long enough since I had to optimize MarkLogic queries that I don’t remember the details but basically indexes for everything).
I know I could do a one-time construction of the where-used table and then use that for quick lookup for subsequent queries but I’m trying to find a solution that is more appropriate for my current “create a new database with the latest files from git and run some queries quickly to get a report” mode.
I suspect that full-text indexing may be a solution here, but I'm wondering what other performance optimization options I have for this kind of lookup.
Thinking about it now I definitely need to see if building the where-used table would actually be slower. That is, find every @href, resolve it and construct a map of topics to href elements that point to that topic. Hmm.
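A minimal sketch of that where-used table, under some loud assumptions: $topics is inline sample data standing in for the real corpus, "resolving" an @href is reduced to stripping any fragment identifier and keeping the last path segment (which relies on the unique-filename rule), and the topic and element names are hypothetical.

```xquery
(: Hypothetical sample topics standing in for the 40K-topic corpus. :)
declare variable $topics := (
  <topic id="a"><xref href="../tasks/b.dita"/></topic>,
  <topic id="b">
    <xref href="a.dita#a/intro"/>
    <link href="../concepts/c.dita"/>
  </topic>,
  <topic id="c"><xref href="subdir/a.dita"/></topic>
);

(: One pass over every @href, grouped by the filename it points to. :)
let $where-used := map:merge(
  for $ref in $topics//*[@href]
  (: drop any "#fragment", then keep the last "/"-separated segment :)
  let $target := tokenize(replace($ref/@href, '#.*$', ''), '/')[last()]
  (: skip same-topic references such as href="#frag" :)
  where exists($target)
  group by $target
  (: after "group by", $ref holds all elements pointing at $target :)
  return map:entry($target, $ref)
)
(: Lookup: ids of the topics that refer to a.dita. :)
return $where-used('a.dita') ! ancestor-or-self::topic/@id ! string()
```

Building the map touches each @href once. After that, every per-topic lookup is a key access and no longer a scan of all topics. Whether that beats the per-topic query on the real corpus would need to be measured.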
Anyway, any guidance on this challenge would be appreciated.
Cheers,
Eliot
Eliot Kimber
Sr Staff Content Engineer