In the context of my 40K-topic DITA corpus, I’m trying to build a “where used” report that finds, for each topic, the other topics that directly refer to it. I can do this by looking for the target topic’s filename in the values of @href attributes in other topics (I’m taking advantage of a local rule we have that all topic filenames must be unique).
My current naive approach is simply:
$topics//*[tokenize(@href, '/') = $filename]
Here $topics is the set of 40K topics.
Based on profiling, tokenize() is slightly faster than either matches() or contains(), but all three forms take about 0.5 seconds per target topic. Across 40K topics that works out to roughly 20,000 seconds (about 5.5 hours) for a full report, which is far too slow to be practical.
So I’m trying to work out what performance optimization strategies are available to me in BaseX.
In MarkLogic I would set up an index for fast lookup of tokens in @href values, or something similar (it’s been long enough since I last had to optimize MarkLogic queries that I don’t remember the details, but basically: indexes for everything).
I know I could build the where-used table once and then use it for quick lookups in subsequent queries, but I’m trying to find a solution that better fits my current mode of working: create a new database from the latest files in git, then quickly run some queries to get a report.
I suspect full-text indexing may be a solution here, but I’m wondering what other performance optimization options I have for this kind of lookup.
Thinking about it now, I definitely need to check whether building the where-used table would actually be slower. That is: find every @href, resolve it, and construct a map from each topic to the href elements that point to it. Hmm.
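A minimal sketch of that single-pass where-used map in plain XQuery 3.1 (map:merge() plus a group by clause), offered as an illustration rather than a tested BaseX recipe. The $topics map and its three topics are hypothetical sample data; against a real database the source filename would come from each document, e.g. via db:path() or base-uri(). Instead of full URI resolution it leans on the unique-filename rule: it strips any #fragment, takes the last path step as the target filename, and does not filter out external links.

```xquery
xquery version "3.1";

(: Hypothetical stand-in for the corpus: source filename -> topic element. :)
declare variable $topics := map {
  'a.dita': <topic id="a"><xref href="../b/b.dita"/><xref href="c.dita#c/p1"/></topic>,
  'b.dita': <topic id="b"><xref href="c.dita"/></topic>,
  'c.dita': <topic id="c"/>
};

(: Single pass over every @href, grouped by the filename it points at.
   Assumes unique filenames: the last path step, minus any #fragment,
   names the target topic. Empty targets (e.g. href="") are skipped. :)
declare variable $where-used := map:merge(
  for $source in map:keys($topics)
  for $href in $topics($source)//@href
  let $target := tokenize(substring-before($href || '#', '#'), '/')[last()]
  where $target
  group by $target
  (: after grouping, $source holds every referring filename in the group :)
  return map:entry($target, distinct-values($source))
);

(: Report: each topic followed by the topics that refer to it.
   Returns "a.dita <- ", "b.dita <- a.dita", "c.dita <- a.dita, b.dita" :)
for $filename in sort(map:keys($topics))
return $filename || ' <- ' || string-join(sort($where-used($filename)), ', ')
```

With the map built, each per-topic lookup is a map access rather than a scan of all 40K topics, so the whole report costs one pass over the @href attributes plus the map construction. Whether that beats the per-target query on the real corpus is something only profiling can confirm.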
Anyway, any guidance on this challenge would be appreciated.
Cheers,
Eliot
_____________________________________________
Eliot Kimber
Sr Staff Content Engineer
O: 512 554 9368
M: 512 554 9368
servicenow.com <https://www.servicenow.com>
LinkedIn <https://www.linkedin.com/company/servicenow> | Twitter <https://twitter.com/servicenow> | YouTube <https://www.youtube.com/user/servicenowinc> | Facebook <https://www.facebook.com/servicenow>