I’ve worked out how to optimize my process that indexes DITA topics by the top-level maps they are ultimately used from. It turned out I needed to index the maps first, in ref count order from least to most. With that in place, I can just look up the top-level maps already recorded for the maps that directly reference a given topic, so each topic requires only a single index lookup.
However, on my laptop these lookups still take about 0.1 second per topic, so for thousands of topics that adds up to a long time (relatively speaking).
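To make the idea concrete, here is a rough, language-neutral sketch in Python (not the actual XQuery). The sample data and every name in it are hypothetical. It also swaps in memoized recursion for the ref-count ordering, since the ordering details are not spelled out here, and it assumes the map hierarchy has no cycles.

```python
# Hypothetical sample data: each map lists the maps and topics it references directly.
MAP_REFS = {
    "root-a.ditamap": ["sub-1.ditamap", "topic-1.dita"],
    "root-b.ditamap": ["sub-1.ditamap", "sub-2.ditamap"],
    "sub-1.ditamap": ["topic-2.dita"],
    "sub-2.ditamap": ["topic-2.dita", "topic-3.dita"],
}


def build_parents(map_refs):
    """Invert the reference table: document -> set of maps that reference it directly."""
    parents = {}
    for map_uri, targets in map_refs.items():
        for target in targets:
            parents.setdefault(target, set()).add(map_uri)
    return parents


def build_doc_to_root_maps(map_refs):
    """Return document -> frozenset of top-level maps it is ultimately used from."""
    parents = build_parents(map_refs)
    cache = {}

    def roots_of_map(map_uri):
        # Memoized, so each map is resolved once (assumes no reference cycles).
        if map_uri not in cache:
            direct = parents.get(map_uri, set())
            if not direct:
                # Assumption: a map nobody references is its own root map.
                cache[map_uri] = frozenset([map_uri])
            else:
                cache[map_uri] = frozenset().union(
                    *(roots_of_map(parent) for parent in direct)
                )
        return cache[map_uri]

    # Step 1: index the maps.
    index = {map_uri: roots_of_map(map_uri) for map_uri in map_refs}

    # Step 2: each topic only needs the entries already stored for its direct maps.
    topics = {t for targets in map_refs.values() for t in targets if t not in map_refs}
    for topic in topics:
        index[topic] = frozenset().union(*(index[parent] for parent in parents[topic]))
    return index


INDEX = build_doc_to_root_maps(MAP_REFS)
```

In this sample, topic-2.dita is reached through both sub-maps, so it ends up associated with both root maps.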
But the topic index process is 100% parallelizable, so I would be able to have at least 2 or 3 ingestion threads going on my 4-CPU server machine.
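As a rough illustration of why this parallelizes so easily, here is a minimal Python sketch (again, not BaseX code). The index_topic function is a hypothetical stand-in for the real per-topic lookup, and the worker count of 3 is just the figure mentioned above.

```python
from concurrent.futures import ThreadPoolExecutor


def index_topic(topic_uri):
    """Hypothetical stand-in for the real per-topic index lookup."""
    return topic_uri, len(topic_uri)


def index_topics_in_parallel(topic_uris, workers=3):
    """Index topics independently on a small pool of workers and collect the results."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # No topic depends on another topic's result, so the work
        # can be handed out in any order.
        return dict(pool.map(index_topic, topic_uris))
```

The point is only that the per-topic work shares nothing except the read-only map index, so the results can simply be collected at the end.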
Note that my ingestion process is two-phased:

Phase 1: Construct an XQuery map with the index details for the input topics (the topics already exist in the database, only the index is new).
Phase 2: Persist the map to the database as XML elements.
I construct the map both to take advantage of map:merge() and because it’s the only way I can index the DITA maps and topics in one transaction: build the doc-to-root-map for the DITA maps, use that data to build the doc-to-root-map entries for all the topics, then persist the lot to the database for future use. This is in the context of a one-time mass load of content from a new git work tree. Subsequent changes to the content database will be on individual files, and the index can easily be updated incrementally.
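The shape of the two phases, sketched in Python rather than XQuery: merge the partial in-memory indexes into one, then serialize the result as XML elements. The element and attribute names (doc-to-root-map, entry, root-map, doc, uri) are made up for illustration and are not the actual stored format. The merge assumes the partial indexes have disjoint keys.

```python
import xml.etree.ElementTree as ET


def merge_indexes(*partials):
    """Phase 1: combine partial indexes (keys assumed disjoint) into one dict."""
    merged = {}
    for partial in partials:
        merged.update(partial)
    return merged


def to_xml(index):
    """Phase 2: turn the in-memory index into XML ready to be stored."""
    root = ET.Element("doc-to-root-map")  # hypothetical element name
    for doc_uri in sorted(index):
        entry = ET.SubElement(root, "entry", {"doc": doc_uri})
        for root_map in sorted(index[doc_uri]):
            ET.SubElement(entry, "root-map", {"uri": root_map})
    return ET.tostring(root, encoding="unicode")


map_index = {"sub.ditamap": {"root.ditamap"}}
topic_index = {"t.dita": {"root.ditamap"}}
xml_text = to_xml(merge_indexes(map_index, topic_index))
```

Nothing touches storage until the whole index has been built, which mirrors the single-transaction approach described above.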
So I’m just trying to optimize the startup time so that it doesn’t take two hours to load and index our typical content set.
I can also try to optimize the low-level operations, but they’re pretty simple, so I don’t see much opportunity for significant improvement. I also haven’t had time to try different options and measure them.
I must also say how useful the built-in unit testing framework is—that’s really made this work easier.
Cheers,
Eliot
_____________________________________________
Eliot Kimber
Sr Staff Content Engineer
O: 512 554 9368
M: 512 554 9368
servicenow.com <https://www.servicenow.com>
LinkedIn <https://www.linkedin.com/company/servicenow> | Twitter <https://twitter.com/servicenow> | YouTube <https://www.youtube.com/user/servicenowinc> | Facebook <https://www.facebook.com/servicenow>
basex-talk@mailman.uni-konstanz.de