Case example:
A publication is defined by a tree structure that references a bunch of other files that also reference a bunch of other files. In order to create an aggregate to transform with fo and create a PDF, we need to open all files and merge them. In the merge we also query a lot of small variables stored in different files (may a dozen per file referenced by the main tree). For example if I look for an official variable value for a product code in a specific language, I go for: db:open('resources')/*[id='model-definitions']/descendant::*[@id=$desired-model]/*[@xml:lang='zw-th']/node().
If I could do db:node-id('resources', 'model-definition#' || $desired-model)/*[@xml:lang='zw-th']/node() and leverage the fact that this info is indexed natively, I do believe that it would be faster. I am currently working hard on performance. It used to take 7 minutes to aggregate our longest publication (for one lang so multiply by 55 for all languages for all) and now it takes a bit less than 2 minutes. I'm aiming for 30 sec or less so yes, a few hundreds faster db and node access by id have an impact.
P.S. I still have not built custom indices... you may get questions about that in future emails.
On Thu, Nov 14, 2019 at 4:56 PM Christian Grün christian.gruen@gmail.com wrote:
I see. In that case, we may need to think about building a custom index structure for storing the XML IDs (the node ids and pre values are part of the existing table storage).
Did you already have performance issues with db:open($db-name)/*[@id=$doc-id] ?
On Thu, Nov 14, 2019 at 4:33 PM France Baril france.baril@architextus.com wrote:
It's not about the number of lines. I was thinking that open-id would be
more performant than db:open + root id match. I can create my custom index for root ids but since there is always a mechanism in place that handle ids I thought it could be useful to avoid duplicating features that already exist. I was going to see if I could use you ids in my docs instead by that would create issues for any export/import where ids might change in BaseX plus, the integer vs xml id format is blocking.