Hi, any chance we could make the db indexed ids match the xml doc ids?
- db:open-id() could match the id at the root of the documents (when one is provided)
- db:node-id() could match other nodes' internal ids (when they are provided).
It would make indexing and querying a lot more efficient for any type of navigation/linking purposes.
Options:
- I'd suggest that this could be an option that can be turned on/off separately for doc-level ids and node-level ids.
- I'd suggest that node ids have the option to be unique within the db or within a document. In the latter case, the value for the entire db would then have to become something like doc-id#node-id to stay unique within the db.
Then I can easily imagine also extending this to db:save-id, to reduce the workload of having to pass through base-uri() once the doc exists in the db under its path:
let $db-name := 'en-us'
let $doc := db:open-id($db-name, $doc-id)
let $new-doc := copy $c := $doc modify ((: updates go here :)) return $c
return db:replace-id($db-name, $doc-id, $new-doc)
instead of
let $db-name := 'en-us'
let $doc := db:open($db-name)/*[@id=$doc-id]
let $new-doc := copy $c := $doc modify ((: updates go here :)) return $c
let $path := substring-after($doc/base-uri(), $db-name)
return db:replace($db-name, $path, $new-doc)
Is that something that can be considered?
Hi France,
If you believe that the two functions…
db:open-id($db-name, $doc-id)
db:open($db-name)/*[@id=$doc-id]
…should be more or less equivalent, I would tend to stick with the current solution (overriding existing functions with new functionality is often prone to errors). What you can always do is to work with a utility module, which you copy into your BaseX repo directory, and which contains all the functions that you frequently use. Have you considered this option?
Best, Christian
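A utility module of the kind suggested above might look like the following minimal sketch. The namespace URI and the function names util:open-by-id and util:replace-by-id are hypothetical and chosen only for illustration. The sketch assumes the BaseX 9 function names used in this thread (db:open, db:replace) plus db:path, and it is not an official BaseX API.

```xquery
(: Hypothetical utility module; namespace URI and function names are illustrative. :)
module namespace util = 'http://example.org/util';

(: Returns the root elements of database $db whose @id equals $doc-id. :)
declare function util:open-by-id(
  $db     as xs:string,
  $doc-id as xs:string
) as element()* {
  db:open($db)/*[@id = $doc-id]
};

(: Replaces every document whose root @id equals $doc-id with $new-doc.
   db:path() returns the database path of the document that contains a node,
   so the caller no longer has to derive it from base-uri(). :)
declare %updating function util:replace-by-id(
  $db      as xs:string,
  $doc-id  as xs:string,
  $new-doc as node()
) {
  for $doc in util:open-by-id($db, $doc-id)
  return db:replace($db, db:path($doc), $new-doc)
};
```

Once such a module is installed in the repository, a query could import it by its namespace URI and call util:open-by-id('en-us', $doc-id) in place of the longer path expression. This shortens the calling code but does not by itself change how the lookup is evaluated.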
On Thu, Nov 14, 2019 at 3:54 PM France Baril france.baril@architextus.com wrote:
[...]
It's not about the number of lines. I was thinking that open-id would be more performant than db:open + root id match. I can create my own custom index for root ids, but since there is already a mechanism in place that handles ids, I thought it could be useful to avoid duplicating features that already exist. I was going to see if I could use your ids in my docs instead, but that would create issues for any export/import where ids might change in BaseX; plus, the integer vs. XML id format is blocking.
I see. In that case, we may need to think about building a custom index structure for storing the XML IDs (the node ids and pre values are part of the existing table storage).
Did you already have performance issues with db:open($db-name)/*[@id=$doc-id] ?
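For readers following along, a user-level lookup table for root ids could be sketched as below. This is not the built-in index structure discussed above. It is only an illustration that assumes the BaseX 9 functions db:open and db:path and the standard XQuery 3.1 map functions, and the variable names are hypothetical. The map is rebuilt on every run, so it would only pay off when one query resolves many ids.

```xquery
(: Sketch: build a map from root @id to document path once per query,
   then resolve many ids against it. Variable names are illustrative. :)
declare variable $db      as xs:string  := 'en-us';
declare variable $doc-ids as xs:string* external;

let $paths-by-id := map:merge(
  for $root in db:open($db)/*[@id]
  return map:entry(string($root/@id), db:path($root))
)
for $id in $doc-ids
let $path := $paths-by-id($id)
(: ids without a matching document are skipped :)
where exists($path)
return db:open($db, $path)
```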
On Thu, Nov 14, 2019 at 4:33 PM France Baril france.baril@architextus.com wrote:
[...]
Case example:
A publication is defined by a tree structure that references a bunch of other files, which in turn reference a bunch of other files. In order to create an aggregate to transform with FO and create a PDF, we need to open all files and merge them. In the merge we also query a lot of small variables stored in different files (maybe a dozen per file referenced by the main tree). For example, if I look for an official variable value for a product code in a specific language, I go for: db:open('resources')/*[@id='model-definitions']/descendant::*[@id=$desired-model]/*[@xml:lang='zw-th']/node().
If I could do db:node-id('resources', 'model-definition#' || $desired-model)/*[@xml:lang='zw-th']/node() and leverage the fact that this info is indexed natively, I do believe that it would be faster. I am currently working hard on performance. It used to take 7 minutes to aggregate our longest publication (for one language, so multiply by 55 for all languages) and now it takes a bit less than 2 minutes. I'm aiming for 30 seconds or less, so yes, making a few hundred db and node accesses by id faster would have an impact.
P.S. I still have not built custom indices... you may get questions about that in future emails.
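As a point of comparison for the case example above, the same lookup could be phrased against the existing attribute index with db:attribute. This is a sketch that assumes the attribute index of the 'resources' database exists and is up to date. Whether it is actually faster than the path expression depends on the data and on what the optimizer already rewrites, so it would need to be measured.

```xquery
(: Sketch: resolve @id values through the attribute index of 'resources'. :)
declare variable $desired-model as xs:string external;

(: db:attribute returns the matching @id attribute nodes; step up to the
   owning element and keep only hits inside the 'model-definitions' document.
   Unlike descendant::*, this would also match the root element itself. :)
db:attribute('resources', $desired-model, 'id')
  /parent::*[root(.)/*/@id = 'model-definitions']
  /*[@xml:lang = 'zw-th']/node()
```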
On Thu, Nov 14, 2019 at 4:56 PM Christian Grün christian.gruen@gmail.com wrote:
[...]
basex-talk@mailman.uni-konstanz.de