Forwarding to the mailing list in order to share knowledge.
On Fri, Nov 12, 2021 at 1:41 PM BaseX Support support@basex.org wrote:
Hi France,
I’d need to get my hands on your code to tell you exactly where it’s best used, but I can give you some more details on the XQuery specification:
When creating new nodes in XQuery via node constructors [1], copies of all enclosed nodes will be created, and the copied nodes get new node identities. As a result, the following query yields false:
  let $a := <a/>
  let $b := <b>{ $a }</b>
  return $b/a is $a
This copying step can be very expensive and memory-consuming. If the option is disabled, child nodes will only be linked to their new parent nodes, and the query above returns true.
As the option changes the semantics of XQuery, it should preferably be applied locally via pragmas.
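For illustration, a pragma limits the changed semantics to a single expression rather than the whole query. A minimal sketch, following the pragma syntax described on the wiki page linked below:

  let $a := <a/>
  let $b := (# db:copynode false #) { <b>{ $a }</b> }
  return $b/a is $a  (: true here: $a is linked into $b instead of copied :)

Everything outside the braces keeps the standard copy semantics, so the option only affects the constructors you deliberately wrap.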
Best, Christian
PS: Mails to our mailing list are preferred; this way, other users might benefit from the replies as well.
[1] https://www.w3.org/TR/xquery-31/#id-constructors
On Fri, Nov 12, 2021 at 2:13 PM France Baril france.baril@architextus.com wrote:
Can you give me more information about how COPYNODE changes the behavior
of the XQuery, and where it is best used?
I see in the example that the pragma is on db:open. My process is:
- Read a document A from a DB called lang that has references to other
documents in the same DB lang (where lang is a 4-letter code for a locale).
- Merge all the references into document A to create an aggregate.
- Send the aggregate through multiple functions (that use
copy-modify-return), each of which resolves one type of reference (most references grab referenced content from a DB called global, but others grab it from the lang DB). These references do not grab entire documents, but smaller snippets within XML documents.
- Save the result in a DB called staging-lang (where lang is a 4-letter
code for a locale).
So should the pragma apply when reading the 1st document (1), when
reading the documents we aggregate into the 1st document (2), when grabbing the snippets (3), and/or when saving the end result in the staging DB (4)? Or maybe for all db:open() and db:attribute()/.. calls in this process?
On Fri, Nov 12, 2021 at 12:16 PM BaseX Support support@basex.org
wrote:
One more suggestion:
If node construction turns out to consume too much memory, it sometimes
helps to disable the COPYNODE option:
https://docs.basex.org/wiki/XQuery_Extensions#Database_Pragmas
France Baril france.baril@architextus.com schrieb am Fr., 12. Nov.
2021, 13:09:
Hi,
Thanks for your answer.
I tried rebuilding the document instead of using copies. With 3/4 of the reference-resolving functions implemented, I'm already at double the time I had before, so I will set that one aside as an unsuccessful alternative. If memory serves me correctly, we moved from a transform that rebuilds the document to a copy-modify-return approach over a year ago precisely to improve performance.
I will try grouping references with the same name, as in the example above, to limit the number of queries to the DB. If that still doesn't help, I will see if I can send you a good example without having to send too many of our files.
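For context, grouping here would mean one index lookup per distinct name instead of one per entry. A rough sketch of the idea, using a group by clause (the variable names mirror our resolve function; this is untested against our data):

  for $entry in $copy/descendant-or-self::*[@name-ref]
  group by $name := $entry/data(@name-ref)
  (: one index query for the whole group sharing this name :)
  let $targets := db:attribute('index-prompt-' || $lang, $name, 'name')/..
  (: inside the group, $entry is the sequence of all entries with this name :)
  for $e in $entry
  return replace node $e with <filter-group-inline>{ $targets }</filter-group-inline>

With 110 000 entries but far fewer distinct names, this should cut the number of db:attribute() calls substantially.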
We have a short-term solution where we removed some references-within-references, which substantially reduces the number of items to resolve (an 80% improvement), but it does impact the user experience, so we are still looking into code-based solutions as opposed to (or to use in conjunction with) content-based solutions.
On Fri, Nov 5, 2021 at 5:22 PM BaseX Support support@basex.org
wrote:
Hi France,
Do you have some sample data that allows us to test your code?
If documents are pretty large, it’s sometimes faster to rebuild a document with node constructors instead of performing updates on it.
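Rebuilding with node constructors usually means a recursive identity transform that reconstructs the tree while substituting the nodes you want changed. A minimal sketch (the function name and structure are illustrative, not from this thread):

  declare function local:rebuild($n as node()) as node() {
    typeswitch ($n)
      case element() return
        element { node-name($n) } {
          $n/@*,
          (: recurse into children; insert per-element substitutions here :)
          for $c in $n/node() return local:rebuild($c)
        }
      default return $n
  };

A single pass like this avoids the per-update bookkeeping of the pending update list, which is why it can win on large documents.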
Best, Christian ____________________________________
We have a query that looks like this:
declare function content-refs:resolve-prompt-refs-new(
  $node as node(),
  $lang as xs:string
) as node()* {
  let $result :=
    copy $copy := $node
    modify (
      let $entries := $copy/descendant-or-self::*[@name-ref]
        [name() = 'prompt-ref' or name() = 'gui-ctrl-ref' or
         name() = 'feature-ref' or name() = 'app-ref' (: or name() = 'screen-ref' :)]
      let $entries-hd := $copy/descendant-or-self::*[@id = 'T1700243243']
        /descendant-or-self::*[@name-ref]
        [name() = 'prompt-ref' or name() = 'gui-ctrl-ref' or
         name() = 'feature-ref' or name() = 'app-ref' (: or name() = 'screen-ref' :)]
      let $trace := trace('Prompts count: ' || count($entries))
      let $trace := trace('Prompts in Hardware diagram: ' || count($entries-hd))
      for $entry in $entries
      (: let $trace := trace('start processing entry') :)
      let $name := $entry/data(@name-ref)
      let $trace :=
        if (exists($entry/ancestor::*[@id = 'T1700243243']))
        then trace($name, ' Promptref ')
        else ()
      let $prompts-from-index :=
        db:attribute('index-prompt-' || $lang, $name, 'name')/..
        (: => prof:time('index prompt attr: ') :)
      (: let $prompts-from-index :=
           db:open('index-prompt-' || $lang)//*[@name = $name]
           => prof:time('index prompt open: ') :)
      let $prompts :=
        for $prompt in $prompts-from-index
        let $original-elem-name := $entry/self::*/name()
        let $new-elem-name :=
          switch ($original-elem-name)
            case 'prompt-ref' return $original-elem-name
            default return substring-before($original-elem-name, '-ref')
        return
          copy $prompt-renamed := $prompt
          modify (
            rename node $prompt-renamed as $new-elem-name
          )
          return $prompt-renamed
          (: => prof:time('index prompt new elem-name: ') :)
      let $new-node :=
        if (count($prompts) = 0)
        then
          <filter-group error="{ concat('No target found in for: ',
            $entry/name(), '/@name-ref=', $entry/@name-ref) }"/>
        else <filter-group-inline>{ $prompts }</filter-group-inline>
      let $trace := ('Ready to replace old entry with new-node')
      return replace node $entry with $new-node
      (: => prof:time('index prompt new node: ') :)
    )
    return $copy
    (: => prof:time('index prompt return copy: ') :)
  return $result
};
As you can see, we are using prof:time to see how quickly items are resolved. Querying the DB for each item goes fairly quickly (2 seconds). However, that last 'return $copy' line, after all the replacements are processed, takes between 11 and 25 minutes depending on the system. Memory usage is low, but CPU usage goes through the roof.
We are updating a little over 110 000 items in this operation, so it is a big operation on a file of about 89 000 indented lines. We are wondering if there is a way we could improve the performance. Before this operation occurs, we process the file multiple times to replace other items with very similar functions (copy-modify-return); they all go fairly quickly, so it does seem that the culprit is the number of items being replaced.
-- France Baril Architecte documentaire / Documentation architect france.baril@architextus.com