Forwarding to the mailing list in order to share knowledge.

On Fri, Nov 12, 2021 at 1:41 PM BaseX Support <support@basex.org> wrote:

Hi France,

I’d need to get my hands on your code to tell you exactly where it’s
best used, but I can give you some more details on the XQuery
specification:

When creating new nodes in XQuery via node constructors [1], copies of
all enclosed nodes will be created, and the copied nodes get new node
identities. As a result, the following query yields false:

let $a := <a/>
let $b := <b>{ $a }</b>
return $b/a is $a

This step can be very expensive and memory-consuming. If the COPYNODE
option is disabled, child nodes will only be linked to the new parent
nodes, and the query above returns true.

As the option changes the semantics of XQuery, it should preferably be
set via pragmas, which limit it to the enclosed expression.
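
A minimal sketch of the pragma form, reusing the small query from this
mail. It assumes the (# db:copynode false #) syntax described on the
Database Pragmas page linked in this thread; the braces mark the only
expression for which COPYNODE is switched off:

```xquery
(: COPYNODE is disabled only for the expression inside the braces :)
(# db:copynode false #) {
  let $a := <a/>
  (: with copying disabled, $a is linked into $b instead of being copied :)
  let $b := <b>{ $a }</b>
  return $b/a is $a
}
```

Following the explanation in this mail, the identity comparison should
then succeed, because the child of $b is the original $a node. Wrapping
only the expensive constructor keeps the standard semantics everywhere
else in the query.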

Best,
Christian

PS: Mails to our mailing list are preferred; this way, other users
might benefit from the replies as well.

[1] https://www.w3.org/TR/xquery-31/#id-constructors

On Fri, Nov 12, 2021 at 2:13 PM France Baril
<france.baril@architextus.com> wrote:
>
> Can you give me more information about how COPYNODE changes the behavior of the XQuery and where it is best used?
>
> I see in the example that the pragma is on db:open. My process is:
>
> 1. Read a document A from DB called lang that has references to other documents in the same DB lang (where lang is a 4 letter code for a locale).
> 2. Merge all the references into document A to create an aggregate.
> 3. Send the aggregate through multiple functions (that use copy-modify-return) that each resolve a type of reference (most references grab referenced content from a DB called global, but others grab it from the lang DB). These references do not grab entire documents, but smaller snippets within XML documents.
> 4. Save the result in a DB called staging-lang (where lang is a 4 letter code for a locale).
>
> So should the pragma apply when reading the 1st document (1), when reading the documents we aggregate into the 1st document (2), when grabbing the snippets (3) and/or when saving the end result in the staging DB (4)? Or maybe for all db:open() and db:attribute()/.. functions in this process?
>
> On Fri, Nov 12, 2021 at 12:16 PM BaseX Support <support@basex.org> wrote:
>>
>> One more suggestion:
>>
>> If node construction turns out to consume too much memory, it sometimes helps to disable the COPYNODE option:
>>
>> https://docs.basex.org/wiki/XQuery_Extensions#Database_Pragmas
>>
>>
>>
>> On Fri, Nov 12, 2021 at 1:09 PM France Baril <france.baril@architextus.com> wrote:
>>>
>>> Hi,
>>>
>>> Thanks for your answer.
>>>
>>> I tried rebuilding the document instead of using copies. I have
>>> implemented 3/4 of the functions that resolve references, and I'm
>>> already at double the time I had before, so I will set that one aside
>>> as an unsuccessful alternative. If memory serves me correctly, we might
>>> have moved from a transform that rebuilds the document to a
>>> copy-modify-return approach over a year ago to improve performance.
>>>
>>> I will try grouping the references of the same names in the example
>>> above to limit the number of queries to the DB. If that still doesn't
>>> help, I will see if I can send you a good example without having to
>>> send too many of our.
>>>
>>> We have a short-term solution where we removed some references in
>>> references, which substantially reduces the number of items to resolve
>>> (80% improvement), but it does impact the user experience, so we are
>>> still looking into code-based solutions as opposed to (or to use in
>>> conjunction with) content-based solutions.
>>>
>>> On Fri, Nov 5, 2021 at 5:22 PM BaseX Support <support@basex.org> wrote:
>>> >
>>> > Hi France,
>>> >
>>> > Do you have some sample data that allows us to test your code?
>>> >
>>> > If documents are pretty large, it’s sometimes faster to rebuild a
>>> > document with node constructors instead of performing updates on it.
>>> >
>>> > Best,
>>> > Christian
>>> > ____________________________________
>>> >
>>> > > We have a query that looks like this:
>>> > >
>>> > > declare function content-refs:resolve-prompt-refs-new($node as node(),
>>> > > $lang as xs:string) as node()*{
>>> > > let $result :=
>>> > > copy $copy := $node
>>> > > modify(
>>> > > let $entries :=
>>> > > $copy/descendant-or-self::*[@name-ref][name()='prompt-ref' or
>>> > > name()='gui-ctrl-ref'
>>> > > or name()='feature-ref' or name()='app-ref' (: or
>>> > > name()='screen-ref':)]
>>> > >
>>> > > let $entries-hd :=
>>> > > $copy/descendant-or-self::*[@id='T1700243243']/descendant-or-self::*[@name-ref][name()='prompt-ref'
>>> > > or name()='gui-ctrl-ref'
>>> > > or name()='feature-ref' or name()='app-ref' (: or
>>> > > name()='screen-ref':)]
>>> > >
>>> > > let $trace := trace('Prompts count: ' || count($entries))
>>> > > let $trace := trace('Prompts in Hardware diagram: ' ||
>>> > > count($entries-hd))
>>> > >
>>> > > for $entry in $entries
>>> > > (:let $trace := trace('start processing entry'):)
>>> > > let $name := $entry/data(@name-ref)
>>> > > let $trace :=
>>> > > if (exists($entry/ancestor::*[@id = 'T1700243243']))
>>> > > then trace( $name , ' Promptref&#10;')
>>> > > else ()
>>> > > let $prompts-from-index := db:attribute('index-prompt-' ||
>>> > > $lang, $name, 'name')/.. (:=> prof:time('index prompt attr: '):)
>>> > > (:let $prompts-from-index := db:open('index-prompt-' ||
>>> > > $lang)//*[@name = $name] => prof:time('index prompt open: '):)
>>> > > let $prompts :=
>>> > > for $prompt in $prompts-from-index
>>> > > let $original-elem-name := $entry/self::*/name()
>>> > > let $new-elem-name :=
>>> > > switch ($original-elem-name)
>>> > > case 'prompt-ref' return $original-elem-name
>>> > > default return substring-before($original-elem-name, '-ref')
>>> > > return
>>> > > copy $prompt-renamed := $prompt
>>> > > modify(
>>> > > rename node $prompt-renamed as $new-elem-name
>>> > > )
>>> > > return $prompt-renamed (:=> prof:time('index prompt new
>>> > > elem-name: '):)
>>> > > let $new-node :=
>>> > > if (count($prompts) = 0)
>>> > > then
>>> > > <filter-group error="{concat("No target found in for: ",
>>> > > $entry/name(), '/@name-ref=', $entry/@name-ref)}"/>
>>> > > else <filter-group-inline>{
>>> > > $prompts
>>> > > }</filter-group-inline>
>>> > > let $trace := ('Ready to replace old entry with new-node')
>>> > > return replace node $entry with $new-node (:=>
>>> > > prof:time('index prompt new node: '):)
>>> > >
>>> > > )
>>> > > return $copy (:=> prof:time('index prompt return copy: '):)
>>> > > return $result
>>> > >
>>> > > };
>>> > >
>>> > > As you can see, we are using prof:time to see how quickly items are
>>> > > resolved. Querying the DB for each item goes fairly quickly (2
>>> > > seconds). However, that last 'return $copy' line, after all the
>>> > > replacements are processed, takes between 11 and 25 minutes depending
>>> > > on the system. Memory usage is low, but the CPU usage goes through
>>> > > the roof.
>>> > >
>>> > > We are updating a little over 110 000 items in this operation, so it
>>> > > is a big operation on a file of about 89000 indented lines. We are
>>> > > wondering if there is a way we could improve the performance. Before
>>> > > this operation occurs, we process the file multiple times to
>>> > > replace other items with very similar functions (copy-modify-return).
>>> > > They all go fairly quickly, so it does seem that the culprit is the
>>> > > number of items being replaced.
>>> > >
>>> > >
>>> > > --
>>> > > France Baril
>>> > > Architecte documentaire / Documentation architect
>>> > > france.baril@architextus.com
>>>
>>>
>>>
>
>
>

France Baril
Architecte documentaire / Documentation architect
france.baril@architextus.com