Hi Christian,

Thanks for the help!

I have a working version of this now that performs well.
Initially I didn't want to reconstruct parts of the message, because we have several versions of these containers, and they usually involve multiple namespaces and prefixes that should be preserved. It turned out to be easier than I thought.

Thanks & greetings from Salzburg,
Tom



From: Christian Grün <christian.gruen@gmail.com>
Sent: Monday, January 20, 2020 19:06
To: Tom Rauchenwald (UNIFITS) <tom.rauchenwald@unifits.com>
Cc: basex-talk@mailman.uni-konstanz.de <basex-talk@mailman.uni-konstanz.de>
Subject: Re: [basex-talk] Help with a Query/Performance
 
I missed the obvious next step. The following query is evaluated
in a few milliseconds:

  declare variable $OFFSET1 := 3;
  declare variable $OFFSET2 := 2;

  let $container := db:open('tr-test')/Container
  let $message := $container/*:MessageA[$OFFSET1]
  let $detail := $message/MessageADetail[$OFFSET2]
  return element { name($container) } {
    $container/*[contains(name(), 'MetaData')],
    element { name($message) } {
      $message/MessageAMetaData,
      element { name($detail) } {
        $detail/*
      }
    }
  }
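
If the containers use namespaces or prefixes, a variant along the
following lines may be worth trying. It is only a sketch and has not
been run against the real data. It relies on the fact that name()
returns the lexical name as a string, so the constructed element is
resolved against the namespaces declared in the query, whereas
node-name() returns the full QName including its namespace URI. The
wildcard steps (*:MessageA and so on), the use of local-name(), and the
copying of attributes are assumptions about what the real documents
need:

```xquery
declare variable $OFFSET1 := 3;
declare variable $OFFSET2 := 2;

(: Wildcard name tests match the elements regardless of namespace. :)
let $container := db:open('tr-test')/*:Container
let $message := $container/*:MessageA[$OFFSET1]
let $detail := $message/*:MessageADetail[$OFFSET2]
(: node-name() yields a QName, so the namespace URI is carried over. :)
return element { node-name($container) } {
  (: Assumption: attributes on the original elements should be kept. :)
  $container/@*,
  $container/*[contains(local-name(), 'MetaData')],
  element { node-name($message) } {
    $message/@*,
    $message/*:MessageAMetaData,
    element { node-name($detail) } {
      $detail/@*,
      (: node() also copies text and comment children, not only elements. :)
      $detail/node()
    }
  }
}
```

For the test database created in this thread, which has no namespaces
or attributes, this should return the same structure as the query
above.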


On Mon, Jan 20, 2020 at 6:54 PM Christian Grün
<christian.gruen@gmail.com> wrote:
>
> Dear Tom,
>
> If you have large elements, it will usually be faster to create new
> elements. Here’s one way to do it:
>
>   let $offset1 := 3
>   let $offset2 := 2
>   let $container := db:open('tr-test')/Container
>   return element Container {
>     (: add meta data elements :)
>     $container/*[starts-with(name(), 'ContainerMetaData')],
>     (: alternative: add everything except Message elements
>     $container/(* except (MessageA, MessageB, MessageC)), :)
>     $container/MessageA[$offset1] update {
>       delete node MessageADetail[position() != $offset2]
>     }
>   }
>
> There are probably ways to get this even faster; I may have a look at
> this tomorrow.
>
> All the best from Konstanz,
> Christian
>
>
>
> On Mon, Jan 20, 2020 at 10:01 AM Tom Rauchenwald (UNIFITS)
> <tom.rauchenwald@unifits.com> wrote:
> >
> > Hi list,
> >
> > I'm struggling with a query.
> >
> > We have XML documents with a structure similar to this:
> >
> > <Container>
> >   <ContainerMetaData1>FOO</ContainerMetaData1>
> >   <ContainerMetaData2>FOO</ContainerMetaData2>
> >   <MessageA>
> >     <MessageAMetaData>
> >       <MessageMetaData1>FOO</MessageMetaData1>
> >       <MessageMetaData2>FOO</MessageMetaData2>
> >     </MessageAMetaData>
> >     <MessageADetail>
> >       <DetailData1>FOO</DetailData1>
> >       <DetailData2>FOO</DetailData2>
> >     </MessageADetail>
> >     <MessageADetail>
> >       <DetailData1>FOO</DetailData1>
> >       <DetailData2>FOO</DetailData2>
> >     </MessageADetail>
> >   </MessageA>
> >   <MessageB>
> >     <MessageBMetaData>
> >       <MessageMetaData1>FOO</MessageMetaData1>
> >       <MessageMetaData2>FOO</MessageMetaData2>
> >     </MessageBMetaData>
> >     <MessageBDetail>
> >       <DetailData1>FOO</DetailData1>
> >       <DetailData2>FOO</DetailData2>
> >     </MessageBDetail>
> >   </MessageB>
> >   <MessageC>
> >     <MessageCMetaData>
> >       <MessageMetaData1>FOO</MessageMetaData1>
> >       <MessageMetaData2>FOO</MessageMetaData2>
> >     </MessageCMetaData>
> >     <MessageCDetail>
> >       <DetailData1>FOO</DetailData1>
> >       <DetailData2>FOO</DetailData2>
> >     </MessageCDetail>
> >   </MessageC>
> > </Container>
> >
> > Messages are bundled in a container (up to n times for each message), and each message has details (also up to n times). Container and Message hold data that is the same for all of their details (it's basically a grouping).
> > I'd like to retrieve a single Detail together with all the data associated with it: the MessageADetail, its MessageA (without the other MessageADetails), and the Container (without the other Messages).
> > I know the position of the message (i.e., I know that I want the second MessageA for example), and I know the position of the Detail (i.e., I know that I want the 3rd Detail).
> > The use case is to show the detail in context in a UI.
> >
> > The query I came up with is (here I want to get the 2nd detail from the third MessageA):
> >
> >   copy $x := /*:Container
> >   modify (
> >     delete node $x/*:MessageA[position() != 3],
> >     delete node $x/*:MessageA[3]/*:MessageADetail[position() != 2],
> >     delete node $x/*:MessageB,
> >     delete node $x/*:MessageC
> >   )
> >   return $x
> >
> > This works well for small documents. For large documents it can take a couple of seconds to evaluate the query (our real-life documents do have more data/elements in Details and Message).
> > I'm wondering if there's a better/more efficient way to do this. I tried formulating a query that doesn't do deletes, but I couldn't come up with a solution that performs better and is correct.
> >
> > Any pointers would be very much appreciated.
> >
> > Here's a function to generate sufficiently large test data:
> >
> > declare function local:sample($numberOfMessages, $numberOfDetails) {
> > <Container>
> >   <ContainerMetaData1>FOO</ContainerMetaData1>
> >   <ContainerMetaData2>FOO</ContainerMetaData2>
> >   {for $i in 1 to $numberOfMessages
> >     return
> >   <MessageA>
> >     <MessageAMetaData>
> >       <MessageMetaData1>FOO {$i}</MessageMetaData1>
> >       <MessageMetaData2>FOO {$i}</MessageMetaData2>
> >     </MessageAMetaData>
> >     {for $j in 1 to $numberOfDetails
> >      return
> >      <MessageADetail>
> >        <DetailData1>FOO {$j}</DetailData1>
> >        <DetailData2>FOO {$j}</DetailData2>
> >      </MessageADetail>
> >     }
> >   </MessageA>
> >   }
> >   <MessageB>
> >     <MessageBMetaData>
> >       <MessageMetaData1>FOO</MessageMetaData1>
> >       <MessageMetaData2>FOO</MessageMetaData2>
> >     </MessageBMetaData>
> >     <MessageBDetail>
> >       <DetailData1>FOO</DetailData1>
> >       <DetailData2>FOO</DetailData2>
> >     </MessageBDetail>
> >   </MessageB>
> >   <MessageC>
> >     <MessageCMetaData>
> >       <MessageMetaData1>FOO</MessageMetaData1>
> >       <MessageMetaData2>FOO</MessageMetaData2>
> >     </MessageCMetaData>
> >     <MessageCDetail>
> >       <DetailData1>FOO</DetailData1>
> >       <DetailData2>FOO</DetailData2>
> >     </MessageCDetail>
> >   </MessageC>
> > </Container>
> > };
> >
> > db:create('tr-test', local:sample(20, 100000), 'test.xml')
> >
> > Thanks,
> > Tom Rauchenwald
> >
> >