Hi Tom,

 

I think that copying and then modifying such a huge tree is the bottleneck here.

Why don’t you copy only your third Message element and then reconstruct the wrapping Container with ContainerMetaData?
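
A minimal sketch of that reconstruction, assuming the structure and positions from your query (third MessageA, second MessageADetail). The variable names $c, $m and $d are only illustrative, and I have not timed this against your data:

```xquery
(: Select the three nodes of interest without copying anything yet. :)
let $c := /*:Container
let $m := ($c/*:MessageA)[3]
let $d := ($m/*:MessageADetail)[2]
return
  (: Rebuild the Container: keep everything that is not a message. :)
  element { node-name($c) } {
    $c/@*,
    $c/*[not(self::*:MessageA or self::*:MessageB or self::*:MessageC)],
    (: Rebuild the chosen MessageA: its non-detail children plus one detail. :)
    element { node-name($m) } {
      $m/@*,
      $m/*[not(self::*:MessageADetail)],
      $d
    }
  }
```

Only the container metadata, the message metadata and the single detail are copied into the new elements, so the work should no longer grow with a copy of the whole tree.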

 

Since the desired result is a transformation, a typeswitch expression might be an alternative if something stops you from reconstructing the Container.
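
A rough sketch of what the typeswitch variant could look like. It assumes the elements are in no namespace, as in your sample data (with namespaces the element tests would need qualified names), and local:filter with its parameter names is a hypothetical helper, not something from your code:

```xquery
(: Returns the node unchanged, a pruned copy of it, or nothing. :)
declare function local:filter(
  $node        as node(),
  $keep-msg    as element(),
  $keep-detail as element()
) as node()? {
  typeswitch ($node)
    (: Keep only the chosen MessageA and descend into it. :)
    case element(MessageA) return
      if ($node is $keep-msg) then
        element { node-name($node) } {
          $node/@*,
          $node/node() ! local:filter(., $keep-msg, $keep-detail)
        }
      else ()
    (: Keep only the chosen detail. :)
    case element(MessageADetail) return
      if ($node is $keep-detail) then $node else ()
    (: Drop all other message types. :)
    case element(MessageB) | element(MessageC) return ()
    (: Everything else (metadata, text) passes through unchanged. :)
    default return $node
};

let $c := /Container
let $m := ($c/MessageA)[3]
let $d := ($m/MessageADetail)[2]
return element { node-name($c) } {
  $c/@*,
  $c/node() ! local:filter(., $m, $d)
}
```

This still visits every child of the selected MessageA once, but it compares node identities instead of copying and deleting, so it is probably cheaper than copy/modify on large documents.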

 

Daniel

 

From: Tom Rauchenwald (UNIFITS) <tom.rauchenwald@unifits.com>
Sent: Monday, 20 January 2020 10:01
To: basex-talk@mailman.uni-konstanz.de
Subject: [basex-talk] Help with a Query/Performance

 

Hi list,

 

I'm struggling with a query.

 

We have XML documents with a structure similar to this:

 

<Container>
  <ContainerMetaData1>FOO</ContainerMetaData1>
  <ContainerMetaData2>FOO</ContainerMetaData2>
  <MessageA>
    <MessageAMetaData>
      <MessageMetaData1>FOO</MessageMetaData1>
      <MessageMetaData2>FOO</MessageMetaData2>
    </MessageAMetaData>
    <MessageADetail>
      <DetailData1>FOO</DetailData1>
      <DetailData2>FOO</DetailData2>
    </MessageADetail>
    <MessageADetail>
      <DetailData1>FOO</DetailData1>
      <DetailData2>FOO</DetailData2>
    </MessageADetail>
  </MessageA>
  <MessageB>
    <MessageBMetaData>
      <MessageMetaData1>FOO</MessageMetaData1>
      <MessageMetaData2>FOO</MessageMetaData2>
    </MessageBMetaData>
    <MessageBDetail>
      <DetailData1>FOO</DetailData1>
      <DetailData2>FOO</DetailData2>
    </MessageBDetail>
  </MessageB>
  <MessageC>
    <MessageCMetaData>
      <MessageMetaData1>FOO</MessageMetaData1>
      <MessageMetaData2>FOO</MessageMetaData2>
    </MessageCMetaData>
    <MessageCDetail>
      <DetailData1>FOO</DetailData1>
      <DetailData2>FOO</DetailData2>
    </MessageCDetail>
  </MessageC>
</Container>

 

Messages are bundled in a container (up to n times for each message), and each message has details (also up to n times). The Container and each Message hold data that is the same for all of their details (it's basically a grouping).

I'd like to retrieve a Detail together with all the data associated with it: the MessageADetail itself, its MessageA (without the other MessageADetails), and the Container (without the other Messages).

I know the position of the message (i.e., I know that I want the second MessageA for example), and I know the position of the Detail (i.e., I know that I want the 3rd Detail).

The use case is to show the detail in context in a UI.

 

The query I came up with is the following (here I want to get the 2nd detail from the third MessageA):

 

  let $fh := (
    copy $x := /*:Container
    modify (
      delete node $x/*:MessageA[position() != 3],
      delete node $x/*:MessageA[3]/*:MessageADetail[position() != 2],
      delete node $x/*:MessageB,
      delete node $x/*:MessageC
    )
    return $x
  )
  return $fh

 

This works well for small documents. For large documents it can take a couple of seconds to evaluate the query (our real-life documents do have more data/elements in Details and Message).

I'm wondering if there's a better/more efficient way to do this. I tried formulating a query that doesn't do deletes, but I couldn't come up with a solution that performs better and is correct.

 

Any pointers would be very much appreciated.

 

Here's a function to generate sufficiently large test data:

 

declare function local:sample($numberOfMessages, $numberOfDetails) {
  <Container>
    <ContainerMetaData1>FOO</ContainerMetaData1>
    <ContainerMetaData2>FOO</ContainerMetaData2>
    {
      for $i in 1 to $numberOfMessages
      return
        <MessageA>
          <MessageAMetaData>
            <MessageMetaData1>FOO {$i}</MessageMetaData1>
            <MessageMetaData2>FOO {$i}</MessageMetaData2>
          </MessageAMetaData>
          {
            for $j in 1 to $numberOfDetails
            return
              <MessageADetail>
                <DetailData1>FOO {$j}</DetailData1>
                <DetailData2>FOO {$j}</DetailData2>
              </MessageADetail>
          }
        </MessageA>
    }
    <MessageB>
      <MessageBMetaData>
        <MessageMetaData1>FOO</MessageMetaData1>
        <MessageMetaData2>FOO</MessageMetaData2>
      </MessageBMetaData>
      <MessageBDetail>
        <DetailData1>FOO</DetailData1>
        <DetailData2>FOO</DetailData2>
      </MessageBDetail>
    </MessageB>
    <MessageC>
      <MessageCMetaData>
        <MessageMetaData1>FOO</MessageMetaData1>
        <MessageMetaData2>FOO</MessageMetaData2>
      </MessageCMetaData>
      <MessageCDetail>
        <DetailData1>FOO</DetailData1>
        <DetailData2>FOO</DetailData2>
      </MessageCDetail>
    </MessageC>
  </Container>
};

 

db:create('tr-test', local:sample(20, 100000), 'test.xml')

 

Thanks, 

Tom Rauchenwald