Hi Tom,
I think that trying to copy/modify a huge tree is definitely the bottleneck here.
Why don’t you copy only your third Message element and then reconstruct the wrapping Container with ContainerMetaData?
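A minimal sketch of that reconstruction, assuming the structure from your sample. The function name local:detail-in-context, its parameters $m and $d, and the inline test document are illustrative and not from your post. Treating every Container child whose name does not start with "Message" as container metadata is also an assumption about your real documents.

```xquery
(: Sketch: rebuild the Container around one MessageA and one Detail
   instead of copying the whole tree and deleting from the copy.
   $m = position of the wanted MessageA, $d = position of the wanted Detail
   (both are illustrative parameter names). :)
declare function local:detail-in-context(
  $container as element(),
  $m as xs:integer,
  $d as xs:integer
) as element() {
  element { node-name($container) } {
    $container/@*,
    (: assumption: container metadata = children not named Message... :)
    $container/*[not(starts-with(local-name(), 'Message'))],
    (: yields nothing if there is no $m-th MessageA :)
    $container/*:MessageA[$m] ! element { node-name(.) } {
      @*,
      *:MessageAMetaData,
      *:MessageADetail[$d]
    }
  }
};

(: small inline document so the sketch runs without a database :)
let $doc :=
  <Container>
    <ContainerMetaData1>FOO</ContainerMetaData1>
    <MessageA>
      <MessageAMetaData>meta 1</MessageAMetaData>
      <MessageADetail>detail 1.1</MessageADetail>
      <MessageADetail>detail 1.2</MessageADetail>
    </MessageA>
    <MessageA>
      <MessageAMetaData>meta 2</MessageAMetaData>
      <MessageADetail>detail 2.1</MessageADetail>
      <MessageADetail>detail 2.2</MessageADetail>
    </MessageA>
    <MessageB><MessageBDetail>b</MessageBDetail></MessageB>
  </Container>
return local:detail-in-context($doc, 2, 1)
```

Against your test database you would pass the Container root of 'tr-test' with 3 and 2 as positions. Only the selected nodes get copied, so the cost should no longer depend on the total number of Details, though I have not measured this on your data.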
Since the desired result is a transformation, a typeswitch expression might be an alternative if something prevents you from reconstructing the tree.
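A rough sketch of what such a typeswitch could look like. It assumes the elements are in no namespace, as in your sample; with namespaced documents the element tests would need the right prefix. The function name local:prune and the parameters $m and $d are made up for illustration.

```xquery
(: Sketch: recursive transformation driven by typeswitch.
   $m = position of the wanted MessageA, $d = position of the wanted Detail
   (illustrative names). Assumes elements without a namespace. :)
declare function local:prune(
  $node as node(),
  $m as xs:integer,
  $d as xs:integer
) as node()* {
  typeswitch ($node)
    case element(Container) return
      element Container {
        $node/@*,
        (: all children except the MessageA elements that are not wanted :)
        ($node/* except $node/MessageA[position() != $m])
          ! local:prune(., $m, $d)
      }
    case element(MessageA) return
      element MessageA {
        $node/@*,
        $node/*[not(self::MessageADetail)],
        $node/MessageADetail[$d]
      }
    (: other message types are dropped entirely :)
    case element(MessageB) | element(MessageC) return ()
    (: everything else, e.g. container metadata, is kept as is :)
    default return $node
};

(: small inline document so the sketch runs without a database :)
let $doc :=
  <Container>
    <ContainerMetaData1>FOO</ContainerMetaData1>
    <MessageA>
      <MessageAMetaData>meta 1</MessageAMetaData>
      <MessageADetail>detail 1.1</MessageADetail>
    </MessageA>
    <MessageA>
      <MessageAMetaData>meta 2</MessageAMetaData>
      <MessageADetail>detail 2.1</MessageADetail>
      <MessageADetail>detail 2.2</MessageADetail>
    </MessageA>
    <MessageC><MessageCDetail>c</MessageCDetail></MessageC>
  </Container>
return local:prune($doc, 2, 2)
```

The typeswitch variant is more verbose than a direct reconstruction, but each rule sits in its own case, which may be easier to extend if more message types need special handling.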
Daniel
From: Tom Rauchenwald (UNIFITS) <tom.rauchenwald@unifits.com>
Sent: Monday, 20 January 2020 10:01
To: basex-talk@mailman.uni-konstanz.de
Subject: [basex-talk] Help with a Query/Performance
Hi list,
I'm struggling with a query.
We have XML documents with a structure similar to this:
<Container>
  <ContainerMetaData1>FOO</ContainerMetaData1>
  <ContainerMetaData2>FOO</ContainerMetaData2>
  <MessageA>
    <MessageAMetaData>
      <MessageMetaData1>FOO</MessageMetaData1>
      <MessageMetaData2>FOO</MessageMetaData2>
    </MessageAMetaData>
    <MessageADetail>
      <DetailData1>FOO</DetailData1>
      <DetailData2>FOO</DetailData2>
    </MessageADetail>
    <MessageADetail>
      <DetailData1>FOO</DetailData1>
      <DetailData2>FOO</DetailData2>
    </MessageADetail>
  </MessageA>
  <MessageB>
    <MessageBMetaData>
      <MessageMetaData1>FOO</MessageMetaData1>
      <MessageMetaData2>FOO</MessageMetaData2>
    </MessageBMetaData>
    <MessageBDetail>
      <DetailData1>FOO</DetailData1>
      <DetailData2>FOO</DetailData2>
    </MessageBDetail>
  </MessageB>
  <MessageC>
    <MessageCMetaData>
      <MessageMetaData1>FOO</MessageMetaData1>
      <MessageMetaData2>FOO</MessageMetaData2>
    </MessageCMetaData>
    <MessageCDetail>
      <DetailData1>FOO</DetailData1>
      <DetailData2>FOO</DetailData2>
    </MessageCDetail>
  </MessageC>
</Container>
Messages are bundled in a container (up to n times for each message), and each message has details (also up to n times). Container and Message contain data that is the same for all details (it's basically a grouping).
I'd like to retrieve a single Detail together with all data associated with it: basically a MessageADetail, its MessageA (without all the other MessageADetails), and the Container (without all the other Messages).
I know the position of the message (i.e., I know that I want the second MessageA for example), and I know the position of the Detail (i.e., I know that I want the 3rd Detail).
The use case is to show the detail in context in a UI.
The query I came up with for this is (here I want to get the 2nd detail from the third MessageA):
let $fh := (
  copy $x := /*:Container
  modify (
    delete node $x/*:MessageA[position() != 3],
    delete node $x/*:MessageA[3]/*:MessageADetail[position() != 2],
    delete node $x/*:MessageB,
    delete node $x/*:MessageC
  )
  return $x
)
return $fh
This works well for small documents. For large documents it can take a couple of seconds to evaluate the query (our real-life documents do have more data/elements in Details and Message).
I'm wondering if there's a better/more efficient way to do this. I tried formulating a query that doesn't do deletes, but I couldn't come up with a solution that performs better and is correct.
Any pointers would be very much appreciated.
Here's a function to generate sufficiently large test data:
declare function local:sample($numberOfMessages, $numberOfDetails) {
  <Container>
    <ContainerMetaData1>FOO</ContainerMetaData1>
    <ContainerMetaData2>FOO</ContainerMetaData2>
    {for $i in 1 to $numberOfMessages
     return
       <MessageA>
         <MessageAMetaData>
           <MessageMetaData1>FOO {$i}</MessageMetaData1>
           <MessageMetaData2>FOO {$i}</MessageMetaData2>
         </MessageAMetaData>
         {for $j in 1 to $numberOfDetails
          return
            <MessageADetail>
              <DetailData1>FOO {$j}</DetailData1>
              <DetailData2>FOO {$j}</DetailData2>
            </MessageADetail>
         }
       </MessageA>
    }
    <MessageB>
      <MessageBMetaData>
        <MessageMetaData1>FOO</MessageMetaData1>
        <MessageMetaData2>FOO</MessageMetaData2>
      </MessageBMetaData>
      <MessageBDetail>
        <DetailData1>FOO</DetailData1>
        <DetailData2>FOO</DetailData2>
      </MessageBDetail>
    </MessageB>
    <MessageC>
      <MessageCMetaData>
        <MessageMetaData1>FOO</MessageMetaData1>
        <MessageMetaData2>FOO</MessageMetaData2>
      </MessageCMetaData>
      <MessageCDetail>
        <DetailData1>FOO</DetailData1>
        <DetailData2>FOO</DetailData2>
      </MessageCDetail>
    </MessageC>
  </Container>
};
db:create('tr-test', local:sample(20, 100000), 'test.xml')
Thanks,
Tom Rauchenwald