SituationI have heap of xml files, containing MSG elements, having attribute id and inner element TGEN, denoting time of MSG creation
<MSG id="abcd">
<TGEN>2011-02-23T15:21:14-01:00<TGEN>
<some-content-here/>
</MSG>
My aim is to select for each id value the latest version (having highest value of TGEN)
For comparing TGEN it is sufficient to treat it as string value, there is no need to cast it to dateTime, which is expensive.
My question:
How to form the query, which will use existing indexes most efficiently.
I am not yet very familiar with reading query plans so I am not able to be sure, it goes the most efficient way.
Background
built in max function converts values into xs:double
declare namespace functx = "http://www.functx.com";
declare function functx:max-string
( $strings as xs:anyAtomicType* ) as xs:string? {
max(for $string in $strings return string($string))
} ;
Sample data file
Sample XML can be created using following xquery
element MJD {
for $id in 1 to 10, $age_in_hours in 24 to 48
let $age := xs:dayTimeDuration(concat('PT', $age_in_hours, 'H'))
let $tgen := xs:dateTime('2011-03-16T12:00:00Z') - $age
return
element MSG {
attribute id {$id},
attribute age {$age},
element TGEN {$tgen}
}
}
Our existing solution
We already did some optimization, biggest speed up gained by using group by
Here is my latest version of xquery
let $latest_msgs :=
(for $msg in //MSG
let $id := $msg/@id
group by $id
return
(
(for $m in $msg
let $tgen := $m//TGEN
order by xs:string($tgen) descending
return $m
)[1])
)
return <MJD count='{count($latest_msgs)}' created='{max($latest_msgs//xs:string(TGEN))}'>{$latest_msgs}</MJD>
Any comments are welcome, especially about efficient use of indexes
Jan