I'm beginning to think that perhaps my performance hopes were a bit too inflated, given the size and complexity of our database. After a fresh optimization, and with -Xms2g -Xmx10g, the following query takes 1492ms:
declare namespace ernm="http://ddex.net/xml/ern/411"; for $r in /ernm:NewReleaseMessage for $track_release in $r/ReleaseList/TrackRelease return <identity> <isrc>{ $track_release/ReleaseId/ISRC/text() }</isrc> </identity>
When I add a little bit more to the query, we go up to 11204ms:
declare namespace ernm="http://ddex.net/xml/ern/411"; for $r in /ernm:NewReleaseMessage let $parties := $r/PartyList/Party for $track_release in $r/ReleaseList/TrackRelease let $rlr := $track_release/ReleaseLabelReference/text() let $party := $parties[PartyReference/text() = $rlr] return <identity> <isrc>{ $track_release/ReleaseId/ISRC/text() }</isrc> <sublabel>{ $party/PartyName/FullName/text() }</sublabel> </identity>
The fastest full query that I've been able to make so far takes 70450ms:
declare namespace ernm="http://ddex.net/xml/ern/411"; for $r in /ernm:NewReleaseMessage let $parties := $r/PartyList/Party let $sound_recordings := $r/ResourceList/SoundRecording let $releases := $r/ReleaseList/Release for $track_release in $r/ReleaseList/TrackRelease let $rrr := $track_release/ReleaseResourceReference/text() let $rlr := $track_release/ReleaseLabelReference/text() let $sound_recording := $sound_recordings[ResourceReference/text() = $rrr] let $release := $releases[ResourceGroup/ResourceGroup/ResourceGroupContentItem/ReleaseResourceReference/text() = $rrr] let $party := $parties[PartyReference/text() = $rlr] return <identity> <isrc>{ $track_release/ReleaseId/ISRC/text() }</isrc> <artist>{ fn:string-join($sound_recording/DisplayArtistName/text(), '/') }</artist> <title>{ $sound_recording/DisplayTitleText/text() }</title> <album>{ $release/DisplayTitleText/text() }</album> <icpn>{ $release/ReleaseId/ICPN/text() }</icpn> <sublabel>{ $party/PartyName/FullName/text() }</sublabel> </identity>
Still, a significant improvement over where I started! Running this query against the actual remote server, and retrieving the full result set, is down to 3:52, from ~15 minutes. As this is a batch process that will run every hour, that performance is adequate. It does seem to be heavily CPU bound: when running any of these queries, there's always a single core sitting at 100% utilization while the rest of them are idle.
Thanks for the help! Bill