Thank you both! I was in the middle of typing a response to Bridger when this came through, so while I digest both of your messages & test index addition/moving things around, I'll leave the output from the GUI info panel here - if I knew how to read it, it would no doubt point straight at the issue! I'll note that it runs much quicker locally as opposed to on server+via the python BaseXClient, but I don't know if that's because the GUI is stopping at 500k results, or what.
Compiling: - rewrite context value to document-node() sequence: . -> (db:open-pre("umg-118061851424", 0), ...) - rewrite util:root(nodes) to document-node() sequence: util:root((db:open-pre("umg-118061851424", 0), ...)) -> (db:open-pre("umg-118061851424", 0), ...) - move where clause: $track_release_1/ReleaseLabelReference = $party_2/PartyReference - swap operands: PartyReference = $track_release_1/ReleaseLabelReference - rewrite to predicate: PartyReference = $track_release_1/ReleaseLabelReference - move where clause: $track_release_1/ReleaseResourceReference = $sound_recording_3/ResourceReference - swap operands: ResourceReference = $track_release_1/ReleaseResourceReference - rewrite to predicate: ResourceReference = $track_release_1/ReleaseResourceReference - swap operands: ResourceGroup/ResourceGroup/ResourceGroupContentItem/ReleaseResourceReference = $track_release_1/Rel... - rewrite to predicate: ResourceGroup/ResourceGroup/ResourceGroupContentItem/ReleaseResourceReference = $track_release_1/Rel... - inline for $release_4 in $r_0/ReleaseList/Release[ResourceGroup/ResourceGroup/ResourceGroupContentItem/Rele...
Optimized Query: for $r_0 in (db:open-pre("umg-118061851424", 0), ...)/ernm:NewReleaseMessage for $track_release_1 in $r_0/ReleaseList/TrackRelease for $party_2 in $r_0/PartyList/Party[PartyReference = $track_release_1/ReleaseLabelReference] for $sound_recording_3 in $r_0/ResourceList/SoundRecording[ResourceReference = $track_release_1/ReleaseResourceReference] return $r_0/ReleaseList/Release[ResourceGroup/ResourceGroup/ResourceGroupContentItem/ReleaseResourceReference = $track_release_1/ReleaseResourceReference] ! <identity> <isrc> { $track_release_1/ReleaseId/ISRC/text() } </isrc> <artist> { string-join($sound_recording_3/DisplayArtistName, "/") } </artist> <title> { $sound_recording_3/DisplayTitleText/text() } </title> <album> { DisplayTitleText/text() } </album> <icpn> { ReleaseId/ICPN/text() } </icpn> <sublabel> { $party_2/PartyName/FullName/text() } </sublabel> </identity>
Query: declare namespace ernm="http://ddex.net/xml/ern/411"; for $r in /ernm:NewReleaseMessage for $track_release in $r/ReleaseList/TrackRelease for $party in $r/PartyList/Party for $sound_recording in $r/ResourceList/SoundRecording for $release in $r/ReleaseList/Release where $track_release/ReleaseLabelReference = $party/PartyReference and $track_release/ReleaseResourceReference = $sound_recording/ResourceReference and $track_release/ReleaseResourceReference = $release/ResourceGroup/ResourceGroup/ResourceGroupContentItem/ReleaseResourceReference return <identity> <isrc>{ $track_release/ReleaseId/ISRC/text() }</isrc> <artist>{ fn:string-join($sound_recording/DisplayArtistName, '/') }</artist> <title>{ $sound_recording/DisplayTitleText/text() }</title> <album>{ $release/DisplayTitleText/text() }</album> <icpn>{ $release/ReleaseId/ICPN/text() }</icpn> <sublabel>{ $party/PartyName/FullName/text() }</sublabel> </identity>
Result: - Hit(s): 500000 Items - Updated: 0 Items - Printed: 8192 kB - Read Locking: umg-118061851424 - Write Locking: (none)
Timing: - Parsing: 2.88 ms - Compiling: 4.38 ms - Evaluating: 114935.22 ms - Printing: 114.59 ms - Total Time: 115057.07 ms
Query Plan: <QueryPlan compiled="true" updating="false"> <GFLWOR type="element()*"> <For type="element()" size="1" name="$r" id="0"> <IterPath type="element()*" database="umg-118061851424"> <DBNodeSeq type="document-node()+" size="136938" database="umg-118061851424"> <DBNode pre="0" type="document-node()" size="1" database="umg-118061851424"/> <DBNode pre="59" type="document-node()" size="1" database="umg-118061851424"/> <DBNode pre="118" type="document-node()" size="1" database="umg-118061851424"/> <DBNode pre="177" type="document-node()" size="1" database="umg-118061851424"/> <DBNode pre="236" type="document-node()" size="1" database="umg-118061851424"/> </DBNodeSeq> <IterStep axis="child" test="ernm:NewReleaseMessage" type="element()*"/> </IterPath> </For> <For type="element()" size="1" name="$track_release" id="1"> <IterPath type="element()*" database="umg-118061851424"> <VarRef type="element()" size="1" database="umg-118061851424" name="$r" id="0"/> <IterStep axis="child" test="ReleaseList" type="element()*"/> <IterStep axis="child" test="TrackRelease" type="element()*"/> </IterPath> </For> <For type="element()" size="1" name="$party" id="2"> <IterPath type="element()*" database="umg-118061851424"> <VarRef type="element()" size="1" database="umg-118061851424" name="$r" id="0"/> <IterStep axis="child" test="PartyList" type="element()*"/> <IterStep axis="child" test="Party" type="element()*"> <CmpHashG op="=" type="xs:boolean" size="1"> <SingleIterPath type="element()*" database="umg-118061851424"> <IterStep axis="child" test="PartyReference" type="element()*"/> </SingleIterPath> <IterPath type="element()*" database="umg-118061851424"> <VarRef type="element()" size="1" database="umg-118061851424" name="$track_release" id="1"/> <IterStep axis="child" test="ReleaseLabelReference" type="element()*"/> </IterPath> </CmpHashG> </IterStep> </IterPath> </For> <For type="element()" size="1" name="$sound_recording" id="3"> <IterPath type="element()*" database="umg-118061851424"> <VarRef type="element()" size="1" database="umg-118061851424" name="$r" id="0"/> <IterStep axis="child" test="ResourceList" type="element()*"/> <IterStep axis="child" test="SoundRecording" type="element()*"> <CmpHashG op="=" type="xs:boolean" size="1"> <SingleIterPath type="element()*" database="umg-118061851424"> <IterStep axis="child" test="ResourceReference" type="element()*"/> </SingleIterPath> <IterPath type="element()*" database="umg-118061851424"> <VarRef type="element()" size="1" database="umg-118061851424" name="$track_release" id="1"/> <IterStep axis="child" test="ReleaseResourceReference" type="element()*"/> </IterPath> </CmpHashG> </IterStep> </IterPath> </For> <DualMap type="element()*"> <IterPath type="element()*" database="umg-118061851424"> <VarRef type="element()" size="1" database="umg-118061851424" name="$r" id="0"/> <IterStep axis="child" test="ReleaseList" type="element()*"/> <IterStep axis="child" test="Release" type="element()*"> <CmpHashG op="=" type="xs:boolean" size="1"> <IterPath type="element()*" database="umg-118061851424"> <IterStep axis="child" test="ResourceGroup" type="element()*"/> <IterStep axis="child" test="ResourceGroup" type="element()*"/> <IterStep axis="child" test="ResourceGroupContentItem" type="element()*"/> <IterStep axis="child" test="ReleaseResourceReference" type="element()*"/> </IterPath> <IterPath type="element()*" database="umg-118061851424"> <VarRef type="element()" size="1" database="umg-118061851424" name="$track_release" id="1"/> <IterStep axis="child" test="ReleaseResourceReference" type="element()*"/> </IterPath> </CmpHashG> </IterStep> </IterPath> <CElem type="element()" size="1"> <QNm type="xs:QName" size="1">identity</QNm> <CElem type="element()" size="1"> <QNm type="xs:QName" size="1">isrc</QNm> <IterPath type="text()*" database="umg-118061851424"> <VarRef type="element()" size="1" database="umg-118061851424" name="$track_release" id="1"/> <IterStep axis="child" test="ReleaseId" type="element()*"/> <IterStep axis="child" test="ISRC" type="element()*"/> <IterStep axis="child" test="text()" type="text()*"/> </IterPath> </CElem> <CElem type="element()" size="1"> <QNm type="xs:QName" size="1">artist</QNm> <FnStringJoin name="string-join" type="xs:string" size="1"> <IterPath type="element()*" database="umg-118061851424"> <VarRef type="element()" size="1" database="umg-118061851424" name="$sound_recording" id="3"/> <IterStep axis="child" test="DisplayArtistName" type="element()*"/> </IterPath> <Str type="xs:string" size="1">/</Str> </FnStringJoin> </CElem> <CElem type="element()" size="1"> <QNm type="xs:QName" size="1">title</QNm> <IterPath type="text()*" database="umg-118061851424"> <VarRef type="element()" size="1" database="umg-118061851424" name="$sound_recording" id="3"/> <IterStep axis="child" test="DisplayTitleText" type="element()*"/> <IterStep axis="child" test="text()" type="text()*"/> </IterPath> </CElem> <CElem type="element()" size="1"> <QNm type="xs:QName" size="1">album</QNm> <IterPath type="text()*" database="umg-118061851424"> <IterStep axis="child" test="DisplayTitleText" type="element()*"/> <IterStep axis="child" test="text()" type="text()*"/> </IterPath> </CElem> <CElem type="element()" size="1"> <QNm type="xs:QName" size="1">icpn</QNm> <IterPath type="text()*" database="umg-118061851424"> <IterStep axis="child" test="ReleaseId" type="element()*"/> <IterStep axis="child" test="ICPN" type="element()*"/> <IterStep axis="child" test="text()" type="text()*"/> </IterPath> </CElem> <CElem type="element()" size="1"> <QNm type="xs:QName" size="1">sublabel</QNm> <IterPath type="text()*" database="umg-118061851424"> <VarRef type="element()" size="1" database="umg-118061851424" name="$party" id="2"/> <IterStep axis="child" test="PartyName" type="element()*"/> <IterStep axis="child" test="FullName" type="element()*"/> <IterStep axis="child" test="text()" type="text()*"/> </IterPath> </CElem> </CElem> </DualMap> </GFLWOR> </QueryPlan>
On Fri, Aug 21, 2020 at 1:48 PM Liam R. E. Quin liam@fromoldbooks.org wrote:
On Fri, 2020-08-21 at 12:51 -0700, Bill Osmond wrote:
declare namespace ernm="http://ddex.net/xml/ern/411"; for $r in /ernm:NewReleaseMessage for $track_release in $r/ReleaseList/TrackRelease for $party in $r/PartyList/Party for $sound_recording in $r/ResourceList/SoundRecording for $release in $r/ReleaseList/Release where $track_release/ReleaseLabelReference = $party/PartyReference and $track_release/ReleaseResourceReference =
$sound_recording/ResourceReference and $track_release/ReleaseResourceReference = $release/ResourceGroup/ResourceGroup/ResourceGroupContentItem/Release ResourceReference
BaseX is probably smart enough to rewrite this, but check -
for $r in /ernm:NewReleaseMessage for $track_release in $r/ReleaseList/TrackRelease where $track_release/ReleaseLabelReference = $party/PartyReference
return for $party in $r/PartyList/Party for $sound_recording in $r/ResourceList/SoundRecording where $track_release/ReleaseResourceReference = $sound_recording/ResourceReference return for $release in $r/ReleaseList/Release where $track_release/ReleaseResourceReference = $release/ResourceGroup/ResourceGroup/ResourceGroupContentItem/ReleaseRe sourceReference return <identity>...
Am I wrong, and would an additional value index help here? Or is my query just bad?
You're computing every possible combiation of 5 items and then filtering out the ones you want.
Filtering out earlier would probably help. Also, moving the tests least lilely to match to the outside would reduce the number of tests sooner.
A value index might well help, but as Bridger wrote, check in the gUI to see the query plan. BaseX might already be doing the sort of rewrite i suggested.
Liam
-- Liam Quin, https://www.delightfulcomputing.com/ Available for XML/Document/Information Architecture/XSLT/ XSL/XQuery/Web/Text Processing/A11Y training, work & consulting. Barefoot Web-slave, antique illustrations: http://www.fromoldbooks.org