Just a quick hint: As the query info output indicates that no text index is used (see [1]), you could try to attach explicit text() steps in your comparisons:
  where $track_release/ReleaseLabelReference/text() = $r/PartyList/Party/PartyReference/text()
  ...
  where $track_release/ReleaseResourceReference/text() = $sound_recording/ResourceReference/text()
  ...
Feel free to provide us with a minimized XML document that allows us to run your query.
[1] https://docs.basex.org/wiki/Indexes
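To make the hint concrete, here is a minimal, self-contained sketch of the explicit text() comparison. The inline document and the values P1, P2 and P9 are invented for illustration; only the element names come from the query in this thread. It shows the syntax only: whether BaseX actually rewrites such a comparison for index access depends on the database and its index settings, so the query info output remains the place to check.

```xquery
(: Hypothetical sample data: element names follow the thread, values are invented. :)
let $r :=
  <NewReleaseMessage>
    <PartyList>
      <Party><PartyReference>P1</PartyReference></Party>
      <Party><PartyReference>P2</PartyReference></Party>
    </PartyList>
    <ReleaseList>
      <TrackRelease><ReleaseLabelReference>P2</ReleaseLabelReference></TrackRelease>
      <TrackRelease><ReleaseLabelReference>P9</ReleaseLabelReference></TrackRelease>
    </ReleaseList>
  </NewReleaseMessage>
for $track_release in $r/ReleaseList/TrackRelease
(: Compare the text nodes explicitly instead of the enclosing elements. :)
where $track_release/ReleaseLabelReference/text() = $r/PartyList/Party/PartyReference/text()
return $track_release/ReleaseLabelReference/text()
```

In this sample only the first TrackRelease has a label reference that matches a PartyReference, so only that one passes the where clause.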
On Fri, Aug 21, 2020 at 11:26 PM Bill Osmond bill@with.in wrote:
The indentation levels and multiple returns in your suggested query alone have helped me understand what's actually going on, and I've rewritten the query mostly as you suggest:
for $r in /ernm:NewReleaseMessage
for $track_release in $r/ReleaseList/TrackRelease
where $track_release/ReleaseLabelReference = $r/PartyList/Party/PartyReference
return
  for $party in $r/PartyList/Party
  for $sound_recording in $r/ResourceList/SoundRecording
  where $track_release/ReleaseResourceReference = $sound_recording/ResourceReference
  return
    for $release in $r/ReleaseList/Release
    where $track_release/ReleaseResourceReference = $release/ResourceGroup/ResourceGroup/ResourceGroupContentItem/ReleaseResourceReference
    return
      <identity>
        <isrc>{ $track_release/ReleaseId/ISRC/text() }</isrc>
        <artist>{ fn:string-join($sound_recording/DisplayArtistName, '/') }</artist>
        <title>{ $sound_recording/DisplayTitleText/text() }</title>
        <album>{ $release/DisplayTitleText/text() }</album>
        <icpn>{ $release/ReleaseId/ICPN/text() }</icpn>
        <sublabel>{ $party/PartyName/FullName/text() }</sublabel>
      </identity>
I say "mostly" because at the first 'where' statement, I had to replace the $party/PartyReference on the right side of the = with an explicit reference from $r, because $party isn't defined until 2 lines further down. The query does work and appears to be giving the right results, however the performance is basically the same: 127,698.27 ms for this run vs. 114,935.22 ms for the original. I'll try shuffling things around next!
On Fri, Aug 21, 2020 at 1:52 PM Bill Osmond bill@with.in wrote:
Thank you both! I was in the middle of typing a response to Bridger when this came through, so while I digest both of your messages & test index addition/moving things around, I'll leave the output from the GUI info panel here - if I knew how to read it, it would no doubt point straight at the issue! I'll note that it runs much quicker locally as opposed to on server+via the python BaseXClient, but I don't know if that's because the GUI is stopping at 500k results, or what.
Compiling:
- rewrite context value to document-node() sequence: . -> (db:open-pre("umg-118061851424", 0), ...)
- rewrite util:root(nodes) to document-node() sequence: util:root((db:open-pre("umg-118061851424", 0), ...)) -> (db:open-pre("umg-118061851424", 0), ...)
- move where clause: $track_release_1/ReleaseLabelReference = $party_2/PartyReference
- swap operands: PartyReference = $track_release_1/ReleaseLabelReference
- rewrite to predicate: PartyReference = $track_release_1/ReleaseLabelReference
- move where clause: $track_release_1/ReleaseResourceReference = $sound_recording_3/ResourceReference
- swap operands: ResourceReference = $track_release_1/ReleaseResourceReference
- rewrite to predicate: ResourceReference = $track_release_1/ReleaseResourceReference
- swap operands: ResourceGroup/ResourceGroup/ResourceGroupContentItem/ReleaseResourceReference = $track_release_1/Rel...
- rewrite to predicate: ResourceGroup/ResourceGroup/ResourceGroupContentItem/ReleaseResourceReference = $track_release_1/Rel...
- inline for $release_4 in $r_0/ReleaseList/Release[ResourceGroup/ResourceGroup/ResourceGroupContentItem/Rele...
Optimized Query:
for $r_0 in (db:open-pre("umg-118061851424", 0), ...)/ernm:NewReleaseMessage
for $track_release_1 in $r_0/ReleaseList/TrackRelease
for $party_2 in $r_0/PartyList/Party[PartyReference = $track_release_1/ReleaseLabelReference]
for $sound_recording_3 in $r_0/ResourceList/SoundRecording[ResourceReference = $track_release_1/ReleaseResourceReference]
return $r_0/ReleaseList/Release[ResourceGroup/ResourceGroup/ResourceGroupContentItem/ReleaseResourceReference = $track_release_1/ReleaseResourceReference] !
  <identity>
    <isrc> { $track_release_1/ReleaseId/ISRC/text() } </isrc>
    <artist> { string-join($sound_recording_3/DisplayArtistName, "/") } </artist>
    <title> { $sound_recording_3/DisplayTitleText/text() } </title>
    <album> { DisplayTitleText/text() } </album>
    <icpn> { ReleaseId/ICPN/text() } </icpn>
    <sublabel> { $party_2/PartyName/FullName/text() } </sublabel>
  </identity>
Query:
declare namespace ernm="http://ddex.net/xml/ern/411";
for $r in /ernm:NewReleaseMessage
for $track_release in $r/ReleaseList/TrackRelease
for $party in $r/PartyList/Party
for $sound_recording in $r/ResourceList/SoundRecording
for $release in $r/ReleaseList/Release
where $track_release/ReleaseLabelReference = $party/PartyReference
  and $track_release/ReleaseResourceReference = $sound_recording/ResourceReference
  and $track_release/ReleaseResourceReference = $release/ResourceGroup/ResourceGroup/ResourceGroupContentItem/ReleaseResourceReference
return
  <identity>
    <isrc>{ $track_release/ReleaseId/ISRC/text() }</isrc>
    <artist>{ fn:string-join($sound_recording/DisplayArtistName, '/') }</artist>
    <title>{ $sound_recording/DisplayTitleText/text() }</title>
    <album>{ $release/DisplayTitleText/text() }</album>
    <icpn>{ $release/ReleaseId/ICPN/text() }</icpn>
    <sublabel>{ $party/PartyName/FullName/text() }</sublabel>
  </identity>
Result:
- Hit(s): 500000 Items
- Updated: 0 Items
- Printed: 8192 kB
- Read Locking: umg-118061851424
- Write Locking: (none)
Timing:
- Parsing: 2.88 ms
- Compiling: 4.38 ms
- Evaluating: 114935.22 ms
- Printing: 114.59 ms
- Total Time: 115057.07 ms
Query Plan:
<QueryPlan compiled="true" updating="false"> <GFLWOR type="element()*"> <For type="element()" size="1" name="$r" id="0"> <IterPath type="element()*" database="umg-118061851424"> <DBNodeSeq type="document-node()+" size="136938" database="umg-118061851424"> <DBNode pre="0" type="document-node()" size="1" database="umg-118061851424"/> <DBNode pre="59" type="document-node()" size="1" database="umg-118061851424"/> <DBNode pre="118" type="document-node()" size="1" database="umg-118061851424"/> <DBNode pre="177" type="document-node()" size="1" database="umg-118061851424"/> <DBNode pre="236" type="document-node()" size="1" database="umg-118061851424"/> </DBNodeSeq> <IterStep axis="child" test="ernm:NewReleaseMessage" type="element()*"/> </IterPath> </For> <For type="element()" size="1" name="$track_release" id="1"> <IterPath type="element()*" database="umg-118061851424"> <VarRef type="element()" size="1" database="umg-118061851424" name="$r" id="0"/> <IterStep axis="child" test="ReleaseList" type="element()*"/> <IterStep axis="child" test="TrackRelease" type="element()*"/> </IterPath> </For> <For type="element()" size="1" name="$party" id="2"> <IterPath type="element()*" database="umg-118061851424"> <VarRef type="element()" size="1" database="umg-118061851424" name="$r" id="0"/> <IterStep axis="child" test="PartyList" type="element()*"/> <IterStep axis="child" test="Party" type="element()*"> <CmpHashG op="=" type="xs:boolean" size="1"> <SingleIterPath type="element()*" database="umg-118061851424"> <IterStep axis="child" test="PartyReference" type="element()*"/> </SingleIterPath> <IterPath type="element()*" database="umg-118061851424"> <VarRef type="element()" size="1" database="umg-118061851424" name="$track_release" id="1"/> <IterStep axis="child" test="ReleaseLabelReference" type="element()*"/> </IterPath> </CmpHashG> </IterStep> </IterPath> </For> <For type="element()" size="1" name="$sound_recording" id="3"> <IterPath type="element()*" database="umg-118061851424"> 
<VarRef type="element()" size="1" database="umg-118061851424" name="$r" id="0"/> <IterStep axis="child" test="ResourceList" type="element()*"/> <IterStep axis="child" test="SoundRecording" type="element()*"> <CmpHashG op="=" type="xs:boolean" size="1"> <SingleIterPath type="element()*" database="umg-118061851424"> <IterStep axis="child" test="ResourceReference" type="element()*"/> </SingleIterPath> <IterPath type="element()*" database="umg-118061851424"> <VarRef type="element()" size="1" database="umg-118061851424" name="$track_release" id="1"/> <IterStep axis="child" test="ReleaseResourceReference" type="element()*"/> </IterPath> </CmpHashG> </IterStep> </IterPath> </For> <DualMap type="element()*"> <IterPath type="element()*" database="umg-118061851424"> <VarRef type="element()" size="1" database="umg-118061851424" name="$r" id="0"/> <IterStep axis="child" test="ReleaseList" type="element()*"/> <IterStep axis="child" test="Release" type="element()*"> <CmpHashG op="=" type="xs:boolean" size="1"> <IterPath type="element()*" database="umg-118061851424"> <IterStep axis="child" test="ResourceGroup" type="element()*"/> <IterStep axis="child" test="ResourceGroup" type="element()*"/> <IterStep axis="child" test="ResourceGroupContentItem" type="element()*"/> <IterStep axis="child" test="ReleaseResourceReference" type="element()*"/> </IterPath> <IterPath type="element()*" database="umg-118061851424"> <VarRef type="element()" size="1" database="umg-118061851424" name="$track_release" id="1"/> <IterStep axis="child" test="ReleaseResourceReference" type="element()*"/> </IterPath> </CmpHashG> </IterStep> </IterPath> <CElem type="element()" size="1"> <QNm type="xs:QName" size="1">identity</QNm> <CElem type="element()" size="1"> <QNm type="xs:QName" size="1">isrc</QNm> <IterPath type="text()*" database="umg-118061851424"> <VarRef type="element()" size="1" database="umg-118061851424" name="$track_release" id="1"/> <IterStep axis="child" test="ReleaseId" type="element()*"/> 
<IterStep axis="child" test="ISRC" type="element()*"/> <IterStep axis="child" test="text()" type="text()*"/> </IterPath> </CElem> <CElem type="element()" size="1"> <QNm type="xs:QName" size="1">artist</QNm> <FnStringJoin name="string-join" type="xs:string" size="1"> <IterPath type="element()*" database="umg-118061851424"> <VarRef type="element()" size="1" database="umg-118061851424" name="$sound_recording" id="3"/> <IterStep axis="child" test="DisplayArtistName" type="element()*"/> </IterPath> <Str type="xs:string" size="1">/</Str> </FnStringJoin> </CElem> <CElem type="element()" size="1"> <QNm type="xs:QName" size="1">title</QNm> <IterPath type="text()*" database="umg-118061851424"> <VarRef type="element()" size="1" database="umg-118061851424" name="$sound_recording" id="3"/> <IterStep axis="child" test="DisplayTitleText" type="element()*"/> <IterStep axis="child" test="text()" type="text()*"/> </IterPath> </CElem> <CElem type="element()" size="1"> <QNm type="xs:QName" size="1">album</QNm> <IterPath type="text()*" database="umg-118061851424"> <IterStep axis="child" test="DisplayTitleText" type="element()*"/> <IterStep axis="child" test="text()" type="text()*"/> </IterPath> </CElem> <CElem type="element()" size="1"> <QNm type="xs:QName" size="1">icpn</QNm> <IterPath type="text()*" database="umg-118061851424"> <IterStep axis="child" test="ReleaseId" type="element()*"/> <IterStep axis="child" test="ICPN" type="element()*"/> <IterStep axis="child" test="text()" type="text()*"/> </IterPath> </CElem> <CElem type="element()" size="1"> <QNm type="xs:QName" size="1">sublabel</QNm> <IterPath type="text()*" database="umg-118061851424"> <VarRef type="element()" size="1" database="umg-118061851424" name="$party" id="2"/> <IterStep axis="child" test="PartyName" type="element()*"/> <IterStep axis="child" test="FullName" type="element()*"/> <IterStep axis="child" test="text()" type="text()*"/> </IterPath> </CElem> </CElem> </DualMap> </GFLWOR> </QueryPlan>
On Fri, Aug 21, 2020 at 1:48 PM Liam R. E. Quin liam@fromoldbooks.org wrote:
On Fri, 2020-08-21 at 12:51 -0700, Bill Osmond wrote:
declare namespace ernm="http://ddex.net/xml/ern/411";
for $r in /ernm:NewReleaseMessage
for $track_release in $r/ReleaseList/TrackRelease
for $party in $r/PartyList/Party
for $sound_recording in $r/ResourceList/SoundRecording
for $release in $r/ReleaseList/Release
where $track_release/ReleaseLabelReference = $party/PartyReference
  and $track_release/ReleaseResourceReference = $sound_recording/ResourceReference
  and $track_release/ReleaseResourceReference = $release/ResourceGroup/ResourceGroup/ResourceGroupContentItem/ReleaseResourceReference
BaseX is probably smart enough to rewrite this, but check -
for $r in /ernm:NewReleaseMessage
for $track_release in $r/ReleaseList/TrackRelease
where $track_release/ReleaseLabelReference = $party/PartyReference
return
  for $party in $r/PartyList/Party
  for $sound_recording in $r/ResourceList/SoundRecording
  where $track_release/ReleaseResourceReference = $sound_recording/ResourceReference
  return
    for $release in $r/ReleaseList/Release
    where $track_release/ReleaseResourceReference = $release/ResourceGroup/ResourceGroup/ResourceGroupContentItem/ReleaseResourceReference
    return <identity>...
Am I wrong, and would an additional value index help here? Or is my query just bad?
You're computing every possible combination of 5 items and then filtering out the ones you want.
Filtering out earlier would probably help. Also, moving the tests least likely to match to the outside would reduce the number of tests sooner.
A value index might well help, but as Bridger wrote, check in the GUI to see the query plan. BaseX might already be doing the sort of rewrite I suggested.
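As a rough illustration of filtering earlier, the sketch below places each where clause directly after the for clause that binds the variable it tests, so non-matching combinations are dropped before the next loop starts. The inline document and its values are invented for illustration; the element names are taken from the query above. It relies on XQuery 3.0 allowing several where clauses inside one FLWOR expression, and it is one possible arrangement, not the only one. As noted, BaseX may already perform an equivalent rewrite itself, in which case the timings would not change.

```xquery
(: Hypothetical sample data: element names follow the thread, values are invented. :)
let $r :=
  <NewReleaseMessage>
    <PartyList>
      <Party>
        <PartyReference>P1</PartyReference>
        <PartyName><FullName>Label A</FullName></PartyName>
      </Party>
      <Party>
        <PartyReference>P2</PartyReference>
        <PartyName><FullName>Label B</FullName></PartyName>
      </Party>
    </PartyList>
    <ResourceList>
      <SoundRecording>
        <ResourceReference>A1</ResourceReference>
        <DisplayTitleText>Song</DisplayTitleText>
      </SoundRecording>
    </ResourceList>
    <ReleaseList>
      <TrackRelease>
        <ReleaseResourceReference>A1</ReleaseResourceReference>
        <ReleaseLabelReference>P2</ReleaseLabelReference>
      </TrackRelease>
    </ReleaseList>
  </NewReleaseMessage>
for $track_release in $r/ReleaseList/TrackRelease
for $party in $r/PartyList/Party
(: Filter parties as soon as $party is bound, before looping over recordings. :)
where $track_release/ReleaseLabelReference = $party/PartyReference
for $sound_recording in $r/ResourceList/SoundRecording
(: Filter recordings as soon as $sound_recording is bound. :)
where $track_release/ReleaseResourceReference = $sound_recording/ResourceReference
return
  <identity>
    <title>{ $sound_recording/DisplayTitleText/text() }</title>
    <sublabel>{ $party/PartyName/FullName/text() }</sublabel>
  </identity>
```

Each variable is bound before the where clause that uses it, which avoids referencing $party before it is defined.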
Liam
-- Liam Quin, https://www.delightfulcomputing.com/ Available for XML/Document/Information Architecture/XSLT/ XSL/XQuery/Web/Text Processing/A11Y training, work & consulting. Barefoot Web-slave, antique illustrations: http://www.fromoldbooks.org