Dear Tim,
The query plan indicates that no index is applied. Your query may be
evaluated faster when rewriting "BIB_ID" to "BIB_ID/text()". I will
see if this can automatically be done by the query compiler.
Best,
Christian
On Tue, Aug 4, 2015 at 4:42 AM, Tim Thompson <timathom@gmail.com> wrote:
> Hello,
>
> I have a database containing two resources/documents: they both represent
> the same set of library catalog records (7728 "records" in each), but they
> each contain different data that I want to join.
>
> The first resource looks like this:
>
> <marc:collection
> xmlns:marc="http://www.loc.gov/MARC21/slim">
> <marc:record>
> <marc:leader>01225cam a2200373Mi 4500</marc:leader>
> <marc:controlfield tag="001">5323084</marc:controlfield>
> <marc:datafield
> ind1="1"
> ind2="4"
> tag="245">
> <marc:subfield code="a">Els teleclubs a les illes Balears
> :</marc:subfield>
> </marc:datafield>
> </marc:record>
> <marc:record>
> <marc:leader>01225cam a2200373Mi 4500</marc:leader>
> <marc:controlfield tag="001">5323084</marc:controlfield>
> <marc:datafield
> ind1="1"
> ind2="4"
> tag="245">
> <marc:subfield code="a">Els teleclubs a les illes Balears
> :</marc:subfield>
> </marc:datafield>
> </marc:record>
> <marc:record>
> <marc:leader>00818cam a2200241Mi 4500</marc:leader>
> <marc:controlfield tag="001">6310976</marc:controlfield>
> <marc:datafield
> ind1="0"
> ind2="0"
> tag="245">
> <marc:subfield code="a">Diccionari manual de sinònims i
> antònims de la llengua catalana /</marc:subfield>
> </marc:datafield>
> </marc:record>
> </marc:collection>
>
> The second one looks like this:
>
> <root>
> <row>
> <LANGUAGE>cat</LANGUAGE>
> <ITEM_ID>5912416</ITEM_ID>
> <BIB_ID>5323084</BIB_ID>
> <VENDOR_CODE>MXBKSMX</VENDOR_CODE>
> </row>
> <row>
> <LANGUAGE>cat</LANGUAGE>
> <ITEM_ID>5912416</ITEM_ID>
> <BIB_ID>5323084</BIB_ID>
> <VENDOR_CODE>PUVILL</VENDOR_CODE>
> </row>
> <row>
> <LANGUAGE>cat</LANGUAGE>
> <ITEM_ID>5935043</ITEM_ID>
> <BIB_ID>6310976</BIB_ID>
> <VENDOR_CODE>PUVILL</VENDOR_CODE>
> </row>
> </root>
>
> I have a simple query that joins the two using the value of the
> marc:controlfield[@tag = '001'] from resource 1 and the BIB_ID from resource
> 2.
>
> The query:
>
> declare namespace marc="http://www.loc.gov/MARC21/slim";
>
> for $m in collection(
> "latin_hold_20150730"
> )/marc:collection/marc:record,
> $r in collection(
> "latin_hold_20150730"
> )/root/row[BIB_ID = $m/marc:controlfield[@tag = '001']]
> group by $key := $r/ITEM_ID
>
> return
> <test n="{
> $key
> }">{
> $m/marc:datafield[@tag = '245']/marc:subfield[@code = 'a']/string()
> }</test>
>
> When I run this using Saxon (substituting fn:doc for fn:collection), it only
> takes a second to execute. In BaseX 8.2.3 (using the GUI), however, it takes
> a very long time to execute (around 6 minutes!). The BaseX database has both
> attribute and text indexes enabled.
>
> Any idea what is causing it to take so long?
>
> Here is the BaseX Query Info:
>
> Total Time: 383756.3 ms
>
> Compiling:
> - pre-evaluating collection("latin_hold_20150730")
> - pre-evaluating collection("latin_hold_20150730")
> Query:
> declare namespace marc="http://www.loc.gov/MARC21/slim"; for $m in
> collection( "latin_hold_20150730" )/marc:collection/marc:record, $r in
> collection( "latin_hold_20150730" )/root/row[BIB_ID =
> $m/marc:controlfield[@tag = '001']] group by $key := $r/ITEM_ID return <test
> n="{ $key }">{ $m/marc:datafield[@tag = '245']/marc:subfield[@code =
> 'a']/string() }</test>
> Optimized Query:
> for $m_0 in (db:open-pre("latin_hold_20150730",0),
> ...)/marc:collection/marc:record for $r_1 in
> (db:open-pre("latin_hold_20150730",0), ...)/root/row[(BIB_ID =
> $m_0/marc:controlfield[(@tag = "001")])] let (: post-group :) $m_4 := $m_0
> group by $key_2 := $r_1/ITEM_ID return element test { (attribute n {
> ($key_2) }, $m_4/marc:datafield[(@tag = "245")]/marc:subfield[(@code =
> "a")]/string()) }
> Result:
> - Hit(s): 7587 Items
> - Updated: 0 Items
> - Printed: 505 KB
> - Read Locking: local [latin_hold_20150730]
> - Write Locking: none
> Timing:
> - Parsing: 0.43 ms
> - Compiling: 1.96 ms
> - Evaluating: 383737.5 ms
> - Printing: 16.41 ms
> - Total Time: 383756.3 ms
> Query plan:
> <QueryPlan compiled="true">
> <GFLWOR>
> <For>
> <Var name="$m" id="0"/>
> <IterPath>
> <DBNodeSeq size="2">
> <DBNode name="latin_hold_20150730" pre="0"/>
> <DBNode name="latin_hold_20150730" pre="365692"/>
> </DBNodeSeq>
> <IterStep axis="child" test="marc:collection"/>
> <IterStep axis="child" test="marc:record"/>
> </IterPath>
> </For>
> <For>
> <Var name="$r" id="1"/>
> <IterPath>
> <DBNodeSeq size="2">
> <DBNode name="latin_hold_20150730" pre="0"/>
> <DBNode name="latin_hold_20150730" pre="365692"/>
> </DBNodeSeq>
> <IterStep axis="child" test="root"/>
> <IterStep axis="child" test="row">
> <CmpG op="=">
> <CachedPath>
> <IterStep axis="child" test="BIB_ID"/>
> </CachedPath>
> <IterPath>
> <VarRef>
> <Var name="$m" id="0"/>
> </VarRef>
> <IterStep axis="child" test="marc:controlfield">
> <CmpG op="=">
> <CachedPath>
> <IterStep axis="attribute" test="tag"/>
> </CachedPath>
> <Str value="001" type="xs:string"/>
> </CmpG>
> </IterStep>
> </IterPath>
> </CmpG>
> </IterStep>
> </IterPath>
> </For>
> <GroupBy>
> <Spec>
> <Var name="$key" id="2"/>
> <IterPath>
> <VarRef>
> <Var name="$r" id="1"/>
> </VarRef>
> <IterStep axis="child" test="ITEM_ID"/>
> </IterPath>
> </Spec>
> </GroupBy>
> <CElem>
> <QNm value="test" type="xs:QName"/>
> <CAttr>
> <QNm value="n" type="xs:QName"/>
> <VarRef>
> <Var name="$key" id="2"/>
> </VarRef>
> </CAttr>
> <MixedPath>
> <VarRef>
> <Var name="$m" id="4"/>
> </VarRef>
> <IterStep axis="child" test="marc:datafield">
> <CmpG op="=">
> <CachedPath>
> <IterStep axis="attribute" test="tag"/>
> </CachedPath>
> <Str value="245" type="xs:string"/>
> </CmpG>
> </IterStep>
> <IterStep axis="child" test="marc:subfield">
> <CmpG op="=">
> <CachedPath>
> <IterStep axis="attribute" test="code"/>
> </CachedPath>
> <Str value="a" type="xs:string"/>
> </CmpG>
> </IterStep>
> <FnString name="string([item])"/>
> </MixedPath>
> </CElem>
> </GFLWOR>
> </QueryPlan>
>
>
> Thanks in advance!
>
> Tim
>
>
> --
> Tim A. Thompson
> Metadata Librarian (Spanish/Portuguese Specialty)
> Princeton University Library
>