Hello,
I have a database containing two resources/documents: they both represent the same set of library catalog records (7728 "records" in each), but they each contain different data that I want to join.
The first resource looks like this:
<marc:collection xmlns:marc="http://www.loc.gov/MARC21/slim%22%3E marc:record marc:leader01225cam a2200373Mi 4500</marc:leader> <marc:controlfield tag="001">5323084</marc:controlfield> <marc:datafield ind1="1" ind2="4" tag="245"> <marc:subfield code="a">Els teleclubs a les illes Balears :</marc:subfield> </marc:datafield> </marc:record> marc:record marc:leader01225cam a2200373Mi 4500</marc:leader> <marc:controlfield tag="001">5323084</marc:controlfield> <marc:datafield ind1="1" ind2="4" tag="245"> <marc:subfield code="a">Els teleclubs a les illes Balears :</marc:subfield> </marc:datafield> </marc:record> marc:record marc:leader00818cam a2200241Mi 4500</marc:leader> <marc:controlfield tag="001">6310976</marc:controlfield> <marc:datafield ind1="0" ind2="0" tag="245"> <marc:subfield code="a">Diccionari manual de sinònims i antònims de la llengua catalana /</marc:subfield> </marc:datafield> </marc:record> </marc:collection>
The second one looks like this:
<root> <row> <LANGUAGE>cat</LANGUAGE> <ITEM_ID>5912416</ITEM_ID> <BIB_ID>5323084</BIB_ID> <VENDOR_CODE>MXBKSMX</VENDOR_CODE> </row> <row> <LANGUAGE>cat</LANGUAGE> <ITEM_ID>5912416</ITEM_ID> <BIB_ID>5323084</BIB_ID> <VENDOR_CODE>PUVILL</VENDOR_CODE> </row> <row> <LANGUAGE>cat</LANGUAGE> <ITEM_ID>5935043</ITEM_ID> <BIB_ID>6310976</BIB_ID> <VENDOR_CODE>PUVILL</VENDOR_CODE> </row> </root>
I have a simple query that joins the two using the value of the marc:controlfield[@tag = '001'] from resource 1 and the BIB_ID from resource 2.
The query:
declare namespace marc="http://www.loc.gov/MARC21/slim";
for $m in collection( "latin_hold_20150730" )/marc:collection/marc:record, $r in collection( "latin_hold_20150730" )/root/row[BIB_ID = $m/marc:controlfield[@tag = '001']] group by $key := $r/ITEM_ID
return <test n="{ $key }">{ $m/marc:datafield[@tag = '245']/marc:subfield[@code = 'a']/string() }</test>
When I run this using Saxon (substituting fn:doc for fn:collection), it only takes a second to execute. In BaseX 8.2.3 (using the GUI), however, it takes a very long time to execute (around 6 minutes!). The BaseX database has both attribute and text indexes enabled.
Any idea what is causing it to take so long?
Here is the BaseX Query Info:
Total Time: 383756.3 ms
Compiling: - pre-evaluating collection("latin_hold_20150730") - pre-evaluating collection("latin_hold_20150730") Query: declare namespace marc="http://www.loc.gov/MARC21/slim"; for $m in collection( "latin_hold_20150730" )/marc:collection/marc:record, $r in collection( "latin_hold_20150730" )/root/row[BIB_ID = $m/marc:controlfield[@tag = '001']] group by $key := $r/ITEM_ID return <test n="{ $key }">{ $m/marc:datafield[@tag = '245']/marc:subfield[@code = 'a']/string() }</test> Optimized Query: for $m_0 in (db:open-pre("latin_hold_20150730",0), ...)/marc:collection/marc:record for $r_1 in (db:open-pre("latin_hold_20150730",0), ...)/root/row[(BIB_ID = $m_0/marc:controlfield[(@tag = "001")])] let (: post-group :) $m_4 := $m_0 group by $key_2 := $r_1/ITEM_ID return element test { (attribute n { ($key_2) }, $m_4/marc:datafield[(@tag = "245")]/marc:subfield[(@code = "a")]/string()) } Result: - Hit(s): 7587 Items - Updated: 0 Items - Printed: 505 KB - Read Locking: local [latin_hold_20150730] - Write Locking: none Timing: - Parsing: 0.43 ms - Compiling: 1.96 ms - Evaluating: 383737.5 ms - Printing: 16.41 ms - Total Time: 383756.3 ms Query plan: <QueryPlan compiled="true"> <GFLWOR> <For> <Var name="$m" id="0"/> <IterPath> <DBNodeSeq size="2"> <DBNode name="latin_hold_20150730" pre="0"/> <DBNode name="latin_hold_20150730" pre="365692"/> </DBNodeSeq> <IterStep axis="child" test="marc:collection"/> <IterStep axis="child" test="marc:record"/> </IterPath> </For> <For> <Var name="$r" id="1"/> <IterPath> <DBNodeSeq size="2"> <DBNode name="latin_hold_20150730" pre="0"/> <DBNode name="latin_hold_20150730" pre="365692"/> </DBNodeSeq> <IterStep axis="child" test="root"/> <IterStep axis="child" test="row"> <CmpG op="="> <CachedPath> <IterStep axis="child" test="BIB_ID"/> </CachedPath> <IterPath> <VarRef> <Var name="$m" id="0"/> </VarRef> <IterStep axis="child" test="marc:controlfield"> <CmpG op="="> <CachedPath> <IterStep axis="attribute" test="tag"/> </CachedPath> <Str value="001" type="xs:string"/> </CmpG> </IterStep> </IterPath> </CmpG> </IterStep> </IterPath> </For> <GroupBy> <Spec> <Var name="$key" id="2"/> <IterPath> <VarRef> <Var name="$r" id="1"/> </VarRef> <IterStep axis="child" test="ITEM_ID"/> </IterPath> </Spec> </GroupBy> <CElem> <QNm value="test" type="xs:QName"/> <CAttr> <QNm value="n" type="xs:QName"/> <VarRef> <Var name="$key" id="2"/> </VarRef> </CAttr> <MixedPath> <VarRef> <Var name="$m" id="4"/> </VarRef> <IterStep axis="child" test="marc:datafield"> <CmpG op="="> <CachedPath> <IterStep axis="attribute" test="tag"/> </CachedPath> <Str value="245" type="xs:string"/> </CmpG> </IterStep> <IterStep axis="child" test="marc:subfield"> <CmpG op="="> <CachedPath> <IterStep axis="attribute" test="code"/> </CachedPath> <Str value="a" type="xs:string"/> </CmpG> </IterStep> <FnString name="string([item])"/> </MixedPath> </CElem> </GFLWOR> </QueryPlan>
Thanks in advance!
Tim
-- Tim A. Thompson Metadata Librarian (Spanish/Portuguese Specialty) Princeton University Library