I was mistaken; I hadn't enabled indexes on this database. However, after creating attribute and text indexes, the query actually seems to take longer to execute (483622.95 ms on last run), although the query plan itself doesn't seem to have changed:

Compiling:
- pre-evaluating collection("latin_hold_20150730")
- pre-evaluating collection("latin_hold_20150730")
Query:
declare namespace marc="http://www.loc.gov/MARC21/slim"; for $m in collection( "latin_hold_20150730" )/marc:collection/marc:record, $r in collection( "latin_hold_20150730" )/root/row[BIB_ID = $m/marc:controlfield[@tag = '001']] group by $key := $r/ITEM_ID return <test n="{ $key }">{ $m/marc:datafield[@tag = '245']/marc:subfield[@code = 'a']/string() }</test>
Optimized Query:
for $m_0 in (db:open-pre("latin_hold_20150730",0), ...)/marc:collection/marc:record for $r_1 in (db:open-pre("latin_hold_20150730",0), ...)/root/row[(BIB_ID = $m_0/marc:controlfield[(@tag = "001")])] let (: post-group :) $m_4 := $m_0 group by $key_2 := $r_1/ITEM_ID return element test { (attribute n { ($key_2) }, $m_4/marc:datafield[(@tag = "245")]/marc:subfield[(@code = "a")]/string()) }
Result:
- Hit(s): 7587 Items
- Updated: 0 Items
- Printed: 505 KB
- Read Locking: local [latin_hold_20150730]
- Write Locking: none
Timing:
- Parsing: 1.04 ms
- Compiling: 0.86 ms
- Evaluating: 483604.85 ms
- Printing: 16.2 ms
- Total Time: 483622.95 ms
Query plan:
<QueryPlan compiled="true">
  <GFLWOR>
    <For>
      <Var name="$m" id="0"/>
      <IterPath>
        <DBNodeSeq size="2">
          <DBNode name="latin_hold_20150730" pre="0"/>
          <DBNode name="latin_hold_20150730" pre="365692"/>
        </DBNodeSeq>
        <IterStep axis="child" test="marc:collection"/>
        <IterStep axis="child" test="marc:record"/>
      </IterPath>
    </For>
    <For>
      <Var name="$r" id="1"/>
      <IterPath>
        <DBNodeSeq size="2">
          <DBNode name="latin_hold_20150730" pre="0"/>
          <DBNode name="latin_hold_20150730" pre="365692"/>
        </DBNodeSeq>
        <IterStep axis="child" test="root"/>
        <IterStep axis="child" test="row">
          <CmpG op="=">
            <CachedPath>
              <IterStep axis="child" test="BIB_ID"/>
            </CachedPath>
            <IterPath>
              <VarRef>
                <Var name="$m" id="0"/>
              </VarRef>
              <IterStep axis="child" test="marc:controlfield">
                <CmpG op="=">
                  <CachedPath>
                    <IterStep axis="attribute" test="tag"/>
                  </CachedPath>
                  <Str value="001" type="xs:string"/>
                </CmpG>
              </IterStep>
            </IterPath>
          </CmpG>
        </IterStep>
      </IterPath>
    </For>
    <GroupBy>
      <Spec>
        <Var name="$key" id="2"/>
        <IterPath>
          <VarRef>
            <Var name="$r" id="1"/>
          </VarRef>
          <IterStep axis="child" test="ITEM_ID"/>
        </IterPath>
      </Spec>
    </GroupBy>
    <CElem>
      <QNm value="test" type="xs:QName"/>
      <CAttr>
        <QNm value="n" type="xs:QName"/>
        <VarRef>
          <Var name="$key" id="2"/>
        </VarRef>
      </CAttr>
      <MixedPath>
        <VarRef>
          <Var name="$m" id="4"/>
        </VarRef>
        <IterStep axis="child" test="marc:datafield">
          <CmpG op="=">
            <CachedPath>
              <IterStep axis="attribute" test="tag"/>
            </CachedPath>
            <Str value="245" type="xs:string"/>
          </CmpG>
        </IterStep>
        <IterStep axis="child" test="marc:subfield">
          <CmpG op="=">
            <CachedPath>
              <IterStep axis="attribute" test="code"/>
            </CachedPath>
            <Str value="a" type="xs:string"/>
          </CmpG>
        </IterStep>
        <FnString name="string([item])"/>
      </MixedPath>
    </CElem>
  </GFLWOR>
</QueryPlan>


--
Tim A. Thompson
Metadata Librarian (Spanish/Portuguese Specialty)
Princeton University Library


On Tue, Aug 4, 2015 at 3:23 AM, Christian Grün <christian.gruen@gmail.com> wrote:
Dear Tim,

The query plan indicates that no index is applied. Your query may be
evaluated faster when rewriting "BIB_ID" to "BIB_ID/text()". I will
see if this can automatically be done by the query compiler.

Best,
Christian


On Tue, Aug 4, 2015 at 4:42 AM, Tim Thompson <timathom@gmail.com> wrote:
> Hello,
>
> I have a database containing two resources/documents: they both represent
> the same set of library catalog records (7728 "records" in each), but they
> each contain different data that I want to join.
>
> The first resource looks like this:
>
>         <marc:collection
>         xmlns:marc="http://www.loc.gov/MARC21/slim">
>         <marc:record>
>             <marc:leader>01225cam a2200373Mi 4500</marc:leader>
>             <marc:controlfield tag="001">5323084</marc:controlfield>
>             <marc:datafield
>                 ind1="1"
>                 ind2="4"
>                 tag="245">
>                 <marc:subfield code="a">Els teleclubs a les illes Balears
> :</marc:subfield>
>             </marc:datafield>
>         </marc:record>
>         <marc:record>
>             <marc:leader>01225cam a2200373Mi 4500</marc:leader>
>             <marc:controlfield tag="001">5323084</marc:controlfield>
>             <marc:datafield
>                 ind1="1"
>                 ind2="4"
>                 tag="245">
>                 <marc:subfield code="a">Els teleclubs a les illes Balears
> :</marc:subfield>
>             </marc:datafield>
>         </marc:record>
>         <marc:record>
>             <marc:leader>00818cam a2200241Mi 4500</marc:leader>
>             <marc:controlfield tag="001">6310976</marc:controlfield>
>             <marc:datafield
>                 ind1="0"
>                 ind2="0"
>                 tag="245">
>                 <marc:subfield code="a">Diccionari manual de sinònims i
> antònims de la llengua catalana /</marc:subfield>
>             </marc:datafield>
>         </marc:record>
>     </marc:collection>
>
> The second one looks like this:
>
>         <root>
>         <row>
>             <LANGUAGE>cat</LANGUAGE>
>             <ITEM_ID>5912416</ITEM_ID>
>             <BIB_ID>5323084</BIB_ID>
>             <VENDOR_CODE>MXBKSMX</VENDOR_CODE>
>         </row>
>         <row>
>             <LANGUAGE>cat</LANGUAGE>
>             <ITEM_ID>5912416</ITEM_ID>
>             <BIB_ID>5323084</BIB_ID>
>             <VENDOR_CODE>PUVILL</VENDOR_CODE>
>         </row>
>         <row>
>             <LANGUAGE>cat</LANGUAGE>
>             <ITEM_ID>5935043</ITEM_ID>
>             <BIB_ID>6310976</BIB_ID>
>             <VENDOR_CODE>PUVILL</VENDOR_CODE>
>         </row>
>     </root>
>
> I have a simple query that joins the two using the value of the
> marc:controlfield[@tag = '001'] from resource 1 and the BIB_ID from resource
> 2.
>
> The query:
>
> declare namespace marc="http://www.loc.gov/MARC21/slim";
>
> for $m in collection(
>   "latin_hold_20150730"
> )/marc:collection/marc:record,
> $r in collection(
>   "latin_hold_20150730"
> )/root/row[BIB_ID = $m/marc:controlfield[@tag = '001']]
> group by $key := $r/ITEM_ID
>
> return
> <test n="{
>   $key
> }">{
>     $m/marc:datafield[@tag = '245']/marc:subfield[@code = 'a']/string()
> }</test>
>
> When I run this using Saxon (substituting fn:doc for fn:collection), it only
> takes a second to execute. In BaseX 8.2.3 (using the GUI), however, it takes
> a very long time to execute (around 6 minutes!). The BaseX database has both
> attribute and text indexes enabled.
>
> Any idea what is causing it to take so long?
>
> Here is the BaseX Query Info:
>
> Total Time: 383756.3 ms
>
> Compiling:
> - pre-evaluating collection("latin_hold_20150730")
> - pre-evaluating collection("latin_hold_20150730")
> Query:
> declare namespace marc="http://www.loc.gov/MARC21/slim"; for $m in
> collection( "latin_hold_20150730" )/marc:collection/marc:record, $r in
> collection( "latin_hold_20150730" )/root/row[BIB_ID =
> $m/marc:controlfield[@tag = '001']] group by $key := $r/ITEM_ID return <test
> n="{ $key }">{ $m/marc:datafield[@tag = '245']/marc:subfield[@code =
> 'a']/string() }</test>
> Optimized Query:
> for $m_0 in (db:open-pre("latin_hold_20150730",0),
> ...)/marc:collection/marc:record for $r_1 in
> (db:open-pre("latin_hold_20150730",0), ...)/root/row[(BIB_ID =
> $m_0/marc:controlfield[(@tag = "001")])] let (: post-group :) $m_4 := $m_0
> group by $key_2 := $r_1/ITEM_ID return element test { (attribute n {
> ($key_2) }, $m_4/marc:datafield[(@tag = "245")]/marc:subfield[(@code =
> "a")]/string()) }
> Result:
> - Hit(s): 7587 Items
> - Updated: 0 Items
> - Printed: 505 KB
> - Read Locking: local [latin_hold_20150730]
> - Write Locking: none
> Timing:
> - Parsing: 0.43 ms
> - Compiling: 1.96 ms
> - Evaluating: 383737.5 ms
> - Printing: 16.41 ms
> - Total Time: 383756.3 ms
> Query plan:
> <QueryPlan compiled="true">
>   <GFLWOR>
>     <For>
>       <Var name="$m" id="0"/>
>       <IterPath>
>         <DBNodeSeq size="2">
>           <DBNode name="latin_hold_20150730" pre="0"/>
>           <DBNode name="latin_hold_20150730" pre="365692"/>
>         </DBNodeSeq>
>         <IterStep axis="child" test="marc:collection"/>
>         <IterStep axis="child" test="marc:record"/>
>       </IterPath>
>     </For>
>     <For>
>       <Var name="$r" id="1"/>
>       <IterPath>
>         <DBNodeSeq size="2">
>           <DBNode name="latin_hold_20150730" pre="0"/>
>           <DBNode name="latin_hold_20150730" pre="365692"/>
>         </DBNodeSeq>
>         <IterStep axis="child" test="root"/>
>         <IterStep axis="child" test="row">
>           <CmpG op="=">
>             <CachedPath>
>               <IterStep axis="child" test="BIB_ID"/>
>             </CachedPath>
>             <IterPath>
>               <VarRef>
>                 <Var name="$m" id="0"/>
>               </VarRef>
>               <IterStep axis="child" test="marc:controlfield">
>                 <CmpG op="=">
>                   <CachedPath>
>                     <IterStep axis="attribute" test="tag"/>
>                   </CachedPath>
>                   <Str value="001" type="xs:string"/>
>                 </CmpG>
>               </IterStep>
>             </IterPath>
>           </CmpG>
>         </IterStep>
>       </IterPath>
>     </For>
>     <GroupBy>
>       <Spec>
>         <Var name="$key" id="2"/>
>         <IterPath>
>           <VarRef>
>             <Var name="$r" id="1"/>
>           </VarRef>
>           <IterStep axis="child" test="ITEM_ID"/>
>         </IterPath>
>       </Spec>
>     </GroupBy>
>     <CElem>
>       <QNm value="test" type="xs:QName"/>
>       <CAttr>
>         <QNm value="n" type="xs:QName"/>
>         <VarRef>
>           <Var name="$key" id="2"/>
>         </VarRef>
>       </CAttr>
>       <MixedPath>
>         <VarRef>
>           <Var name="$m" id="4"/>
>         </VarRef>
>         <IterStep axis="child" test="marc:datafield">
>           <CmpG op="=">
>             <CachedPath>
>               <IterStep axis="attribute" test="tag"/>
>             </CachedPath>
>             <Str value="245" type="xs:string"/>
>           </CmpG>
>         </IterStep>
>         <IterStep axis="child" test="marc:subfield">
>           <CmpG op="=">
>             <CachedPath>
>               <IterStep axis="attribute" test="code"/>
>             </CachedPath>
>             <Str value="a" type="xs:string"/>
>           </CmpG>
>         </IterStep>
>         <FnString name="string([item])"/>
>       </MixedPath>
>     </CElem>
>   </GFLWOR>
> </QueryPlan>
>
>
> Thanks in advance!
>
> Tim
>
>
> --
> Tim A. Thompson
> Metadata Librarian (Spanish/Portuguese Specialty)
> Princeton University Library
>