Thanks for the detailed explanations and the more efficient function, Christian!
Yes, for our other indexes I have only one database representing all repositories. For the text, I was concerned that a single index would be too big, because some of our finding aids run to 2,000 pages when printed, so I made 47 text indexes for the 47 document databases. But this morning I tested creating a single index, and the size isn't a problem. It is slightly faster, too. There's a minor tradeoff: the single index is faster when querying across all databases, but slower when querying our largest database on its own, and searching one database at a time is what most of our users want to do. Either way, the differences are less than a second.
47 databases "text" || $id
Time to query "Washington apple farm" across all databases: ~2.67 seconds
Time to query "Washington apple farm" in the largest database: ~0.77 seconds
Time to query "Washington apple farm" in an average-sized database: ~0.25 seconds
Average documents per database: 919 (largest repository 7,616; smallest 1)
Average size of index database: 11,306,034
Average full-text index size: 6,367 kB
Average entries: 34,631
XQuery:
for $db_id in tokenize($d, '\|')
for $result score $basex_score in ft:search('text' || $db_id, $terms, map{'mode':'all','fuzzy':$f})/ancestor::ead
let $ark := string($result/@ark) [etc.]
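For anyone following along, here is a minimal sketch of how the pipe-separated $d list maps to the per-repository database names in the 47-database version. The helper local:text-dbs and the IDs 3, 17 and 42 are hypothetical; in the query above, each resulting name is what gets passed to ft:search inside the loop.

```xquery
(: Hypothetical helper: turns a pipe-separated list of repository IDs
   into the per-repository text database names ("text" || $id). :)
declare function local:text-dbs($d as xs:string) as xs:string* {
  for $id in tokenize($d, '\|')
  return 'text' || $id
};

(: Made-up IDs; returns "text3", "text17", "text42" :)
local:text-dbs('3|17|42')
```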
Single database "index-text"
Time to query "Washington apple farm" across all databases: ~1.75 seconds
Time to query "Washington apple farm" in the largest database: ~0.94 seconds
Time to query "Washington apple farm" in an average-sized database: ~0.10 seconds
Represents 44,091 documents
Size of index database: 595,687,360
Full-text index: 344 MB
Entries: 599,405
XQuery:
let $dbs := tokenize($d, '\|')
for $result score $basex_score in ft:search('index-text', $terms, map{'mode':'all','fuzzy':$f})/ancestor::ead
where $result/@db=$dbs
let $ark := string($result/@ark) [etc.]
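The where clause in the single-database version relies on XQuery's general comparison: $result/@db = $dbs is true when the db attribute equals any one of the IDs in $dbs. Here is a small sketch with made-up in-memory <ead> elements standing in for the ft:search hits; the helper local:filter-by-db and the ark values are hypothetical.

```xquery
(: Hypothetical helper: keeps only hits whose @db is in the pipe-separated list.
   "=" is a general comparison, so it matches if @db equals ANY item in $dbs. :)
declare function local:filter-by-db($hits as element(ead)*, $d as xs:string) as xs:string* {
  let $dbs := tokenize($d, '\|')
  for $result in $hits
  where $result/@db = $dbs
  return string($result/@ark)
};

(: Made-up hits from three repositories; returns "ark-a", "ark-c" :)
local:filter-by-db(
  (<ead db="12" ark="ark-a"/>, <ead db="7" ark="ark-b"/>, <ead db="31" ark="ark-c"/>),
  '12|31'
)
```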
I also compared navigating up from the index entries with /parent::tokens/parent::ead instead of /ancestor::ead and didn't see a difference. I'm sure it would make a big difference with our original documents, which have tons of nested nodes, but not so much with the condensed text index.
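As a sketch of why the two paths are interchangeable on the condensed index, here are two made-up documents: one where the matching text sits directly in <tokens> under <ead>, as in the condensed index, and one with deeper nesting (the inner element names are only illustrative). The helper local:shortcut-finds-ead is hypothetical. On the nested document the two-step parent path doesn't reach <ead> at all, so that shortcut depends on the flat tokens-under-ead structure.

```xquery
(: Hypothetical helper: true when parent::tokens/parent::ead reaches
   the same <ead> as the nearest ancestor::ead. :)
declare function local:shortcut-finds-ead($hit as text()) as xs:boolean {
  let $short := $hit/parent::tokens/parent::ead
  let $anc := $hit/ancestor::ead[1]   (: reverse axis, so [1] is the nearest ead :)
  return exists($short) and $short is $anc
};

(: Made-up documents: flat (condensed index) vs. nested (original finding aid) :)
let $flat := <ead ark="flat"><tokens>Washington apple farm</tokens></ead>
let $nested := <ead ark="nested"><archdesc><dsc><c01><p>Washington apple farm</p></c01></dsc></archdesc></ead>
return (
  local:shortcut-finds-ead($flat//text()),    (: true :)
  local:shortcut-finds-ead($nested//text())   (: false :)
)
```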
-Tamara