Hello,
I have a largish (5.4 GB) file with a full-text index that I am using to reconcile names in a local dataset. To improve performance, I've been experimenting with splitting the file into many smaller index files: I group the entries by initial character and create a new index file for each distinct initial character. Each smaller file then gets its own full-text index.
I've been following the approach outlined in the documentation for custom index structures. Using prof:track, I've measured the following timings for different uses of ft:search.
(Here, $db refers to the 5.4 GB file, and $index refers to a smaller 159 MB subindex. Times are averaged across 10 runs of 1000 iterations for each expression.)
1. Direct lookup against large index
Time: 23ms
Expression: ft:search($db, $text)/../..
2. Direct lookup against subindex
Time: 3.3ms
Expression: ft:search($index, $text)/../..
3. Lookup against subindex file with reference to large index
Time: 2.9ms
Expression:
  let $s := ft:search($index, $text)/../..
  return db:open-id($db, $s/id)/../..
My question is: why would the third expression be slightly faster (or at least no slower) than the second one, given that it performs an additional db:open-id lookup on top of the same ft:search call?
Thanks in advance,
Tim
--
Tim A. Thompson (he, him)
Librarian for Applied Metadata Research
Yale University Library