Hello,

I have a largish (5.4 GB) file with a full-text index that I am using to reconcile names in a local dataset. I've been experimenting with splitting the file into many smaller index files to improve performance: I group the entries by initial character and create one index file per distinct initial character, each with its own full-text index.
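
For illustration, a minimal sketch of this kind of split. It is not the exact code used: the database name 'names', the <entry>/<name> element names, and the 'names-' prefix are all hypothetical placeholders, and it assumes a BaseX version in which db:open and db:create are available.

```xquery
(: Sketch only: 'names', <entry>, <name> and the 'names-' prefix are
   hypothetical placeholders, not the real dataset. :)
for $entry in db:open('names')//entry
(: Group entries by the lower-cased first character of the name. :)
group by $initial := lower-case(substring(string($entry/name), 1, 1))
return db:create(
  (: Use the codepoint so the database name stays valid for any character. :)
  'names-' || string-join(string-to-codepoints($initial)),
  <index>{
    for $e in $entry
    return
      <entry>
        <name>{ string($e/name) }</name>
        <!-- node id of the original entry in the large database -->
        <id>{ db:node-id($e) }</id>
      </entry>
  }</index>,
  'index.xml',
  (: Build a full-text index for each small database. :)
  map { 'ftindex': true() }
)
```

With a layout like this, ft:search($index, $text)/../.. returns the <entry> element of the subindex, and its <id> child can be passed to db:open-id to get back to the original node.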

I've been following the approach outlined in the documentation for custom index structures. Measuring with prof:track, I've recorded the following timings for different uses of ft:search.

(Here, $db refers to the 5.4 GB file, and $index refers to a smaller 159 MB subindex. Times are averaged across 10 runs of 1000 iterations for each expression.)
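
A measurement of this shape can be sketched roughly as follows. This is an assumed harness, not necessarily the one behind the numbers in this message: the default values for $index and $text are hypothetical, and the result is the mean time per iteration in milliseconds as reported by prof:track.

```xquery
(: Sketch only: the default values below are hypothetical placeholders. :)
declare variable $index external := 'names-116';
declare variable $text  external := 'thompson';

let $runs := 10
let $iterations := 1000
let $times :=
  for $run in 1 to $runs
  (: prof:track returns a map; ?time is the elapsed time in milliseconds. :)
  return prof:track(
    for $i in 1 to $iterations
    return ft:search($index, $text)/../..
  )?time
(: Mean over the runs, divided by the number of iterations per run. :)
return avg($times) div $iterations
```

The other expressions can be timed the same way by swapping the body of the inner loop.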

1. Direct lookup against large index
Time: 23ms
Expression: ft:search($db, $text)/../..

2. Direct lookup against subindex
Time: 3.3ms
Expression: ft:search($index, $text)/../..

3. Lookup against subindex file with reference to large index
Time: 2.9ms
Expression:
let $s :=
  ft:search($index, $text)/../..
return db:open-id($db, $s/id)/../..

My question is: why would the third expression be slightly faster (or at least not slower) than the second one, if it involves additional computation?

Thanks in advance,
Tim


--
Tim A. Thompson (he, him)
Librarian for Applied Metadata Research
Yale University Library