Hello,

I have a largish (5.4G) file with a full-text index that I am using to reconcile names in a local dataset. I've been experimenting with splitting the file into many smaller index files to improve performance. I group the entries by initial character and create a new index file for each distinct initial character. Each smaller file then gets its own full-text index.
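
For illustration, the splitting step can be sketched roughly as follows. This is a minimal sketch, not the exact script used: the database name 'names', the element names entry and label, and the 'names-' prefix are all placeholders, and each entry is assumed to have exactly one label. The subindex name uses the code point of the initial character so that punctuation or non-ASCII initials still give a valid database name.

```xquery
(: Sketch only: 'names', entry, label and the 'names-' prefix are
   placeholder names, not the actual dataset. :)
for $entry in db:open('names')//entry
(: Group entries by the lower-cased first character of their label. :)
group by $initial := lower-case(substring($entry/label, 1, 1))
(: Name each subindex after the code point of its initial character. :)
let $name := 'names-' || string-to-codepoints($initial)[1]
return db:create(
  $name,
  <entries>{ $entry }</entries>,
  'entries.xml',
  (: Build a full-text index for each smaller database. :)
  map { 'ftindex': true() }
)
```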

I've been following the approach outlined in the documentation for custom index structures. Using prof:track, I've noticed the following performance for different uses of ft:search.

(Here, $db refers to the 5.4G file, and $index refers to a smaller 159MB subindex. Times are averaged across 10 runs of 1000 iterations for each expression.)
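
A measurement of this shape could look roughly like the following. It is only a sketch under assumptions: the subindex name and the search term are placeholders, and the optimizer may treat the loop differently from what the code suggests. prof:track returns a map whose time entry is the elapsed time in milliseconds.

```xquery
(: Sketch only: the database name and search term are placeholders. :)
let $index := 'names-116'
let $text := 'thompson'
let $runs :=
  for $run in 1 to 10
  (: Time 1000 evaluations of the expression in one tracked block. :)
  return prof:track(
    for $i in 1 to 1000
    return ft:search($index, $text)/../..
  )?time
(: Average over the 10 runs, then divide by the iteration count. :)
return avg($runs) div 1000
```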

1. Direct lookup against large index

Time: 23ms

Expression: ft:search($db, $text)/../..

2. Direct lookup against subindex

Time: 3.3ms

Expression: ft:search($index, $text)/../..

3. Lookup against subindex file with reference to large index

Time: 2.9ms

Expression:

let $s := ft:search($index, $text)/../..
return db:open-id($db, $s/id)/../..

My question is: why would the third expression be slightly faster (or at least not slower) than the second one, if it involves additional computation?

Thanks in advance,

Tim

--
Tim A. Thompson (he, him)
Librarian for Applied Metadata Research
Yale University Library