[basex-talk] Performance of ft:search function

27 Apr 2022


Hello,
I have a largish (5.4G) file with a full-text index that I am using to
reconcile names in a local dataset. I've been experimenting with splitting
the file into many smaller index files to improve performance. I group the
entries by initial character and create a new index file for each distinct
initial character. Each smaller file then gets its own full-text index.
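
A minimal sketch of such a split, not the exact code used here. The database name `names`, the `entry`/`name` element names, and the `names-` prefix are all hypothetical, and the function names follow BaseX 9 (`db:open` is named `db:get` from BaseX 10 onward):

```xquery
(: Hypothetical source database 'names' with <entry> elements
   that each carry at least one <name> child. :)
for $entry in db:open('names')//entry
let $name := string($entry/name[1])
where $name != ''
(: Group by the code point of the lower-cased first character, so the
   generated database names consist of plain letters and digits only. :)
group by $cp := string-to-codepoints(substring(lower-case($name), 1, 1))
return db:create(
  'names-' || $cp,
  <index>{
    for $e in $entry
    return <entry>
      <id>{ db:node-id($e) }</id>
      <name>{ string($e/name[1]) }</name>
    </entry>
  }</index>,
  'names-' || $cp || '.xml',
  (: build a full-text index for each of the smaller databases :)
  map { 'ftindex': true() }
)
```

In this layout each stored `id` is the node ID of the original `entry`, so it can be passed to `db:open-id` against the large database without further navigation.
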
I've been following the approach outlined in the documentation for custom
index structures
https://docs.basex.org/wiki/Indexes#Custom_Index_Structures. Using
prof:track, I've noticed the following performance for different uses of
ft:search.
(Here, $db refers to the 5.4G file, and $index refers to a smaller 159MB
subindex. Times are averaged across 10 runs of 1000 iterations for each
expression.)
1. Direct lookup against large index
Time: 23ms
Expression: ft:search($db, $text)/../..
2. Direct lookup against subindex
Time: 3.3ms
Expression: ft:search($index, $text)/../..
3. Lookup against subindex file with reference to large index
Time: 2.9ms
Expression:
let $s :=
  ft:search($index, $text)/../..
return db:open-id($db, $s/id)/../..
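
For anyone wanting to reproduce the comparison, the three measurements could be collected roughly as follows. This is a sketch under assumptions: the database names and search terms are hypothetical, and it loops over several different terms rather than repeating a single `$text`, so that every iteration depends on the loop variable and is less likely to be collapsed into one evaluation by the optimizer. `prof:track` returns a map whose `time` entry is in milliseconds.

```xquery
(: Hypothetical database names and search terms; adjust to the local setup. :)
let $db    := 'names'
let $index := 'names-115'
let $terms := ('smith', 'jones', 'garcia')
return map {
  'large index':
    prof:track(
      for $text in $terms
      return ft:search($db, $text)/../..
    )?time,
  'subindex':
    prof:track(
      for $text in $terms
      return ft:search($index, $text)/../..
    )?time,
  'subindex, then large database':
    prof:track(
      for $text in $terms
      let $s := ft:search($index, $text)/../..
      (: same navigation as expression 3, with one db:open-id call per stored id :)
      for $id in $s/id
      return db:open-id($db, $id)/../..
    )?time
}
```

Each reported time covers the whole loop, so dividing by the number of terms gives a per-lookup figure comparable to the ones listed above.
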
My question is: why would the third expression be slightly faster (or at
least not slower) than the second one, if it involves additional
computation?
Thanks in advance,
Tim
-- 
Tim A. Thompson (he, him)
Librarian for Applied Metadata Research
Yale University Library
