Re: [basex-talk] Results of some experiments for improving full-text search speeds

28 Feb 2022


Hi Tamara,

Thanks a lot for sharing your interesting experiences with BaseX.

You mentioned that you are working with various custom indexes. Have
you also considered adding an auxiliary index element to your main
databases?

  for $ead in db:open($db)//ead
  return insert node element index { ft:tokenize($ead) } into $ead,
  db:optimize($db)

You could then simplify your query to something like this:

  for $db_id in tokenize($d, '|')
  for $text in ft:search($db_id, $terms, map{'mode':'all words','fuzzy':$f})
  (: ft:search returns text nodes; move up to the enclosing ead element :)
  let $ead := $text/ancestor::ead update { delete node index }
  return <arg>{ $ead }</arg>

In addition:
• The size of the full-text index can be reduced further by setting
FTINCLUDE to this index element.
• If you are not interested in word order, you could remove duplicates
via distinct-values(ft:tokenize($ead)).
• As an alternative, the index strings could be stored in a custom
index database, or at least in a distinct path; this way, there would
be no need to remove the 'index' element before returning the result.
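
The first two points could be combined roughly as follows. This is only
a sketch: it assumes BaseX 9.x (where db:open is still available), an
external variable $db holding the database name, and that db:optimize
accepts FTINDEX and FTINCLUDE as lower-case keys in its options map.

```xquery
(: Sketch only: assumes BaseX 9.x and a database name bound to $db. :)
declare variable $db as xs:string external;

(: Add one index child per ead element, storing each distinct token once.
   Word order is lost, so phrase queries will no longer work reliably. :)
for $ead in db:open($db)//ead
return insert node element index {
  distinct-values(ft:tokenize($ead))
} into $ead,

(: Rebuild all index structures after the inserts and restrict the
   full-text index to the text inside the new index elements. :)
db:optimize($db, true(), map { 'ftindex': true(), 'ftinclude': 'index' })
```

Running this more than once would add a second index element to each
ead, so existing index elements should be deleted beforehand.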
Some time ago, we proposed to a user that FTINCLUDE be modified to
index elements instead of text nodes [1]. There was no further
discussion on that approach, but I think it would be helpful in many
use cases, including yours. Do you have an opinion about the suggestion
we made?
Best,
Christian
[1] https://www.mail-archive.com/basex-talk@mailman.uni-konstanz.de/msg12081.htm...
