Hi Tamara,
Thanks a lot for sharing your interesting experiences with BaseX.
You mentioned that you are working with various custom indexes. Have you also considered adding an auxiliary index element to your main databases?
  for $ead in db:open($db)//ead
  return insert node element index { ft:tokenize($ead) } into $ead,
  db:optimize($db)
You could then simplify your query to something like the following:
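  for $db_id in tokenize($d, '\|')
  for $text in ft:search($db_id, $terms, map { 'mode': 'all words', 'fuzzy': $f })
  (: climb from the matched text node to its ead element and return a copy without the auxiliary index :)
  let $ead := $text/parent::index/parent::ead update { delete node index }
  return <arg>{ $ead }</arg>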
In addition:
• The size of the full-text index can be reduced by setting FTINCLUDE to this index element.
• If you are not interested in word order, you could remove duplicates via distinct-values(ft:tokenize($ead)).
• As an alternative, the index strings could be stored in a custom index database, or at least in a distinct path; this way, there would be no need to remove the 'index' element before returning the result.
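To make the first two points more concrete, here is a minimal sketch of how they might be combined. It assumes that $db is bound to the name of your database, that the documents are rooted in ead elements, and that the auxiliary element is called index. I have not run it against your data, so please treat it as a starting point rather than a finished solution.

```xquery
(: Assumption: $db is supplied externally and holds the database name. :)
declare variable $db as xs:string external;

(
  (: Store each distinct token only once; word order is lost. :)
  for $ead in db:open($db)//ead
  return insert node element index {
    distinct-values(ft:tokenize($ead))
  } into $ead,

  (: Rebuild all index structures, limiting the full-text index
     to the text inside 'index' elements. :)
  db:optimize($db, true(), map { 'ftindex': true(), 'ftinclude': 'index' })
)
```

With FTINCLUDE restricted this way, full-text queries on other elements of the database can no longer use the index, so this only makes sense if all of your searches go through the index element.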
Some time ago, we proposed to a user to modify FTINCLUDE and index elements instead of text nodes [1]. There was no further discussion on that approach, but I think it would be helpful in many use cases, including yours. Do you have an opinion about the suggestion we made?
Best, Christian
[1] https://www.mail-archive.com/basex-talk@mailman.uni-konstanz.de/msg12081.htm...