Hi Matthias,
since I "definitely should" build a BaseX database from millions of TEI-XML files, I did so!
Glad to hear!
I modified the XQuery: ... gives results, but lasts orders of magnitude longer than for just one database:
If a query is run on a single database, this database will be opened at compile-time, and available indexes will be checked. If the full-text index exists, your query will be rewritten to take advantage of the index structure.
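If multiple databases are accessed in an iteration, the database names are not known at compile time, so the optimizer cannot check in advance which indexes exist. In that case, you can give the query optimizer a hint that all databases will have up-to-date index structures. This can be done with the “enforceindex” pragma [1]: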
declare variable $b := 'Konstanz';
for $c in ('Korpus01', 'Korpus02')
for $t in (# db:enforceindex #) {
  db:open($c)//*[./text() contains text { $b }]
}
return <p>{
  ft:extract($t[./text() contains text { $b }]/text(), 'b', 155)
}</p>
If you use the BaseX GUI, you can open the Info View and check the output. If it outputs “apply full-text index”, you’ll know that the index is utilized. In the Info View, you’ll also see the optimized query string, which gives you some hints about which other optimizations were applied to your input query.

If full-text queries get more complex, it’s sometimes more convenient to use ft:search [2] directly, as this function allows you to specify variable arguments, e.g. for wildcard or fuzzy searches.
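A minimal sketch of the ft:search variant, assuming both databases were created with a full-text index (ft:search raises an error otherwise). The variable names are only illustrative, and the 'fuzzy' option is just one of the options described in [2]:

```xquery
declare variable $term := 'Konstanz';

(: iterate over the database names; ft:search takes the name as a string :)
for $db in ('Korpus01', 'Korpus02')
(: the third argument is a map of search options, here: fuzzy matching :)
for $text in ft:search($db, $term, map { 'fuzzy': true() })
(: ft:search returns the matching text nodes :)
return <p>{ string($text) }</p>
```

Since ft:search always goes through the full-text index, no pragma is needed here. For a wildcard search, the options map would be map { 'wildcards': true() }, with a search term such as 'Konst.*'.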
Hope this helps,
Christian
[1] https://docs.basex.org/wiki/Indexes#Enforce_Rewritings
[2] https://docs.basex.org/wiki/Full-Text_Module#ft:search