Dear Christian,
Thank you for your quick response.
Could you tell me why the following two queries have a difference in behaviour so that the first one apparently uses the text index [1] and the second one [2] does not?
Your second suggestion with db:text is nice but I cannot figure out how to efficiently extract other informations like something from the root element without crawling the whole database and performing a full table scan so to say.
Regards, Björn
[1] declare namespace lmerFile = "http://www.ddb.de/LMERfile"; count( (# db:enforceindex #) { let $dbs := ("00") for $db in $dbs for $doc in db:open($db) where $doc//lmerFile:format/text() = 'urn:diasid:fty:kopal:0000000000000000000000' return $doc })
Compiling: - merge steps: descendant::element(lmerFile:format) - rewrite for to let: for $db_1 in $dbs_0 - inline $dbs_0 - inline $db_1 - pre-evaluate db:open(database[,path]) to document-node() sequence: db:open("00") -> (db:open-pre("00", 0), ...) - apply text index for "urn:123" - rewrite cached filter to cached path: ((db:open-pre("00", 0), ...))[(descendant::element(lmerFile:format)/text() = "urn:123... -> db:text("00", "urn:123")/parent::element(lmerFile:format)/ancest... - rewrite where clause(s) - simplify FLWOR expression: db:text("00", "urn:123")/parent::element(lmerFile:format)/ancest... Optimized Query: count((# Q{http://basex.org/modules/db%7Denforceindex #) { db:text("00", "urn:123")/parent::element(lmerFile:format)/ancestor::node()[self::document-node()] }) Query: declare namespace lmerFile = "http://www.ddb.de/LMERfile"; count( (# db:enforceindex #) { let $dbs := ("00") for $db in $dbs for $doc in db:open($db) where $doc//lmerFile:format/text() = 'urn:123' return $doc }) Result: - Hit(s): 1 Item - Updated: 0 Items - Printed: 4 b - Read Locking: (global) - Write Locking: (none) Timing: - Parsing: 0.26 ms - Compiling: 55.69 ms - Evaluating: 60.66 ms - Printing: 0.37 ms - Total Time: 116.98 ms Query Plan: <QueryPlan compiled="true" updating="false"> <FnCount name="count" type="xs:integer" size="1"> <Extension type="document-node()*"> <DBPragma value=""> <QNm type="xs:QName" size="1">db:enforceindex</QNm> </DBPragma> <CachedPath type="document-node()*"> <ValueAccess index="text" type="text()*"> <IndexStaticDb type="item()*" database="00"/> <Str type="xs:string" size="1">urn:123</Str> </ValueAccess> <IterStep axis="parent" test="element(lmerFile:format)" type="element()?"/> <CachedStep axis="ancestor" test="node()" type="node()*"> <CachedPath type="item()*"> <IterStep axis="self" test="document-node()" type="document-node()?"/> </CachedPath> </CachedStep> </CachedPath> </Extension> </FnCount> </QueryPlan>
[2]
Compiling: - pre-evaluate expression list to xs:string sequence: ("00", "01") - merge steps: descendant::element(lmerFile:format) - inline $dbs_0 - rewrite where clause(s) Optimized Query: count((# Q{http://basex.org/modules/db%7Denforceindex #) { for $db_1 in ("00", "01") return (db:open($db_1))[(descendant::element(lmerFile:format)/text() = "urn:123")] }) Query: declare namespace lmerFile = "http://www.ddb.de/LMERfile"; count( (# db:enforceindex #) { let $dbs := ("00", "01") for $db in $dbs for $doc in db:open($db) where $doc//lmerFile:format/text() = 'urn:123' return $doc }) Result: - Hit(s): 1 Item - Updated: 0 Items - Printed: 4 b - Read Locking: (global) - Write Locking: (none) Timing: - Parsing: 0.27 ms - Compiling: 0.54 ms - Evaluating: 1778.21 ms - Printing: 0.31 ms - Total Time: 1779.33 ms Query Plan: <QueryPlan compiled="true" updating="false"> <FnCount name="count" type="xs:integer" size="1"> <Extension type="document-node()*"> <DBPragma value=""> <QNm type="xs:QName" size="1">db:enforceindex</QNm> </DBPragma> <GFLWOR type="document-node()*"> <For type="xs:string" size="1" name="$db" id="1"> <StrSeq type="xs:string+" size="2"> <Str type="xs:string" size="1">00</Str> <Str type="xs:string" size="1">01</Str> </StrSeq> </For> <IterFilter type="document-node()*"> <DbOpen name="db:open" type="document-node()*"> <VarRef type="xs:string" size="1" name="$db" id="1"/> </DbOpen> <CmpG op="=" type="xs:boolean" size="1"> <CachedPath type="text()*"> <IterStep axis="descendant" test="element(lmerFile:format)" type="element()*"/> <IterStep axis="child" test="text()" type="text()*"/> </CachedPath> <Str type="xs:string" size="1">urn:123</Str> </CmpG> </IterFilter> </GFLWOR> </Extension> </FnCount> </QueryPlan>
-----Ursprüngliche Nachricht----- Von: Christian Grün christian.gruen@gmail.com Gesendet: Montag, 13. Januar 2020 13:07 An: Braunschweig, Björn bjoern.braunschweig@gwdg.de Cc: basex-talk@mailman.uni-konstanz.de Betreff: Re: [basex-talk] slow search performance across multiple databases
Dear Björn,
Our Article on Indexes may give you some hints [1]:
• Via the 'enforceindex' pragma, you can help the optimizer by indicating that all of your database have up-to-date index structures. • With db:text, you can directly access the database indexes and rewrite the remaining steps of your query in reverse order:
db:text($db, 'urn:123')/parent::lmerFile:format/...
Hope this helps Christian
[1] http://docs.basex.org/wiki/Indexes
On Mon, Jan 13, 2020 at 11:56 AM Braunschweig, Björn bjoern.braunschweig@gwdg.de wrote:
Hello everyone,
First of all thank you for building such a great product!
Unfortunately I encounter extremely slow queries using a basex instance. My xml files are distributed over 256 databases (00-ff) with roughly 250MB each. All share a structure similar to [3]. I executed OPTIMIZE ALL for all databases, see [4]. [5] shows the info index output. The linux server which is running the basex instance shows great amounts of io. I replicated the data to my local desktop and see the same amount of high io usage across all database files. For example: [data\81\tbl.basex, data\81\txt.basex, ..., etc]
Querying a single database is fast and the info output window shows the following [2], but a query across all dbs just stalls and does not return a result in a reasonable time.
Could somebody give me a hint how to speed things up?
regards, Björn