Hi,
I have several databases basically consisting of several entries like this one:
<entry>
  <lastChange>2013-04-30T08:20:00</lastChange>
  <user>marcel</user>
  <phraseme>Ad0190</phraseme>
  <annotated>no</annotated>
  <negation>?</negation>
  <genus>?</genus>
  <mediality>?</mediality>
  <meaning>?</meaning>
  <sentence>?</sentence>
  <question>?</question>
  <node>85750</node>
  <query>[text() contains text ('über' ftand 'den' ftand 'Hals' ftand 'kommen') distance at most 10 words]</query>
  <p>Gar gewiß hätte sie selbst der Teufel lebendig hingeführt, oder der Donner in die Asche gelegt, oder die Erde lebendig verschluckt, oder den wilden Thieren zum Raub worden, weil sie sich aber der Todten haben angenommen, so konnte sie kein zeitliches Unglück berühren. Nolite timere so fürchtet euch dann nicht, alle Liebhaber der armen Seelen im Fegfeuer, es kann euch so bald kein Unglück über den Hals kommen, die Todten helfen den Lebendigen.</p>
  <meta>?</meta>
  <semantics>?</semantics>
  <comment/>
  <merged>no</merged>
  <cite>?</cite>
  <sem>?</sem>
  <note/>
</entry>
For the web application I run with ExtJS, I extract some data, such as the values of <user> etc.
I also need the text stored in the <p> node, and I use the XQuery expression stored in the <query> node as the argument for ft:mark. This part basically looks like this:
for $i at $p in //entry
let $q := $i/query
let $annotated := string($i/annotated)
let $t := if ($i/p) then $i/p else $i/l
let $ft := concat("ft:mark($binding", $q, ")")
let $bm := map{'$binding' := $t}
return <div>{if ($annotated = "no") then xquery:eval($ft, $bm) else $t}</div>
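To make the string concatenation concrete: for the sample entry shown above, the value of $ft that is handed to xquery:eval should come out as follows (derived by hand from the <query> content, so treat it as an illustration rather than captured output):

```xquery
(: value of $ft for the sample entry: "ft:mark($binding" + content of <query> + ")" :)
ft:mark($binding[text() contains text ('über' ftand 'den' ftand 'Hals' ftand 'kommen') distance at most 10 words])
```

Here $binding is bound to the <p> (or <l>) node of the same entry via the map in $bm.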
I get the query and the text and then apply the query to the text, depending on the value of <annotated>. This works fine for most of the databases. However, for six or seven of them I get a time-out, regardless of the size of the database: it works with databases of several hundred entries as well as with ones of only a dozen, but it does not work for the one I attach here as XML source. (I created the DB via another XQuery with whitespace chopping off and UPDINDEX ON; all indexes have been created. It is valid XML as far as I can see.)
I tested it in the GUI, using BaseX 7.7beta from April 24 with VM=-Xmx512m. If I change the return expression to:
return <div>{if ($annotated = "no") then ($q, $t) else $t}</div>
I get a result for all entries in about 35 ms, with this query plan:
Compiling:
- simplifying descendant-or-self step(s)
- converting descendant::*:entry to child steps
- removing redundant $p as xs:integer cast.
- atomic evaluation of ($annotated = "no")
- inlining let $annotated := fn:string($i/annotated)
- atomic evaluation of (fn:string($i/annotated) = "no")
- removing variable $annotated
- removing variable $ft
- removing variable $bm
- inlining let $q := $i/query
- removing variable $q
- inlining let $t := if($i/p) then $i/p else $i/l
- removing variable $t

Optimized Query:
for $i at $p in document-node { "annotate-Ad0190.xml" }/*:root/*:entry return element div { (if((fn:string($i/annotated) = "no")) then ($i/query, if($i/p) then $i/p else $i/l) else if($i/p) then $i/p else $i/l) }

Result:
- Hit(s): 72 Items
- Updated: 0 Items
- Printed: 759 KB
- Locking: local [annotate-Ad0190]

Timing:
- Parsing: 0.0 ms
- Compiling: 1.14 ms
- Evaluating: 34.44 ms
- Printing: 7.8 ms
- Total Time: 43.4 ms

Query plan:
<QueryPlan>
  <GFLWOR>
    <For>
      <Var name="$i" id="1"/>
      <at>
        <Var name="$p" id="0"/>
      </at>
      <IterPath>
        <DBNode name="annotate-Ad0190" pre="0"/>
        <IterStep axis="child" test="*:root"/>
        <IterStep axis="child" test="*:entry"/>
      </IterPath>
    </For>
    <CElem>
      <QNm value="div" type="xs:QName"/>
      <If>
        <CmpG op="=">
          <FNAcc name="string([item])">
            <CachedPath>
              <VarRef>
                <Var name="$i" id="1"/>
              </VarRef>
              <IterStep axis="child" test="annotated"/>
            </CachedPath>
          </FNAcc>
          <Str value="no" type="xs:string"/>
        </CmpG>
        <List>
          <CachedPath>
            <VarRef>
              <Var name="$i" id="1"/>
            </VarRef>
            <IterStep axis="child" test="query"/>
          </CachedPath>
          <If>
            <CachedPath>
              <VarRef>
                <Var name="$i" id="1"/>
              </VarRef>
              <IterStep axis="child" test="p"/>
            </CachedPath>
            <CachedPath>
              <VarRef>
                <Var name="$i" id="1"/>
              </VarRef>
              <IterStep axis="child" test="p"/>
            </CachedPath>
            <CachedPath>
              <VarRef>
                <Var name="$i" id="1"/>
              </VarRef>
              <IterStep axis="child" test="l"/>
            </CachedPath>
          </If>
        </List>
        <If>
          <CachedPath>
            <VarRef>
              <Var name="$i" id="1"/>
            </VarRef>
            <IterStep axis="child" test="p"/>
          </CachedPath>
          <CachedPath>
            <VarRef>
              <Var name="$i" id="1"/>
            </VarRef>
            <IterStep axis="child" test="p"/>
          </CachedPath>
          <CachedPath>
            <VarRef>
              <Var name="$i" id="1"/>
            </VarRef>
            <IterStep axis="child" test="l"/>
          </CachedPath>
        </If>
      </If>
    </CElem>
  </GFLWOR>
</QueryPlan>
For the original query, i.e., applying ft:mark, I only get an "Out of main memory" error after roughly 60000 ms.
I don't see anything suspicious going on here. Do you have any idea what might be the reason for the time-out? All the other information I extract is returned in less than 100 ms, so ft:mark seems to be the issue. Is there any way I can find out which of the entries is causing the time-out?
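The only approach I can think of myself is to run the same query on a small window of entries and to move or shrink that window until the problematic entry is isolated. A rough, untested sketch follows. The variables $from and $to and the pos and node attributes are names I made up for this test; everything else is taken from the query above.

```xquery
(: Diagnostic sketch: evaluate only entries $from..$to and label each result.
   $from, $to and the pos/node attributes are hypothetical helper names. :)
declare variable $from := 1;
declare variable $to := 10;

for $i at $p in //entry
where $p = ($from to $to)               (: keep only the entries in the window :)
let $q := $i/query
let $t := if ($i/p) then $i/p else $i/l
let $ft := concat("ft:mark($binding", $q, ")")
let $bm := map{'$binding' := $t}
return <div pos="{$p}" node="{$i/node}">{
  if ($i/annotated = "no") then xquery:eval($ft, $bm) else $t
}</div>
```

If a window of ten entries runs through and the next one does not, halving the failing window should narrow it down to a single entry, but this is quite tedious by hand, so a better way would be welcome.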
Thanks in advance and best regards
Cerstin