Hi,
I have a problem with slow queries. I'm not sure if it is due to the construction of the query or if there is something else going on.
I'm querying two databases: "collect", which is opened at the beginning and contains several thousand entries like these two:
<entry time="2012-05-22T11:27:09">
  <node>40177618</node>
  <query>[text() contains text ('den' ftand 'Hals' ftand 'brachen') distance at most 10 words]</query>
  <person>britta</person>
  <phraseme>Ad0194</phraseme>
  <selected>yes</selected>
</entry>
<entry time="2012-05-22T11:27:09">
  <node>40672561</node>
  <query>[text() contains text ('den' ftand 'Hals' ftand 'brachen') distance at most 10 words]</query>
  <person>britta</person>
  <phraseme>Ad0194</phraseme>
  <selected>no</selected>
</entry>
The node IDs refer to a second database: "TG-DTA-GerManC-stemming-ws". The "query" elements contain the XQuery expression the node was found with.
For a particular phraseme, I display the original node using ft:mark to highlight the query terms. Additionally, I display some of the metadata for this node. The user will be able to "delete" some of the entries -- i.e., setting the value of "selected" from "yes" to "no". Therefore I render everything as a form element with a checkbox. The query is this one:
for $i at $p in //entry[phraseme[text() = "Ad0194"] and selected[text() = "yes"]]
let $query := $i/query
let $node := $i/node
let $prefix := fn:in-scope-prefixes($i)
let $title := db:open-id('TG-DTA-GerManC-stemming-ws', $node)
  /ancestor::*:TEI[1]//*:fileDesc[1]//*:titleStmt[1]//*:title[1]
let $author := db:open-id('TG-DTA-GerManC-stemming-ws', $node)
  /ancestor::*:TEI[1]//*:sourceDesc[1]//*:bibl[1]//*:author[1]
let $note := db:open-id('TG-DTA-GerManC-stemming-ws', $node)
  /ancestor::*:TEI[1]//*:notesStmt//*:note
let $expr := concat("ft:mark(db:open-id('TG-DTA-GerManC-stemming-ws', ", $node, ") ", $query, ")")
let $time := data($i/@time)
return
  <div>
    <hit count="{ $p}">
      <p>
        <input type="checkbox" name="NODE" value="{$node}"/>
        <b class="hitno">{$p} ({ if($prefix = "dta") then "DTA" else "TG"})</b>Knoten: {$i/node}
      </p>
      {xquery:eval($expr)}
    </hit>
    <bib>
      <p class="bibl">
        <b>{$time}</b><br/>
        <b>Bibliographie</b> { data($author)}: { data($title)} <br/>
        <b>Anmerkung</b>: { data ($note) }<br/>
        <b>Korpus</b>: { if($prefix = "dta") then "Deutsches Textarchiv" else "TextGrid Digitale Bibliothek"}
      </p>
    </bib>
    <p></p>
  </div>
However, for this particular phraseme ("Ad0194"), there will be 676 hits to be displayed (out of around 1000 stored) which takes around 15 seconds in the GUI. In the actual web application, the full site is shown after more than 20 seconds (there is some HTML to be displayed and another short xquery for displaying distinct queries for all entries in question), which is too slow, I think.
Is there anything I can do to speed up the query? Can I somehow rewrite it to make execution faster? I think opening the second DB four times costs a lot of time. Both DBs have an optimized index already.
Thanks in advance
Cerstin
for $i at $p in //entry[phraseme[text() = "Ad0194"] and selected[text() = "yes"]]
It’s often beneficial to avoid nested predicates. Does the following version give you better results?
//entry[phraseme/text() = "Ad0194" and selected/text() = "yes"]
Besides that, feel free to send us the query info (the output of the Info View), as it often indicates potential for additional optimizations.
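To check that the two predicate forms select the same entries, here is a minimal sketch on a made-up inline fragment (the three sample entries are assumptions for illustration, not data from the "collect" database). It only compares results; whether an index is applied depends on running the path against a real database and can be seen in the Info View.

```xquery
(: Hypothetical sample data, modelled on the entries shown in the question :)
let $collect :=
  <collect>
    <entry><phraseme>Ad0194</phraseme><selected>yes</selected></entry>
    <entry><phraseme>Ad0194</phraseme><selected>no</selected></entry>
    <entry><phraseme>Ad0200</phraseme><selected>yes</selected></entry>
  </collect>
(: Nested predicates, as in the original query :)
let $nested := $collect//entry[phraseme[text() = "Ad0194"] and selected[text() = "yes"]]
(: Flattened path comparisons, as suggested above :)
let $flat := $collect//entry[phraseme/text() = "Ad0194" and selected/text() = "yes"]
return (count($nested), count($flat), deep-equal($nested, $flat))
(: returns 1, 1, true :)
```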
Hi Christian,
On 15.11.2012 at 20:00, Christian Grün wrote:
for $i at $p in //entry[phraseme[text() = "Ad0194"] and selected[text() = "yes"]]
It’s often beneficial to avoid nested predicates. Does the following version give you better results?
//entry[phraseme/text() = "Ad0194" and selected/text() = "yes"]
It saves one or two seconds. I will also use this for other queries, thanks!
Besides that, feel free to send us the query info (the output of the Info View), as it often indicates potential for additional optimizations.
OK, here is the query info. Most of the time is spent on evaluation, and printing also takes some time, but parsing and compiling look pretty fast, I think.
Query: for $i at $p in //entry[phraseme[text() = "Ad0194"] and selected[text() = "yes"]] let $query := $i/query let $node := $i/node let $prefix := fn:in-scope-prefixes($i) let $title := db:open-id('TG-DTA-GerManC-stemming-ws', $node) /ancestor::*:TEI[1]//*:fileDesc[1]//*:titleStmt[1]//*:title[1] let $author := db:open-id('TG-DTA-GerManC-stemming-ws', $node) /ancestor::*:TEI[1]//*:sourceDesc[1]//*:bibl[1]//*:author[1] let $note := db:open-id('TG-DTA-GerManC-stemming-ws', $node) /ancestor::*:TEI[1]//*:notesStmt//*:note let $expr := concat("ft:mark(db:open-id('TG-DTA-GerManC-stemming-ws', ", $node, ") ", $query, ")") let $time := data($i/@time) return <div> <hit count="{ $p}"> <p><input type="checkbox" name="NODE" value="{$node}"/><b class="hitno">{$p} ({ if($prefix = "dta") then "DTA" else "TG"})</b>Knoten: {$i/node}</p> {xquery:eval($expr)} </hit> <bib> <p class="bibl"><b>{$time}</b><br/><b>Bibliographie</b> { data($author)}: { data($title)} <br/><b>Anmerkung</b>: { data ($note) }<br/> <b>Korpus</b>: { if($prefix = "dta") then "Deutsches Textarchiv" else "TextGrid Digitale Bibliothek"}</p> </bib> <p></p></div>
Compiling:
- rewriting And expression to predicate(s)
- rewriting fn:boolean(phraseme[text() = "Ad0194"])
- rewriting fn:boolean(selected[text() = "yes"])
- simplifying descendant-or-self step(s)
- simplifying descendant-or-self step(s)
Result: for $i at $p as xs:integer in document-node { "collect.xml" }/descendant::entry[phraseme[text() = "Ad0194"]][selected[text() = "yes"]] let $query := $i/query let $node := $i/node let $prefix := fn:in-scope-prefixes($i) let $title := db:open-id("TG-DTA-GerManC-stemming-ws", $node)/ancestor::*:TEI[1]/descendant-or-self::node()/*:fileDesc[1]/descendant-or-self::node()/*:titleStmt[1]/descendant-or-self::node()/*:title[1] let $author := db:open-id("TG-DTA-GerManC-stemming-ws", $node)/ancestor::*:TEI[1]/descendant-or-self::node()/*:sourceDesc[1]/descendant-or-self::node()/*:bibl[1]/descendant-or-self::node()/*:author[1] let $note := db:open-id("TG-DTA-GerManC-stemming-ws", $node)/ancestor::*:TEI[1]/descendant::*:notesStmt/descendant::*:note let $expr := fn:concat("ft:mark(db:open-id('TG-DTA-GerManC-stemming-ws', ", $node, ") ", $query, ")") let $time := fn:data($i/@time) return element div { element hit { attribute count { $p }, element p { element input { attribute type { "checkbox" }, attribute name { "NODE" }, attribute value { $node } }, element b { attribute class { "hitno" }, $p, " (", if($prefix = "dta") then "DTA" else "TG", ")" }, "Knoten: ", $i/node }, xquery:eval($expr) }, element bib { element p { attribute class { "bibl" }, element b { $time }, element br { () }, element b { "Bibliographie" }, fn:data($author), ": ", fn:data($title), element br { () }, element b { "Anmerkung" }, ": ", fn:data($note), element br { () }, element b { "Korpus" }, ": ", if($prefix = "dta") then "Deutsches Textarchiv" else "TextGrid Digitale Bibliothek" } }, element p { () } }
Timing:
- Parsing: 14.63 ms
- Compiling: 33.34 ms
- Evaluating: 12216.87 ms
- Printing: 449.52 ms
- Total Time: 12714.37 ms
Result:
- Hit(s): 676 Items
- Updated: 0 Items
- Printed: 2048 KB
Query plan: <QueryPlan> <FLWR> <For var="$i" pos="$p as xs:integer"> <IterPath> <DBNode name="collect-ws" pre="0"/> <IterStep axis="descendant" test="entry"> <AxisPath> <IterStep axis="child" test="phraseme"> <CmpG op="="> <AxisPath> <IterStep axis="child" test="text()"/> </AxisPath> <Str value="Ad0194" type="xs:string"/> </CmpG> </IterStep> </AxisPath> <AxisPath> <IterStep axis="child" test="selected"> <CmpG op="="> <AxisPath> <IterStep axis="child" test="text()"/> </AxisPath> <Str value="yes" type="xs:string"/> </CmpG> </IterStep> </AxisPath> </IterStep> </IterPath> </For> <Let var="$query"> <AxisPath> <VarRef> <Var name="$i" id="0"/> </VarRef> <IterStep axis="child" test="query"/> </AxisPath> </Let> <Let var="$node"> <AxisPath> <VarRef> <Var name="$i" id="0"/> </VarRef> <IterStep axis="child" test="node"/> </AxisPath> </Let> <Let var="$prefix"> <FNQName name="in-scope-prefixes(elem)"> <VarRef> <Var name="$i" id="0"/> </VarRef> </FNQName> </Let> <Let var="$title"> <AxisPath> <FNDb name="open-id(database,id)"> <Str value="TG-DTA-GerManC-stemming-ws" type="xs:string"/> <VarRef> <Var name="$node" id="3"/> </VarRef> </FNDb> <IterPosStep axis="ancestor" test="*:TEI"> <Pos min="1" max="1"/> </IterPosStep> <IterStep axis="descendant-or-self" test="node()"/> <IterPosStep axis="child" test="*:fileDesc"> <Pos min="1" max="1"/> </IterPosStep> <IterStep axis="descendant-or-self" test="node()"/> <IterPosStep axis="child" test="*:titleStmt"> <Pos min="1" max="1"/> </IterPosStep> <IterStep axis="descendant-or-self" test="node()"/> <IterPosStep axis="child" test="*:title"> <Pos min="1" max="1"/> </IterPosStep> </AxisPath> </Let> <Let var="$author"> <AxisPath> <FNDb name="open-id(database,id)"> <Str value="TG-DTA-GerManC-stemming-ws" type="xs:string"/> <VarRef> <Var name="$node" id="3"/> </VarRef> </FNDb> <IterPosStep axis="ancestor" test="*:TEI"> <Pos min="1" max="1"/> </IterPosStep> <IterStep axis="descendant-or-self" test="node()"/> <IterPosStep axis="child" 
test="*:sourceDesc"> <Pos min="1" max="1"/> </IterPosStep> <IterStep axis="descendant-or-self" test="node()"/> <IterPosStep axis="child" test="*:bibl"> <Pos min="1" max="1"/> </IterPosStep> <IterStep axis="descendant-or-self" test="node()"/> <IterPosStep axis="child" test="*:author"> <Pos min="1" max="1"/> </IterPosStep> </AxisPath> </Let> <Let var="$note"> <AxisPath> <FNDb name="open-id(database,id)"> <Str value="TG-DTA-GerManC-stemming-ws" type="xs:string"/> <VarRef> <Var name="$node" id="3"/> </VarRef> </FNDb> <IterPosStep axis="ancestor" test="*:TEI"> <Pos min="1" max="1"/> </IterPosStep> <IterStep axis="descendant" test="*:notesStmt"/> <IterStep axis="descendant" test="*:note"/> </AxisPath> </Let> <Let var="$expr"> <FNStr name="concat(atom,atom[,...])"> <Str value="ft:mark(db:open-id('TG-DTA-GerManC-stemming-ws', " type="xs:string"/> <VarRef> <Var name="$node" id="3"/> </VarRef> <Str value=") " type="xs:string"/> <VarRef> <Var name="$query" id="2"/> </VarRef> <Str value=")" type="xs:string"/> </FNStr> </Let> <Let var="$time"> <FNGen name="data([item])"> <AxisPath> <VarRef> <Var name="$i" id="0"/> </VarRef> <IterStep axis="attribute" test="time"/> </AxisPath> </FNGen> </Let> <Return> <CElem> <QNm value="div" type="xs:QName"/> <CElem> <QNm value="hit" type="xs:QName"/> <CAttr> <QNm value="count" type="xs:QName"/> <VarRef> <Var name="$p as xs:integer" id="1"/> </VarRef> </CAttr> <CElem> <QNm value="p" type="xs:QName"/> <CElem> <QNm value="input" type="xs:QName"/> <CAttr> <QNm value="type" type="xs:QName"/> <Str value="checkbox" type="xs:string"/> </CAttr> <CAttr> <QNm value="name" type="xs:QName"/> <Str value="NODE" type="xs:string"/> </CAttr> <CAttr> <QNm value="value" type="xs:QName"/> <VarRef> <Var name="$node" id="3"/> </VarRef> </CAttr> </CElem> <CElem> <QNm value="b" type="xs:QName"/> <CAttr> <QNm value="class" type="xs:QName"/> <Str value="hitno" type="xs:string"/> </CAttr> <VarRef> <Var name="$p as xs:integer" id="1"/> </VarRef> <Str value=" (" 
type="xs:string"/> <If> <CmpG op="="> <VarRef> <Var name="$prefix" id="4"/> </VarRef> <Str value="dta" type="xs:string"/> </CmpG> <Str value="DTA" type="xs:string"/> <Str value="TG" type="xs:string"/> </If> <Str value=")" type="xs:string"/> </CElem> <Str value="Knoten: " type="xs:string"/> <AxisPath> <VarRef> <Var name="$i" id="0"/> </VarRef> <IterStep axis="child" test="node"/> </AxisPath> </CElem> <FNXQuery name="eval(string[,bindings])"> <VarRef> <Var name="$expr" id="8"/> </VarRef> </FNXQuery> </CElem> <CElem> <QNm value="bib" type="xs:QName"/> <CElem> <QNm value="p" type="xs:QName"/> <CAttr> <QNm value="class" type="xs:QName"/> <Str value="bibl" type="xs:string"/> </CAttr> <CElem> <QNm value="b" type="xs:QName"/> <VarRef> <Var name="$time" id="9"/> </VarRef> </CElem> <CElem> <QNm value="br" type="xs:QName"/> </CElem> <CElem> <QNm value="b" type="xs:QName"/> <Str value="Bibliographie" type="xs:string"/> </CElem> <FNGen name="data([item])"> <VarRef> <Var name="$author" id="6"/> </VarRef> </FNGen> <Str value=": " type="xs:string"/> <FNGen name="data([item])"> <VarRef> <Var name="$title" id="5"/> </VarRef> </FNGen> <CElem> <QNm value="br" type="xs:QName"/> </CElem> <CElem> <QNm value="b" type="xs:QName"/> <Str value="Anmerkung" type="xs:string"/> </CElem> <Str value=": " type="xs:string"/> <FNGen name="data([item])"> <VarRef> <Var name="$note" id="7"/> </VarRef> </FNGen> <CElem> <QNm value="br" type="xs:QName"/> </CElem> <CElem> <QNm value="b" type="xs:QName"/> <Str value="Korpus" type="xs:string"/> </CElem> <Str value=": " type="xs:string"/> <If> <CmpG op="="> <VarRef> <Var name="$prefix" id="4"/> </VarRef> <Str value="dta" type="xs:string"/> </CmpG> <Str value="Deutsches Textarchiv" type="xs:string"/> <Str value="TextGrid Digitale Bibliothek" type="xs:string"/> </If> </CElem> </CElem> <CElem> <QNm value="p" type="xs:QName"/> </CElem> </CElem> </Return> </FLWR> </QueryPlan>
--
Dr. phil. Cerstin Mahlow
Universität Basel
Deutsches Seminar
Nadelberg 4
4051 Basel
Schweiz
Tel: +41 61 267 07 65
Fax: +41 61 267 34 40
Mail: cerstin.mahlow@unibas.ch
Web: http://www.oldphras.net
Hi Cerstin,
OK, here is the query info. Most of the time is spent on evaluation, and printing also takes some time, but parsing and compiling look pretty fast, I think.
It looks as if the query plan is still based on the nested predicates. Have you checked whether the simplified form leads to the use of index structures (provided that your index structures are up to date at this stage)?
One more thing I noticed: ".//X[1]" is often expensive. It is the same as "./descendant-or-self::node()/child::X[1]", which yields a large number of intermediate results. If you don't need the first "X" child element of every descendant-or-self node, but rather the first descendant "X" element, I would suggest rewriting the query to one of these two versions:
descendant::X[1]
..or
(.//X)[1]
This cannot be done by the optimizer itself, because ".//X[1]" and "./descendant::X[1]" are not equivalent.
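A minimal sketch of why the forms are not equivalent, using a made-up inline document (the element names "root", "a", "b" and "x" are assumptions for illustration only):

```xquery
(: Hypothetical document: two parents, each with its own first "x" child :)
let $doc :=
  <root>
    <a><x>1</x><x>2</x></a>
    <b><x>3</x></b>
  </root>
return (
  (: first "x" child of every descendant-or-self node: selects x=1 and x=3 :)
  count($doc//x[1]),
  (: first "x" on the descendant axis: selects x=1 only :)
  count($doc/descendant::x[1]),
  (: first item of the complete result sequence: selects x=1 only :)
  count(($doc//x)[1])
)
(: returns 2, 1, 1 :)
```

The rewritten forms are therefore only safe when a single first descendant is what is wanted, which is the case described above.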
Christian
Hi Christian,
On 18.11.2012 at 17:07, Christian Grün wrote:
It looks as if the query plan is still based on the nested predicates. Have you checked whether the simplified form leads to the use of index structures (provided that your index structures are up to date at this stage)?
I think it does.
One more thing I noticed: ".//X[1]" is often expensive. It is the same as "./descendant-or-self::node()/child::X[1]", which yields a large number of intermediate results. If you don't need the first "X" child element of every descendant-or-self node, but rather the first descendant "X" element, I would suggest rewriting the query to one of these two versions:
descendant::X[1]
..or
(.//X)[1]
This cannot be done by the optimizer itself, because ".//X[1]" and "./descendant::X[1]" are not equivalent.
I'm not quite sure if I rewrote it correctly.
The respective lines are:
let $title := db:open-id('TG-DTA-GerManC-stemming-ws', $node)
  /ancestor::*:TEI[1]//*:fileDesc[1]//*:titleStmt[1]//*:title[1]
let $author := db:open-id('TG-DTA-GerManC-stemming-ws', $node)
  /ancestor::*:TEI[1]//*:sourceDesc[1]//*:bibl[1]//*:author[1]
let $note := db:open-id('TG-DTA-GerManC-stemming-ws', $node)
  /ancestor::*:TEI[1]//*:notesStmt//*:note
So for the title I go up to the first "TEI" element (/ancestor::*:TEI[1]) and from there I travel down until I detect the first "fileDesc" element and in this the first "titleStmt" element and then I take the first "title" element.
When I rewrite this into:
let $title := (db:open-id('TG-DTA-GerManC-stemming-ws', $node) /ancestor::*:TEI[1]//*:fileDesc)[1]//*:titleStmt[1]//*:title[1]
The results still look OK and it is faster.
But trying this:
let $title := (db:open-id('TG-DTA-GerManC-stemming-ws', $node) /ancestor::*:TEI[1]//*:fileDesc[1]//*:titleStmt)[1]//*:title[1]
the respective information will not be retrieved.
I don't know exactly where to put the brackets.
However, I also tried this:
let $tei := (db:open-id('TG-DTA-GerManC-stemming-ws', $node)
  /ancestor::*:TEI)[1]
let $title := ($tei//*:fileDesc)[1]//*:titleStmt[1]//*:title[1]
let $author := ($tei//*:sourceDesc)[1]//*:bibl[1]//*:author[1]
let $note := $tei//*:notesStmt//*:note
I put the whole TEI node into a variable and then use this variable to retrieve the needed information. I'm not sure whether this is generally faster than opening the second DB three times. The first and the second solution seem to be equivalent in terms of overall performance.
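One detail worth checking with the variable version: "ancestor::*:TEI[1]" and "(.../ancestor::*:TEI)[1]" only agree when there is a single TEI ancestor. A predicate inside a step on a reverse axis counts from the context node outwards, while a predicate on a parenthesized path counts in document order. The following sketch uses a made-up document with nested TEI elements (the nesting and the "n" attribute are assumptions for illustration; the actual corpus may never nest them):

```xquery
(: Hypothetical document with two nested TEI elements :)
let $doc :=
  <TEI n="outer">
    <TEI n="inner">
      <text><p>hit</p></text>
    </TEI>
  </TEI>
let $hit := ($doc//p)[1]
return (
  (: position counted along the reverse axis: nearest TEI ancestor :)
  string($hit/ancestor::*:TEI[1]/@n),
  (: position counted in document order: outermost TEI ancestor :)
  string(($hit/ancestor::*:TEI)[1]/@n)
)
(: returns "inner", "outer" :)
```

If nested TEI elements can occur, keeping the predicate inside the step when binding the variable preserves the behaviour of the original query.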
The query info for the first one and the second one is attached below:
Query: for $i at $p in //entry[phraseme/text() = "Ad0194" and selected/text() = "yes"] let $query := $i/query let $node := $i/node let $prefix := fn:in-scope-prefixes($i) let $title := (db:open-id('TG-DTA-GerManC-stemming-ws', $node) /ancestor::*:TEI[1]//*:fileDesc[1]//*:titleStmt)[1]//*:title[1] let $author := (db:open-id('TG-DTA-GerManC-stemming-ws', $node) /ancestor::*:TEI[1]//*:sourceDesc)[1]//*:bibl[1]//*:author[1] let $note :=( db:open-id('TG-DTA-GerManC-stemming-ws', $node) /ancestor::*:TEI)[1]//*:notesStmt//*:note let $expr := concat("ft:mark(db:open-id('TG-DTA-GerManC-stemming-ws', ", $node, ") ", $query, ")") let $time := data($i/@time) return <div> <hit count="{ $p}"> <p><input type="checkbox" name="NODE" value="{$node}"/><b class="hitno">{$p} ({ if($prefix = "dta") then "DTA" else "TG"})</b>Knoten: {$i/node}</p> {xquery:eval($expr)} </hit> <bib> <p class="bibl"><b>{$time}</b><br/><b>Bibliographie</b> { data($author)}: { data($title)} <br/><b>Anmerkung</b>: { data ($note) }<br/> <b>Korpus</b>: { if($prefix = "dta") then "Deutsches Textarchiv" else "TextGrid Digitale Bibliothek"}</p> </bib> <p></p></div>
Compiling:
- rewriting And expression to predicate(s)
- rewriting fn:boolean(phraseme/text() = "Ad0194")
- rewriting fn:boolean(selected/text() = "yes")
- simplifying descendant-or-self step(s)
- applying text index
- simplifying descendant-or-self step(s)
- simplifying descendant-or-self step(s)
- simplifying descendant-or-self step(s)
Result: for $i at $p as xs:integer in db:text("collect-ws", "Ad0194")/parent::phraseme/parent::entry[selected/text() = "yes"] let $query := $i/query let $node := $i/node let $prefix := fn:in-scope-prefixes($i) let $title := (db:open-id("TG-DTA-GerManC-stemming-ws", $node)/ancestor::*:TEI[1]/descendant-or-self::node()/*:fileDesc[1]/descendant::*:titleStmt)[1]/descendant-or-self::node()/*:title[1] let $author := (db:open-id("TG-DTA-GerManC-stemming-ws", $node)/ancestor::*:TEI[1]/descendant::*:sourceDesc)[1]/descendant-or-self::node()/*:bibl[1]/descendant-or-self::node()/*:author[1] let $note := (db:open-id("TG-DTA-GerManC-stemming-ws", $node)/ancestor::*:TEI)[1]/descendant::*:notesStmt/descendant::*:note let $expr := fn:concat("ft:mark(db:open-id('TG-DTA-GerManC-stemming-ws', ", $node, ") ", $query, ")") let $time := fn:data($i/@time) return element div { element hit { attribute count { $p }, element p { element input { attribute type { "checkbox" }, attribute name { "NODE" }, attribute value { $node } }, element b { attribute class { "hitno" }, $p, " (", if($prefix = "dta") then "DTA" else "TG", ")" }, "Knoten: ", $i/node }, xquery:eval($expr) }, element bib { element p { attribute class { "bibl" }, element b { $time }, element br { () }, element b { "Bibliographie" }, fn:data($author), ": ", fn:data($title), element br { () }, element b { "Anmerkung" }, ": ", fn:data($note), element br { () }, element b { "Korpus" }, ": ", if($prefix = "dta") then "Deutsches Textarchiv" else "TextGrid Digitale Bibliothek" } }, element p { () } }
Timing:
- Parsing: 1.89 ms
- Compiling: 5.5 ms
- Evaluating: 5697.76 ms
- Printing: 38.35 ms
- Total Time: 5743.51 ms
Result:
- Hit(s): 676 Items
- Updated: 0 Items
- Printed: 2048 KB
Query plan: <QueryPlan> <FLWR> <For var="$i" pos="$p as xs:integer"> <AxisPath> <ValueAccess data="collect-ws" type="TEXT"> <Str value="Ad0194" type="xs:string"/> </ValueAccess> <IterStep axis="parent" test="phraseme"/> <IterStep axis="parent" test="entry"> <CmpG op="="> <AxisPath> <IterStep axis="child" test="selected"/> <IterStep axis="child" test="text()"/> </AxisPath> <Str value="yes" type="xs:string"/> </CmpG> </IterStep> </AxisPath> </For> <Let var="$query"> <AxisPath> <VarRef> <Var name="$i" id="0"/> </VarRef> <IterStep axis="child" test="query"/> </AxisPath> </Let> <Let var="$node"> <AxisPath> <VarRef> <Var name="$i" id="0"/> </VarRef> <IterStep axis="child" test="node"/> </AxisPath> </Let> <Let var="$prefix"> <FNQName name="in-scope-prefixes(elem)"> <VarRef> <Var name="$i" id="0"/> </VarRef> </FNQName> </Let> <Let var="$title"> <AxisPath> <IterPosFilter> <AxisPath> <FNDb name="open-id(database,id)"> <Str value="TG-DTA-GerManC-stemming-ws" type="xs:string"/> <VarRef> <Var name="$node" id="3"/> </VarRef> </FNDb> <IterPosStep axis="ancestor" test="*:TEI"> <Pos min="1" max="1"/> </IterPosStep> <IterStep axis="descendant-or-self" test="node()"/> <IterPosStep axis="child" test="*:fileDesc"> <Pos min="1" max="1"/> </IterPosStep> <IterStep axis="descendant" test="*:titleStmt"/> </AxisPath> <Pos min="1" max="1"/> </IterPosFilter> <IterStep axis="descendant-or-self" test="node()"/> <IterPosStep axis="child" test="*:title"> <Pos min="1" max="1"/> </IterPosStep> </AxisPath> </Let> <Let var="$author"> <AxisPath> <IterPosFilter> <AxisPath> <FNDb name="open-id(database,id)"> <Str value="TG-DTA-GerManC-stemming-ws" type="xs:string"/> <VarRef> <Var name="$node" id="3"/> </VarRef> </FNDb> <IterPosStep axis="ancestor" test="*:TEI"> <Pos min="1" max="1"/> </IterPosStep> <IterStep axis="descendant" test="*:sourceDesc"/> </AxisPath> <Pos min="1" max="1"/> </IterPosFilter> <IterStep axis="descendant-or-self" test="node()"/> <IterPosStep axis="child" test="*:bibl"> <Pos min="1" 
max="1"/> </IterPosStep> <IterStep axis="descendant-or-self" test="node()"/> <IterPosStep axis="child" test="*:author"> <Pos min="1" max="1"/> </IterPosStep> </AxisPath> </Let> <Let var="$note"> <AxisPath> <IterPosFilter> <AxisPath> <FNDb name="open-id(database,id)"> <Str value="TG-DTA-GerManC-stemming-ws" type="xs:string"/> <VarRef> <Var name="$node" id="3"/> </VarRef> </FNDb> <IterStep axis="ancestor" test="*:TEI"/> </AxisPath> <Pos min="1" max="1"/> </IterPosFilter> <IterStep axis="descendant" test="*:notesStmt"/> <IterStep axis="descendant" test="*:note"/> </AxisPath> </Let> <Let var="$expr"> <FNStr name="concat(atom,atom[,...])"> <Str value="ft:mark(db:open-id('TG-DTA-GerManC-stemming-ws', " type="xs:string"/> <VarRef> <Var name="$node" id="3"/> </VarRef> <Str value=") " type="xs:string"/> <VarRef> <Var name="$query" id="2"/> </VarRef> <Str value=")" type="xs:string"/> </FNStr> </Let> <Let var="$time"> <FNGen name="data([item])"> <AxisPath> <VarRef> <Var name="$i" id="0"/> </VarRef> <IterStep axis="attribute" test="time"/> </AxisPath> </FNGen> </Let> <Return> <CElem> <QNm value="div" type="xs:QName"/> <CElem> <QNm value="hit" type="xs:QName"/> <CAttr> <QNm value="count" type="xs:QName"/> <VarRef> <Var name="$p as xs:integer" id="1"/> </VarRef> </CAttr> <CElem> <QNm value="p" type="xs:QName"/> <CElem> <QNm value="input" type="xs:QName"/> <CAttr> <QNm value="type" type="xs:QName"/> <Str value="checkbox" type="xs:string"/> </CAttr> <CAttr> <QNm value="name" type="xs:QName"/> <Str value="NODE" type="xs:string"/> </CAttr> <CAttr> <QNm value="value" type="xs:QName"/> <VarRef> <Var name="$node" id="3"/> </VarRef> </CAttr> </CElem> <CElem> <QNm value="b" type="xs:QName"/> <CAttr> <QNm value="class" type="xs:QName"/> <Str value="hitno" type="xs:string"/> </CAttr> <VarRef> <Var name="$p as xs:integer" id="1"/> </VarRef> <Str value=" (" type="xs:string"/> <If> <CmpG op="="> <VarRef> <Var name="$prefix" id="4"/> </VarRef> <Str value="dta" type="xs:string"/> </CmpG> <Str 
value="DTA" type="xs:string"/> <Str value="TG" type="xs:string"/> </If> <Str value=")" type="xs:string"/> </CElem> <Str value="Knoten: " type="xs:string"/> <AxisPath> <VarRef> <Var name="$i" id="0"/> </VarRef> <IterStep axis="child" test="node"/> </AxisPath> </CElem> <FNXQuery name="eval(string[,bindings])"> <VarRef> <Var name="$expr" id="8"/> </VarRef> </FNXQuery> </CElem> <CElem> <QNm value="bib" type="xs:QName"/> <CElem> <QNm value="p" type="xs:QName"/> <CAttr> <QNm value="class" type="xs:QName"/> <Str value="bibl" type="xs:string"/> </CAttr> <CElem> <QNm value="b" type="xs:QName"/> <VarRef> <Var name="$time" id="9"/> </VarRef> </CElem> <CElem> <QNm value="br" type="xs:QName"/> </CElem> <CElem> <QNm value="b" type="xs:QName"/> <Str value="Bibliographie" type="xs:string"/> </CElem> <FNGen name="data([item])"> <VarRef> <Var name="$author" id="6"/> </VarRef> </FNGen> <Str value=": " type="xs:string"/> <FNGen name="data([item])"> <VarRef> <Var name="$title" id="5"/> </VarRef> </FNGen> <CElem> <QNm value="br" type="xs:QName"/> </CElem> <CElem> <QNm value="b" type="xs:QName"/> <Str value="Anmerkung" type="xs:string"/> </CElem> <Str value=": " type="xs:string"/> <FNGen name="data([item])"> <VarRef> <Var name="$note" id="7"/> </VarRef> </FNGen> <CElem> <QNm value="br" type="xs:QName"/> </CElem> <CElem> <QNm value="b" type="xs:QName"/> <Str value="Korpus" type="xs:string"/> </CElem> <Str value=": " type="xs:string"/> <If> <CmpG op="="> <VarRef> <Var name="$prefix" id="4"/> </VarRef> <Str value="dta" type="xs:string"/> </CmpG> <Str value="Deutsches Textarchiv" type="xs:string"/> <Str value="TextGrid Digitale Bibliothek" type="xs:string"/> </If> </CElem> </CElem> <CElem> <QNm value="p" type="xs:QName"/> </CElem> </CElem> </Return> </FLWR> </QueryPlan>
#############################################################
Query: for $i at $p in //entry[phraseme/text() = "Ad0194" and selected/text() = "yes"] let $query := $i/query let $node := $i/node let $prefix := fn:in-scope-prefixes($i) let $tei := (db:open-id('TG-DTA-GerManC-stemming-ws', $node) /ancestor::*:TEI)[1] let $title := ($tei//*:fileDesc)[1]//*:titleStmt[1]//*:title[1] let $author := ($tei//*:sourceDesc)[1]//*:bibl[1]//*:author[1] let $note := $tei//*:notesStmt//*:note let $expr := concat("ft:mark(db:open-id('TG-DTA-GerManC-stemming-ws', ", $node, ") ", $query, ")") let $time := data($i/@time) return <div> <hit count="{ $p}"> <p><input type="checkbox" name="NODE" value="{$node}"/><b class="hitno">{$p} ({ if($prefix = "dta") then "DTA" else "TG"})</b>Knoten: {$i/node}</p> {xquery:eval($expr)} </hit> <bib> <p class="bibl"><b>{$time}</b><br/><b>Bibliographie</b> { data($author)}: { data($title)} <br/><b>Anmerkung</b>: { data ($note) }<br/> <b>Korpus</b>: { if($prefix = "dta") then "Deutsches Textarchiv" else "TextGrid Digitale Bibliothek"}</p> </bib> <p></p></div>
Compiling: - rewriting And expression to predicate(s) - rewriting fn:boolean(phraseme/text() = "Ad0194") - rewriting fn:boolean(selected/text() = "yes") - simplifying descendant-or-self step(s) - applying text index - simplifying descendant-or-self step(s) - simplifying descendant-or-self step(s) - simplifying descendant-or-self step(s)
Result: for $i at $p as xs:integer in db:text("collect-ws", "Ad0194")/parent::phraseme/parent::entry[selected/text() = "yes"] let $query := $i/query let $node := $i/node let $prefix := fn:in-scope-prefixes($i) let $tei := (db:open-id("TG-DTA-GerManC-stemming-ws", $node)/ancestor::*:TEI)[1] let $title := ($tei/descendant::*:fileDesc)[1]/descendant-or-self::node()/*:titleStmt[1]/descendant-or-self::node()/*:title[1] let $author := ($tei/descendant::*:sourceDesc)[1]/descendant-or-self::node()/*:bibl[1]/descendant-or-self::node()/*:author[1] let $note := $tei/descendant::*:notesStmt/descendant::*:note let $expr := fn:concat("ft:mark(db:open-id('TG-DTA-GerManC-stemming-ws', ", $node, ") ", $query, ")") let $time := fn:data($i/@time) return element div { element hit { attribute count { $p }, element p { element input { attribute type { "checkbox" }, attribute name { "NODE" }, attribute value { $node } }, element b { attribute class { "hitno" }, $p, " (", if($prefix = "dta") then "DTA" else "TG", ")" }, "Knoten: ", $i/node }, xquery:eval($expr) }, element bib { element p { attribute class { "bibl" }, element b { $time }, element br { () }, element b { "Bibliographie" }, fn:data($author), ": ", fn:data($title), element br { () }, element b { "Anmerkung" }, ": ", fn:data($note), element br { () }, element b { "Korpus" }, ": ", if($prefix = "dta") then "Deutsches Textarchiv" else "TextGrid Digitale Bibliothek" } }, element p { () } }
Timing: - Parsing: 3.01 ms - Compiling: 3.44 ms - Evaluating: 5180.53 ms - Printing: 59.07 ms - Total Time: 5246.06 ms
Result: - Hit(s): 676 Items - Updated: 0 Items - Printed: 2048 KB
Query plan: <QueryPlan> <FLWR> <For var="$i" pos="$p as xs:integer"> <AxisPath> <ValueAccess data="collect-ws" type="TEXT"> <Str value="Ad0194" type="xs:string"/> </ValueAccess> <IterStep axis="parent" test="phraseme"/> <IterStep axis="parent" test="entry"> <CmpG op="="> <AxisPath> <IterStep axis="child" test="selected"/> <IterStep axis="child" test="text()"/> </AxisPath> <Str value="yes" type="xs:string"/> </CmpG> </IterStep> </AxisPath> </For> <Let var="$query"> <AxisPath> <VarRef> <Var name="$i" id="0"/> </VarRef> <IterStep axis="child" test="query"/> </AxisPath> </Let> <Let var="$node"> <AxisPath> <VarRef> <Var name="$i" id="0"/> </VarRef> <IterStep axis="child" test="node"/> </AxisPath> </Let> <Let var="$prefix"> <FNQName name="in-scope-prefixes(elem)"> <VarRef> <Var name="$i" id="0"/> </VarRef> </FNQName> </Let> <Let var="$tei"> <IterPosFilter> <AxisPath> <FNDb name="open-id(database,id)"> <Str value="TG-DTA-GerManC-stemming-ws" type="xs:string"/> <VarRef> <Var name="$node" id="3"/> </VarRef> </FNDb> <IterStep axis="ancestor" test="*:TEI"/> </AxisPath> <Pos min="1" max="1"/> </IterPosFilter> </Let> <Let var="$title"> <AxisPath> <IterPosFilter> <AxisPath> <VarRef> <Var name="$tei" id="5"/> </VarRef> <IterStep axis="descendant" test="*:fileDesc"/> </AxisPath> <Pos min="1" max="1"/> </IterPosFilter> <IterStep axis="descendant-or-self" test="node()"/> <IterPosStep axis="child" test="*:titleStmt"> <Pos min="1" max="1"/> </IterPosStep> <IterStep axis="descendant-or-self" test="node()"/> <IterPosStep axis="child" test="*:title"> <Pos min="1" max="1"/> </IterPosStep> </AxisPath> </Let> <Let var="$author"> <AxisPath> <IterPosFilter> <AxisPath> <VarRef> <Var name="$tei" id="5"/> </VarRef> <IterStep axis="descendant" test="*:sourceDesc"/> </AxisPath> <Pos min="1" max="1"/> </IterPosFilter> <IterStep axis="descendant-or-self" test="node()"/> <IterPosStep axis="child" test="*:bibl"> <Pos min="1" max="1"/> </IterPosStep> <IterStep axis="descendant-or-self" test="node()"/> 
<IterPosStep axis="child" test="*:author"> <Pos min="1" max="1"/> </IterPosStep> </AxisPath> </Let> <Let var="$note"> <AxisPath> <VarRef> <Var name="$tei" id="5"/> </VarRef> <IterStep axis="descendant" test="*:notesStmt"/> <IterStep axis="descendant" test="*:note"/> </AxisPath> </Let> <Let var="$expr"> <FNStr name="concat(atom,atom[,...])"> <Str value="ft:mark(db:open-id('TG-DTA-GerManC-stemming-ws', " type="xs:string"/> <VarRef> <Var name="$node" id="3"/> </VarRef> <Str value=") " type="xs:string"/> <VarRef> <Var name="$query" id="2"/> </VarRef> <Str value=")" type="xs:string"/> </FNStr> </Let> <Let var="$time"> <FNGen name="data([item])"> <AxisPath> <VarRef> <Var name="$i" id="0"/> </VarRef> <IterStep axis="attribute" test="time"/> </AxisPath> </FNGen> </Let> <Return> <CElem> <QNm value="div" type="xs:QName"/> <CElem> <QNm value="hit" type="xs:QName"/> <CAttr> <QNm value="count" type="xs:QName"/> <VarRef> <Var name="$p as xs:integer" id="1"/> </VarRef> </CAttr> <CElem> <QNm value="p" type="xs:QName"/> <CElem> <QNm value="input" type="xs:QName"/> <CAttr> <QNm value="type" type="xs:QName"/> <Str value="checkbox" type="xs:string"/> </CAttr> <CAttr> <QNm value="name" type="xs:QName"/> <Str value="NODE" type="xs:string"/> </CAttr> <CAttr> <QNm value="value" type="xs:QName"/> <VarRef> <Var name="$node" id="3"/> </VarRef> </CAttr> </CElem> <CElem> <QNm value="b" type="xs:QName"/> <CAttr> <QNm value="class" type="xs:QName"/> <Str value="hitno" type="xs:string"/> </CAttr> <VarRef> <Var name="$p as xs:integer" id="1"/> </VarRef> <Str value=" (" type="xs:string"/> <If> <CmpG op="="> <VarRef> <Var name="$prefix" id="4"/> </VarRef> <Str value="dta" type="xs:string"/> </CmpG> <Str value="DTA" type="xs:string"/> <Str value="TG" type="xs:string"/> </If> <Str value=")" type="xs:string"/> </CElem> <Str value="Knoten: " type="xs:string"/> <AxisPath> <VarRef> <Var name="$i" id="0"/> </VarRef> <IterStep axis="child" test="node"/> </AxisPath> </CElem> <FNXQuery 
name="eval(string[,bindings])"> <VarRef> <Var name="$expr" id="9"/> </VarRef> </FNXQuery> </CElem> <CElem> <QNm value="bib" type="xs:QName"/> <CElem> <QNm value="p" type="xs:QName"/> <CAttr> <QNm value="class" type="xs:QName"/> <Str value="bibl" type="xs:string"/> </CAttr> <CElem> <QNm value="b" type="xs:QName"/> <VarRef> <Var name="$time" id="10"/> </VarRef> </CElem> <CElem> <QNm value="br" type="xs:QName"/> </CElem> <CElem> <QNm value="b" type="xs:QName"/> <Str value="Bibliographie" type="xs:string"/> </CElem> <FNGen name="data([item])"> <VarRef> <Var name="$author" id="7"/> </VarRef> </FNGen> <Str value=": " type="xs:string"/> <FNGen name="data([item])"> <VarRef> <Var name="$title" id="6"/> </VarRef> </FNGen> <CElem> <QNm value="br" type="xs:QName"/> </CElem> <CElem> <QNm value="b" type="xs:QName"/> <Str value="Anmerkung" type="xs:string"/> </CElem> <Str value=": " type="xs:string"/> <FNGen name="data([item])"> <VarRef> <Var name="$note" id="8"/> </VarRef> </FNGen> <CElem> <QNm value="br" type="xs:QName"/> </CElem> <CElem> <QNm value="b" type="xs:QName"/> <Str value="Korpus" type="xs:string"/> </CElem> <Str value=": " type="xs:string"/> <If> <CmpG op="="> <VarRef> <Var name="$prefix" id="4"/> </VarRef> <Str value="dta" type="xs:string"/> </CmpG> <Str value="Deutsches Textarchiv" type="xs:string"/> <Str value="TextGrid Digitale Bibliothek" type="xs:string"/> </If> </CElem> </CElem> <CElem> <QNm value="p" type="xs:QName"/> </CElem> </CElem> </Return> </FLWR> </QueryPlan>
let $title := (db:open-id('TG-DTA-GerManC-stemming-ws', $node) /ancestor::*:TEI[1]//*:fileDesc)[1]//*:titleStmt[1]//*:title[1]
Do you think that the following query would return the expected result?
db:open-id('TG-DTA-GerManC-stemming-ws', $node)/ ancestor::*:TEI[1]/ descendant::*:fileDesc[1]/ descendant::*:titleStmt[1]/ descendant::*:title[1]
If yes, it may be the fastest version (can’t promise, though). If no, could you try to specify what the given query is supposed to return in natural language?
By the way, if you know that an element will only occur once, it may even be faster to get rid of the position predicate "[1]". Still, due to the many variants a location path can look like, and the variety of the input to be processed, I can’t give any guarantee for that.
Christian
On 19.11.2012 at 23:00, Christian Grün wrote:
let $title := (db:open-id('TG-DTA-GerManC-stemming-ws', $node) /ancestor::*:TEI[1]//*:fileDesc)[1]//*:titleStmt[1]//*:title[1]
Do you think that the following query would return the expected result?
db:open-id('TG-DTA-GerManC-stemming-ws', $node)/ ancestor::*:TEI[1]/ descendant::*:fileDesc[1]/ descendant::*:titleStmt[1]/ descendant::*:title[1]
If yes, it may be the fastest version (can’t promise, though).
No and no.
These queries are equivalent for the result:
let $title := db:open-id('TG-DTA-GerManC-stemming-ws', $node) /ancestor::*:TEI[1]//*:fileDesc[1]//*:titleStmt[1]//*:title[1]
let $title := (db:open-id('TG-DTA-GerManC-stemming-ws', $node) /ancestor::*:TEI)[1]//*:fileDesc[1]//*:titleStmt[1]//*:title[1]
let $title := (db:open-id('TG-DTA-GerManC-stemming-ws', $node) /ancestor::*:TEI[1]//*:fileDesc)[1]//*:titleStmt[1]//*:title[1]
And your proposal is equivalent to:
let $title := (db:open-id('TG-DTA-GerManC-stemming-ws', $node) /ancestor::*:TEI[1]//*:fileDesc[1]//*:titleStmt)[1]//*:title[1]
What is more, your proposal takes more than double the time.
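The difference between these variants comes down to what the position predicate "[1]" is applied to. A minimal sketch, assuming a made-up fragment (the element content "A" and "B" is invented and not taken from the corpus), may make this easier to see:

```xquery
(: Made-up fragment, not taken from the corpus: one titleStmt directly in
   fileDesc and a second one nested in sourceDesc/biblFull. :)
let $tei :=
  <TEI>
    <fileDesc>
      <titleStmt><title>A</title></titleStmt>
      <sourceDesc>
        <biblFull>
          <titleStmt><title>B</title></titleStmt>
        </biblFull>
      </sourceDesc>
    </fileDesc>
  </TEI>
return (
  (: [1] is applied per parent: the first title child of every parent element :)
  count($tei//*:title[1]),            (: 2 :)
  (: [1] is applied to the descendant axis as a whole :)
  count($tei/descendant::*:title[1]), (: 1 :)
  (: [1] is applied to the parenthesized result sequence :)
  count(($tei//*:title)[1])           (: 1 :)
)
```

So "//x[1]" can return several nodes, whereas "descendant::x[1]" and "(//x)[1]" return at most one. Whether the variants give the same result therefore depends on how many matching elements the document at hand contains.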
If no, could you try to specify what the given query is supposed to return in natural language?
The hit I am interested in is a <p> or <l> node. This node belongs to a specific TEI document representing a certain novel or poem or the like. The DB consists of several thousand such documents. Some TEI documents are nested, e.g., a book representing a collection of poems: the book is a TEI document and each poem is one, too. I need the bibliographic information of the node.
And then of course, the TEI documents are structured differently. See the XML excerpts at the end (only the <fileDesc> node from the TEI header). The texts come from three main resources, so it might be useful to have a dedicated collection for each of them. However, as the documents within each main resource aren't annotated consistently, there is not much point in doing so. TEI allows you to store any information wherever you want; take the author information as an example:
a) <author>Ortensio Mauro</author>
in <titleStmt> in <fileDesc>
b) <author> <name key="PND:118648071"> <surname>Alexis</surname> <forename>Willibald</forename> </name> </author>
in <titleStmt> in <fileDesc> _and_ in <titleStmt> in <biblFull> in <sourceDesc>
c) <author key="pnd:11850021X">Abraham a Sancta Clara</author>
in <titleStmt> in <biblFull> in <sourceDesc>, sometimes in this order, sometimes as "name, firstname"
That's what people call Digital Humanities ...
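One way to cope with the three author variants would be a small helper that tries the locations in a fixed order. This is only a sketch under assumptions: the function name local:author is hypothetical, the three <fileDesc> fragments are shortened stand-ins modelled on cases a) to c), and the "forename surname" output order is an arbitrary choice.

```xquery
(: Hypothetical helper: prefer the author in fileDesc/titleStmt and fall
   back to a titleStmt somewhere inside sourceDesc. :)
declare function local:author($fileDesc as element()) as xs:string? {
  let $a := (
    $fileDesc/*:titleStmt/*:author,
    $fileDesc/*:sourceDesc//*:titleStmt/*:author
  )[1]
  return
    if (empty($a)) then ()
    else if ($a//*:surname) then
      (: structured name as in case b) :)
      normalize-space(string-join(($a//*:forename, $a//*:surname), ' '))
    else
      (: plain text content as in cases a) and c) :)
      normalize-space($a)
};

(: Shortened stand-ins for the three cases :)
let $a := <fileDesc><titleStmt><author>Ortensio Mauro</author></titleStmt></fileDesc>
let $b := <fileDesc><titleStmt><author><name key="PND:118648071">
            <surname>Alexis</surname><forename>Willibald</forename>
          </name></author></titleStmt></fileDesc>
let $c := <fileDesc><titleStmt><title>Judas der Erzschelm</title></titleStmt>
            <sourceDesc><biblFull><titleStmt>
              <author key="pnd:11850021X">Abraham a Sancta Clara</author>
            </titleStmt></biblFull></sourceDesc></fileDesc>
return ($a, $b, $c) ! local:author(.)
(: "Ortensio Mauro", "Willibald Alexis", "Abraham a Sancta Clara" :)
```

The comma in "(…, …)[1]" builds a sequence in the order given, so the first location wins whenever it has an author. The "name, firstname" strings mentioned in c) are not normalized by this sketch.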
So from a <p> or <l> node I go upwards until I find the first TEI node (/ancestor::*:TEI[1]).
From there, I travel down until I find the first <fileDesc>, and somewhere in there the first <titleStmt> and somewhere in there the first <title> node. And this I use as title.
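The navigation described above could be sketched as follows. The nested sample document, its titles, and the function name local:first-title are made up for illustration; the real lookup would start from db:open-id(...) instead of the inline fragment.

```xquery
(: Hypothetical helper: first fileDesc, then the first titleStmt in it,
   then the first title in that, each taken in document order. :)
declare function local:first-title($tei as element()?) as xs:string? {
  ((($tei//*:fileDesc)[1]//*:titleStmt)[1]//*:title)[1]/string()
};

(: Made-up nested case: a collection (outer TEI) containing a poem (inner TEI) :)
let $corpus :=
  <TEI>
    <teiHeader><fileDesc><titleStmt>
      <title>Collected Poems</title>
    </titleStmt></fileDesc></teiHeader>
    <text>
      <TEI>
        <teiHeader><fileDesc><titleStmt>
          <title>First Poem</title>
        </titleStmt></fileDesc></teiHeader>
        <text><l>some verse line</l></text>
      </TEI>
    </text>
  </TEI>
let $hit := ($corpus//*:l)[1]
return (
  (: ancestor is a reverse axis, so [1] is the nearest TEI ancestor :)
  local:first-title($hit/ancestor::*:TEI[1]),    (: "First Poem" :)
  (: parentheses switch to document order, so [1] is the outermost TEI :)
  local:first-title(($hit/ancestor::*:TEI)[1])   (: "Collected Poems" :)
)
```

For nested TEI documents, "ancestor::*:TEI[1]" and "(…/ancestor::*:TEI)[1]" therefore pick different elements: the former yields the poem, the latter the enclosing book. For documents that are not nested, both forms select the same element.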
<fileDesc> <titleStmt> <title>Judas der Erzschelm</title> </titleStmt>
<publicationStmt> <idno type="FileCreationTime">Abraham a Sancta Clara: Element 00008 [2011/07/11 at 20:28:21]</idno> <availability> <p> Der annotierte Datenbestand der Digitalen Bibliothek inklusive Metadaten sowie davon einzeln zugängliche Teile sind eine Abwandlung des Datenbestandes von www.editura.de durch TextGrid und werden unter der Lizenz Creative Commons Namensnennung 3.0 Deutschland Lizenz (by-Nennung TextGrid, www.editura.de) veröffentlicht. Die Lizenz bezieht sich nicht auf die der Annotation zu Grunde liegenden allgemeinfreien Texte (Siehe auch Punkt 2 der Lizenzbestimmungen). </p> <p> <ref target="http://creativecommons.org/licenses/by/3.0/de/legalcode">Lizenzvertrag</ref> </p> <p> <ref target="http://creativecommons.org/licenses/by/3.0/de/"> Eine vereinfachte Zusammenfassung des rechtsverbindlichen Lizenzvertrages in allgemeinverständlicher Sprache </ref> </p> <p> <ref target="http://www.textgrid.de/Digitale-Bibliothek">Hinweise zur Lizenz und zur Digitalen Bibliothek</ref> </p> </availability> </publicationStmt>
<notesStmt> <note> Erstdruck: Salzburg (Haan) 1686, mit kaiserlichem Privileg datiert auf den 25. September 1685, Band 1: 1686; Band 2: 1689; Band 3: 1692; Band 4: 1695. </note> </notesStmt>
<sourceDesc> <biblFull> <titleStmt> <title>Abraham a Sancta Clara: Judas der Erzschelm für ehrliche Leut', oder eigentlicher Entwurf und Lebensbeschreibung des Iscariotischen Böswicht. 7 Bände, in: Abraham a St. Clara's Sämmtliche Werke, Band 1, Passau: Friedrich Winkler, 1834–1836.</title> <author key="pnd:11850021X">Abraham a Sancta Clara</author> </titleStmt>
<extent>0-</extent>
<publicationStmt> <date notBefore="1834" notAfter="1836"/> <pubPlace>Passau</pubPlace> </publicationStmt> </biblFull> </sourceDesc> </fileDesc>
#######################################
<fileDesc> <titleStmt> <title type="main">Ruhe ist die erste Bürgerpflicht oder Vor fünfzig Jahren</title> <title type="sub">Vaterländischer Roman</title> <title type="vol" n="1">Erster Band</title> <author> <name key="PND:118648071"> <surname>Alexis</surname> <forename>Willibald</forename> </name> </author> <respStmt corresp="#DTA-Corpus-Publisher"> <name>Marko Drotschmann, Oliver Duntze, Christiane Fritze, Alexander Geyken, Bryan Jurish, Alexander Siebert</name> <resp>conversion to XML/TEI-conformant markup</resp> </respStmt> </titleStmt>
<extent> <measure type="token"/> </extent>
<publicationStmt> <publisher xml:id="DTA-Corpus-Publisher">Deutsches Textarchiv</publisher> <address> <addrLine>Jägerstr. 22, 10117 Berlin</addrLine> <addrLine>dta@bbaw.de</addrLine> </address> <pubPlace>Berlin</pubPlace> <date>2011-05-04 14:53</date> <availability n="OR3P" status="free"> <p>This text is available under Creative Commons license CC-BY</p> </availability> <idno type="URN">urn:nbn:de:kobv:b4-2009051900</idno> <idno type="DTAID">16518</idno> </publicationStmt>
<sourceDesc n="orig">
<bibl>Alexis, Willibald: Ruhe ist die erste Bürgerpflicht. Bd. 1. Berlin: Barthol, 1852.</bibl>
<biblFull> <titleStmt> <title level="m" type="main">Ruhe ist die erste Bürgerpflicht oder Vor fünfzig Jahren</title> <title level="m" type="sub">Vaterländischer Roman</title> <title level="m" type="vol" n="1">Erster Band</title> <author> <name key="PND:118648071"> <surname>Alexis</surname> <forename>Willibald</forename> </name> </author> </titleStmt> <extent> <measure type="pages" n="356">VIII, 348 S.</measure> </extent> <publicationStmt> <publisher>Barthol</publisher> <pubPlace>Berlin</pubPlace> <date type="first">1852</date> </publicationStmt> <notesStmt> <note type="identifier"> <ident type="epn">876060645</ident> </note> <note type="location"> <name type="repository">Staatsbibliothek zu Berlin - PK</name> <ident type="shelfmark"/> </note> 0 <note type="pub_type">monograph</note> </notesStmt> </biblFull>
<listPerson type="searchNames"> <person><persName>Willibald Alexis</persName></person> </listPerson> </sourceDesc> </fileDesc>
############################
<fileDesc> <titleStmt> <title>Der in seiner Freyheit vergnuͤgte <hi rend="antiqua">ALCIBIADES, In einem Sing-Spiel vorgestellet Auf dem Braunschweigischen Schau-Platz [...]</hi></title> <author>Ortensio Mauro</author> </titleStmt>
<publicationStmt> <pubPlace>Braunschweig</pubPlace> <date>1700</date> </publicationStmt>
<notesStmt> <note type="filename">DRAM_P1_NoD_1700_Freyheit</note> <note type="region">North German</note> <note type="genre">Drama</note> <note type="period"><date>1650-1700</date></note> <note type="extract"><bibl>Act I, Scene 1-Act II, Scene 5</bibl></note> </notesStmt>
<sourceDesc> <p>Extract taken from the digital collection of the Herzog-August-Bibliothek Wolfenbüttel: http://diglib.hab.de/drucke/textb-31/start.htm</p> </sourceDesc> </fileDesc>