Dear members, dear Christian,
when using a for loop inside a for loop, I run into serious performance problems as soon as I use variables to return results. For example, this line takes 36780 ms (!!!) to evaluate:
return <li><a>{$title}</a>{$quote}</li>
When I substitute the variable $title with its defining code, it takes only 711 ms to evaluate (!!):
  <li><a> {
    if ($ele/ancestor::tei:TEI//tei:titleStmt/tei:title[@type="short"])
    then $ele/ancestor::tei:TEI//tei:titleStmt/tei:title[@type="short"]/string()
    else if ($ele/ancestor::tei:TEI//tei:titleStmt/tei:title[@type="article"])
    then $ele/ancestor::tei:TEI//tei:titleStmt/tei:title[@type="article"]/string()
    else if ($ele/ancestor::tei:TEI//tei:titleStmt/tei:title[@type="main"])
    then $ele/ancestor::tei:TEI//tei:titleStmt/tei:title[@type="main"]/string()
    else ()
  } </a>{$quote}</li>
The only difference in the code is the use of a variable, yet performance differs by a factor of more than 50. In my real application (which uses more variables), it differs by a factor of more than 150.
I tried to simulate the problem with a simple factbook query. Here is the code:
  xquery version "3.0";
  let $query := "Pa.*"
  for $city in doc('factbook')//city/name
  for $hits in ft:mark($city[.//text() contains text {$query} using wildcards])
  let $country_of_city := $city/ancestor::country/name
  return
    (: fast version: Evaluating 12.54ms :)
    <hit><city>{$hits}</city><country>{$city/ancestor::country/name}</country></hit>
    (: slow version: Evaluating 35.78 :)
    (: <hit><city>{$hits}</city><country>{$country_of_city}</country></hit> :)
Here, too, there is a performance bottleneck (though less dramatic: a factor of 3). Is there any solution to the problem? For now, I'm avoiding variables in the result part of the XQuery, but that makes for really ugly, unmaintainable code.
Any help would be appreciated. - Günter
Hi Günter,
Interesting one. It seems that there are cases in which sliding 'let' clauses will slow down execution time; see [1]. I’ll first need to decide on better heuristics before I "fix" this.
In the meantime, please note that the full-text index (if it exists) will not be utilized by your second query. It may seem counterintuitive, but your query will probably be executed faster if you apply the search condition twice:
  let $query := "Paris"
  for $city in doc('factbook')//city/name[text() contains text {$query}]
  return ft:mark($city)
The optimized query will look as follows (check out the Query Info panel):
  for $city_1 in ft:search("factbook", "Paris" using language 'English')/
    parent::*:name[parent::*:city]
  return ft:mark(($city_1)[text() contains text "Paris" using wildcards using language 'English'])
The reason is that the word positions that result from a full-text request won’t be implicitly bound to the $city variable, as this can take a lot of memory. In a future version of BaseX, we’ll probably try to find a smarter way to determine whether the positions may be required later on in a query. Once this is realized, you’ll be able to write queries like the following one:
  let $query := "Paris"
  for $city in doc('factbook')//city/name[text() contains text {$query}]
  return ft:mark($city)
And one more thing:
  { if ($ele/ancestor::tei:TEI//tei:titleStmt/tei:title[@type="short"])
    then $ele/ancestor::tei:TEI//tei:titleStmt/tei:title[@type="short"]/string()
    else if ($ele/ancestor::tei:TEI//tei:titleStmt/tei:title[@type="article"])
    then $ele/ancestor::tei:TEI//tei:titleStmt/tei:title[@type="article"]/string()
    else if ($ele/ancestor::tei:TEI//tei:titleStmt/tei:title[@type="main"])
    then $ele/ancestor::tei:TEI//tei:titleStmt/tei:title[@type="main"]/string()
This code will most probably be executed faster if you only do the path traversal once, e.g. as follows:
  let $title := $ele/ancestor::tei:TEI//tei:titleStmt/tei:title
    [@type = ('short', 'article', 'main')]/string()
  ...
  return $title
Best, Christian
[1] https://github.com/BaseXdb/basex/issues/1236
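A note on the single-path rewrite above: a predicate such as [@type = ('short', 'article', 'main')] selects every title that has one of these types, in document order, whereas the original if/else chain prefers 'short' over 'article' over 'main'. If that priority matters, one possible sketch is the following. The sample document and the $titles variable are made up for illustration and are not part of the thread.

```xquery
declare namespace tei = "http://www.tei-c.org/ns/1.0";

(: Hypothetical sample document; the real data is not shown in the thread. :)
let $doc :=
  <tei:TEI>
    <tei:teiHeader><tei:fileDesc><tei:titleStmt>
      <tei:title type="main">Main title</tei:title>
      <tei:title type="short">Short title</tei:title>
    </tei:titleStmt></tei:fileDesc></tei:teiHeader>
    <tei:text><tei:body><tei:p>Some text</tei:p></tei:body></tei:text>
  </tei:TEI>
let $ele := $doc//tei:p
(: Walk the path once and keep all candidate titles. :)
let $titles := $ele/ancestor::tei:TEI//tei:titleStmt/tei:title
(: Pick the first title in priority order: short, then article, then main. :)
let $title := (
  for $type in ('short', 'article', 'main')
  return $titles[@type = $type]
)[1]/string()
return $title
```

With the sample data, this yields the string value of the 'short' title, even though the 'main' title comes first in the document. How this variant performs inside a nested loop was not measured here.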
Hi Christian,
thanks a lot for your advice, but sorry, I don't really get it so far.
Your code
  let $query := "Paris"
  for $city in doc('factbook')//city/name[text() contains text {$query}]
  return ft:mark($city)
doesn't return the 'mark' tags, which are important to me. And where are you applying the search condition twice? I also depend on information about the ancestor node (country). Perhaps I got it totally wrong.
About your one more thing ;-)
And one more thing:
  { if ($ele/ancestor::tei:TEI//tei:titleStmt/tei:title[@type="short"])
    then $ele/ancestor::tei:TEI//tei:titleStmt/tei:title[@type="short"]/string()
    else if ($ele/ancestor::tei:TEI//tei:titleStmt/tei:title[@type="article"])
    then $ele/ancestor::tei:TEI//tei:titleStmt/tei:title[@type="article"]/string()
    else if ($ele/ancestor::tei:TEI//tei:titleStmt/tei:title[@type="main"])
    then $ele/ancestor::tei:TEI//tei:titleStmt/tei:title[@type="main"]/string()
This code will most probably be executed faster if you only do the path traversal once, e.g. as follows:
  let $title := $ele/ancestor::tei:TEI//tei:titleStmt/tei:title
    [@type = ('short', 'article', 'main')]/string()
  ...
  return $title
I tried it, and it cuts the evaluation time roughly in half (ca. 16000 ms). But if I don't use the let clause and return your code directly, the evaluation time goes down to 588 ms!!
Best, Günter
  let $query := "Paris"
  for $city in doc('factbook')//city/name[text() contains text {$query}]
  return ft:mark($city)
doesn't return the 'mark'-tags
Sorry, should have been like that:
  let $query := "Paris"
  for $city in doc('factbook')//city/name[text() contains text {$query}]
  return ft:mark($city[text() contains text {$query}])
By the way, here are two more rewritings to avoid the current sliding of the let clause:
Variant A:
  for $city in doc('factbook')//city/name
  let $hits := ft:mark($city[text() contains text 'paris'])
  where $hits
  let $name := $city/ancestor::country/name
  return ($hits, $name)
Variant B:
  for $city in doc('factbook')//city/name
  for $hits in ft:mark($city[text() contains text 'paris'])
  return
    let $name := $city/ancestor::country/name
    return ($hits, $name)
And here is one more programmatic way to specify full-text options only once in the query:
  let $ft := map { 'wildcards': true() }
  let $terms := 'pa.*'
  for $city in ft:search('factbook', $terms, $ft)/
    parent::name[ancestor::city]
  let $hits := ft:mark($city[ft:contains(text(), $terms, $ft)])
  return
    let $name := $city/ancestor::country/name
    return ($hits, $name)
This variant is clearly the most verbose one, but it may turn out to be your favorite once you need to specify many full-text options in more than one place.
Dear Christian,
thanks a lot for your help. Now I have lots of stuff to think about and to implement.
Best, Günter
Hi Günter,
I had one more look at the slow query you were encountering:
  for $city in doc('factbook')//city/name
  for $hits in ft:mark($city[.//text() contains text {$query} using wildcards])
  let $country_of_city := $city/ancestor::country/name
  return
    (: slow version: Evaluating 35.78 :)
    (: <hit><city>{$hits}</city><country>{$country_of_city}</country></hit> :)
This one was trickier than I expected, because the ft:mark function can produce multiple results for a single node. This is why the sliding of the let clause, which slows down your query, can be beneficial in other cases. The following query will generate 20 results (10 "mark" elements, 10 text nodes), so it will be evaluated faster if the let clause is slid over the for clause:
  let $input := (<X>{ 'A.A.A.A.A.A.A.A.A.A.' }</X> update () )/text()
  for $hits in ft:mark($input[. contains text 'A'])
  let $parent := $input/..
  return <hit id='{ db:node-id($parent) }'>{ $hits }</hit>
Well, those are lots of internal details that I think you can easily ignore. In a nutshell: Just use 'let' and 'where' instead of 'for':
  let $input := (<X>{ 'A.A.A.A.A.A.A.A.A.A.' }</X> update () )/text()
  let $hits := ft:mark($input[. contains text 'A'])
  where $hits
  let $parent := $input/..
  return <hit id='{ db:node-id($parent) }'>{ $hits }</hit>
Christian
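One thing to keep in mind with the 'let' and 'where' rewrite: 'for' produces one result per item returned by ft:mark, while 'let' binds the whole sequence once, so the shape of the output changes. A minimal sketch in plain XQuery follows; the hand-written $items sequence stands in for the ft:mark result, an assumption made so that the example runs without a database or full-text index.

```xquery
(: Hypothetical stand-in for the items returned by ft:mark. :)
let $items := (<mark>A</mark>, '.', <mark>A</mark>, '.')
return (
  (: 'for' creates one tuple per item: four <hit> elements. :)
  <with-for>{
    for $hit in $items
    return <hit>{ $hit }</hit>
  }</with-for>,
  (: 'let' binds the whole sequence once, and 'where' drops empty
     results: a single <hit> element that contains all four items. :)
  <with-let>{
    let $hits := $items
    where exists($hits)
    return <hit>{ $hits }</hit>
  }</with-let>
)
```

If one element per match is needed downstream, the 'for' form (or Variant B from the earlier message) keeps that shape; the 'let' form suits cases where all matches of a node belong in one result.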
Hi Christian,
I refactored my XQuery and it's now running very smoothly, faster than ever. Thanks a lot for your great help. I also followed your advice and created a new index just for full-text search in mixed-content environments. This approach solves a lot of problems I had in the past... Thanks again.
I'd also like to use your programmatic way to specify full-text options only once in the query, because it would make my code much cleaner:
  let $ft := map { 'wildcards': true() }
  let $terms := 'pa.*'
  for $city in ft:search('factbook', $terms, $ft)/parent::name[ancestor::city]
  let $hits := ft:mark($city[ft:contains(text(), $terms, $ft)])
  return
    let $name := $city/ancestor::country/name
    return ($hits, $name)
The $options argument of ft:search has most of the full-text options I need. The only problem left is integrating the full-text options 'case sensitive' and 'diacritics sensitive'. Is there any way to pass those options to the ft:search function?
Günter
If the tokens in the full-text index have no diacritics, the diacritics will also be removed from your search terms (the same applies to case, stemming, etc.; see the Summary text in [1]).
Hope this helps, Christian
[1] http://docs.basex.org/wiki/Full-Text_Module#ft:search
basex-talk@mailman.uni-konstanz.de