Hi,
I have a query which is optimized in a curious way in BaseX 9.0.2 (yesterday's snapshot).
This is the original query:
xquery version "3.1";
declare namespace tei = "http://www.tei-c.org/ns/1.0";
let $string := "string"
let $fuzzy := false()
return (
  collection('ZK')/tei:TEI[
    if (false())
    then (.[descendant::text() contains text {$string} using fuzzy])
    else (.[descendant::text() contains text {$string}])
  ],
  collection('ZK')/tei:TEI[
    if ($fuzzy)
    then (.[descendant::text() contains text {$string} using fuzzy])
    else (.[descendant::text() contains text {$string}])
  ]
)
And this is the optimized one (newlines inserted by me for better readability):
(
  ft:search("ZK", "string" using language 'English')/ancestor::tei:TEI[parent::document-node()],
  ft:search("ZK", "string" using fuzzy using language 'English')/ancestor::tei:TEI[parent::document-node()]
)
I'm curious why the second search is using fuzzy even though the variable $fuzzy is false. I presume that query optimization is independent of the data, so you won't need the data to reproduce this. If you do, I can provide it. Obviously, a database with an enabled full-text index is required.
Best regards, Sebastian Zimmer
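For anyone who wants to reproduce this without the original data, a small database with a full-text index could be created roughly as follows. This is only a sketch: the database name "ft-demo", the path "demo.xml" and the sample document are made up for illustration and are not part of the report.

```xquery
xquery version "3.1";
declare namespace tei = "http://www.tei-c.org/ns/1.0";

(: Hypothetical test document; any TEI file with text content would do. :)
let $doc :=
  <tei:TEI>
    <tei:text><tei:body><tei:p>a string to search for</tei:p></tei:body></tei:text>
  </tei:TEI>
(: "ftindex" asks BaseX to build the full-text index when the database is created. :)
return db:create("ft-demo", $doc, "demo.xml", map { "ftindex": true() })
```

The queries in this thread would then have to address "ft-demo" instead of 'ZK'.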
Hi Sebastian,
This has been fixed. The background: In one of the optimizations of the "if" expression, identical branches are merged:
if(..expensive query..) then 1 else 1 → Optimized Query: 1
The full-text options were ignored in the equality check. A new snapshot is online.
Best, Christian
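A small illustration of the rewrite described above, written as a sketch and not taken from the BaseX sources. The sample text node and the misspelled search term "strng" are assumptions chosen so that the two kinds of "if" expression are easy to tell apart.

```xquery
xquery version "3.1";

let $fuzzy := false()
let $text := <p>a string to search for</p>/text()
return (
  (: Both branches are identical, so the condition does not matter and
     the whole expression may safely be replaced by one branch. :)
  if ($fuzzy) then ($text contains text "string")
              else ($text contains text "string"),

  (: These branches differ only in "using fuzzy". An equality check that
     ignores full-text options would treat them as identical and merge
     them, which is the behaviour that was reported. :)
  if ($fuzzy) then ($text contains text "strng" using fuzzy)
              else ($text contains text "strng")
)
```

With the options included in the comparison, the second expression keeps both branches, and the value of $fuzzy decides which one is evaluated.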
On Wed, Jul 11, 2018 at 1:22 PM Sebastian Zimmer sebastian.zimmer@uni-koeln.de wrote:
[...]
-- Sebastian Zimmer sebastian.zimmer@uni-koeln.de
Cologne Center for eHumanities DH Center at the University of Cologne @CCeHum
Hi Christian,
thanks for the fix, the result is correct now.
But this query now takes about 18 seconds (!) to execute, instead of less than 1 second as before. Do you think this could be accelerated?
See attached for the complete console output.
Best, Sebastian
On 12.07.2018 at 13:03, Christian Grün wrote:
[...]
Hi Sebastian,
Did you check in the Info View panel whether the index is applied? If not, you might try something like the following:
if ($fuzzy) then ( collection('ZK')/tei:TEI[... using fuzzy] ) else ( collection('ZK')/tei:TEI[...] )
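Spelled out in full, the suggestion might look like the sketch below. The predicates are copied from the original query in this thread; the elided parts of the suggestion may have been meant differently, so treat this as one possible reading.

```xquery
xquery version "3.1";
declare namespace tei = "http://www.tei-c.org/ns/1.0";

let $string := "string"
let $fuzzy := false()
return
  (: The condition is moved out of the predicate, so each branch is a
     plain path expression with a single, static set of full-text options. :)
  if ($fuzzy) then (
    collection('ZK')/tei:TEI[descendant::text() contains text { $string } using fuzzy]
  ) else (
    collection('ZK')/tei:TEI[descendant::text() contains text { $string }]
  )
```

Whether both branches are rewritten for index access can be checked in the Info View.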
Usually, if full-text options are dynamic, I tend to use ft:search [1].
Best, Christian
[1] http://docs.basex.org/wiki/Full-Text_Module#ft:search
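A minimal sketch of the ft:search approach, assuming the database 'ZK' and the variables from the original query. The "fuzzy" entry of the options map is part of the Full-Text Module [1]; the trailing path step is copied from the optimized query shown earlier in the thread.

```xquery
xquery version "3.1";
declare namespace tei = "http://www.tei-c.org/ns/1.0";

let $string := "string"
let $fuzzy := false()
(: The index is addressed explicitly, and the option is passed as a
   run-time value instead of a static "using fuzzy" clause. :)
return ft:search("ZK", $string, map { "fuzzy": $fuzzy })
  /ancestor::tei:TEI[parent::document-node()]
```

Because the index access is written by hand here, the result no longer depends on how the optimizer rewrites the "if" expression.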
On Mon, Jul 23, 2018 at 11:41 AM Sebastian Zimmer sebastian.zimmer@uni-koeln.de wrote:
[...]
The full-text index of the database is enabled, and the compiling info section of the query states:
- apply full-text index for { $string_0 } using language 'English'
The optimized query looks to me as if the index is applied only once via ft:search, but not in both cases.
Best, Sebastian
On 23.07.2018 at 11:47, Christian Grün wrote:
[...]
Hi again,
I have compared the output of the latest BaseX 9.1 with 8.6.7 and it looks like a regression.
Whereas in 8.6.7 the query is rewritten to ft:search twice, in 9.1 this happens only once.
See attached for both outputs.
Best, Sebastian
On 23.07.2018 at 11:55, Sebastian Zimmer wrote:
[...]
Hi Sebastian,
I guess the query was only rewritten for index access in BaseX 8 because the two branches of the if/then/else expression were simplified incorrectly and replaced with one of the branches. To simplify debugging, could you please reduce your query even further and drop all superfluous expressions that do not relate to this bug?
Thanks in advance, Christian
On Tue, Jul 24, 2018 at 11:16 AM Sebastian Zimmer <sebastian.zimmer@uni-koeln.de> wrote:
[...]
Hi Christian,
this is the simplest query I could come up with:
xquery version "3.1";
let $fuzzy := false()
return (
  collection('BIBL')/*:TEI[
    if ($fuzzy)
    then ()
    else (.[descendant::text() contains text {"string"} using fuzzy])
  ]
)
is optimized to:
(db:open-pre("BIBL", 0), ...)/*:TEI[descendant::text() contains text "string" using fuzzy using language 'English']
should be optimized to:
ft:search("BIBL", "string")/ancestor::*:TEI
Thanks, Sebastian
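One detail worth noting: with $fuzzy set to false(), the branch that is evaluated in the query above is the one with "using fuzzy", so an equivalent index-based form would presumably have to keep that option. A hand-written sketch, assuming the "fuzzy" entry of the ft:search options map:

```xquery
xquery version "3.1";

(: Explicit index access that preserves the fuzzy option of the else branch. :)
ft:search("BIBL", "string", map { "fuzzy": true() })/ancestor::*:TEI
```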
On 24.07.2018 at 11:23, Christian Grün wrote:
[...]
Thanks, good catch. I have opened an issue for that [1]. – Best, Christian
[1] https://github.com/BaseXdb/basex/issues/1597
On Tue, Jul 24, 2018 at 11:51 AM Sebastian Zimmer <sebastian.zimmer@uni-koeln.de> wrote:
[...]
A new snapshot is available: http://files.basex.org/releases/latest. Thanks for reporting it, Christian
On Tue, Jul 24, 2018 at 12:37 PM Christian Grün christian.gruen@gmail.com wrote:
Thanks, good catch. I have opened an issue for that [1]. – Best, Christian
[1] https://github.com/BaseXdb/basex/issues/1597
On Tue, Jul 24, 2018 at 11:51 AM Sebastian Zimmer < sebastian.zimmer@uni-koeln.de> wrote:
Hi Christian,
this is the simplest query I could come up with:
xquery version "3.1"; let $fuzzy := false()
return ( collection('BIBL')/*:TEI[ if ($fuzzy) then () else (.[descendant::text() contains text {"string"} using fuzzy]) ] )
is optimized to:
(db:open-pre("BIBL", 0), ...)/*:TEI[descendant::text() contains text "string" using fuzzy using language 'English']
should be optimized to:
ft:search("BIBL", "string")/ancestor::*:TEI
Thanks, Sebastian
Am 24.07.2018 um 11:23 schrieb Christian Grün:
Hi Sebastian,
I guess the index was only rewritten for index access in BaseX 8 because the two branches of the if/then/else expression was simplified incorrectly and replaced with one of the branches. To simplify debugging, could you please simplify your query even more and drop all superfluous expressions that do not relate to this bug?
Thanks in advance, Christian
On Tue, Jul 24, 2018 at 11:16 AM Sebastian Zimmer < sebastian.zimmer@uni-koeln.de> wrote:
Hi again,
I have compared the output of the latest BaseX 9.1 with 8.6.7 and it looks like a regression.
Whereas in 8.6.7, the query ist optimized two times with ft:search, in 9.1 only one time.
See attached for both outputs.
Best, Sebastian
Am 23.07.2018 um 11:55 schrieb Sebastian Zimmer:
The full-text index of the database is enabled and the compiling info section of the query states
- apply full-text index for { $string_0 } using language 'English'
The optimized query looks to me as if the index is applied only once via ft:search, but not in both cases.
Best, Sebastian
Am 23.07.2018 um 11:47 schrieb Christian Grün:
Hi Sebastian,
Did you check in the Info View panel if the index is applied? If no, you might try something as follows:
if ($fuzzy) then ( collection('ZK')/tei:TEI[... using fuzzy]) ) else ( collection('ZK')/tei:TEI[...]) )
Usually, if full-text options are dynamic, I tend to use ft:search [1].
Best, Christian
[1] http://docs.basex.org/wiki/Full-Text_Module#ft:search
On Mon, Jul 23, 2018 at 11:41 AM Sebastian Zimmersebastian.zimmer@uni-koeln.de sebastian.zimmer@uni-koeln.de wrote:
Hi Christian,
thanks for the fix, the result is correct now.
But this query now takes about 18 seconds (!) to execute, instead of <1 second like before. Do you think, this could be accelerated?
See attached for the complete console output.
Best, Sebastian
-- Sebastian Zimmer sebastian.zimmer@uni-koeln.de
Cologne Center for eHumanities DH Center at the University of Cologne @CCeHum
Thanks for the fix. Works smoothly now.
Best, Sebastian
On 27.07.2018 at 12:56, Christian Grün wrote:
A new snapshot is available: http://files.basex.org/releases/latest. Thanks for reporting it, Christian
On Tue, Jul 24, 2018 at 12:37 PM Christian Grün <christian.gruen@gmail.com> wrote:
Thanks, good catch. I have opened an issue for that [1].
Best, Christian
[1] https://github.com/BaseXdb/basex/issues/1597
On Tue, Jul 24, 2018 at 11:51 AM Sebastian Zimmer <sebastian.zimmer@uni-koeln.de> wrote:
Hi Christian,
this is the simplest query I could come up with:
xquery version "3.1"; let $fuzzy := false() return ( collection('BIBL')/*:TEI[ if ($fuzzy) then () else (.[descendant::text() contains text {"string"} using fuzzy]) ] )
is optimized to:
(db:open-pre("BIBL", 0), ...)/*:TEI[descendant::text() contains text "string" using fuzzy using language 'English']
should be optimized to:
ft:search("BIBL", "string")/ancestor::*:TEI
Thanks, Sebastian
On 24.07.2018 at 11:23, Christian Grün wrote:
Hi Sebastian,
I guess the query was only rewritten for index access in BaseX 8 because the two branches of the if/then/else expression were simplified incorrectly and replaced with one of the branches.
To simplify debugging, could you please simplify your query even more and drop all superfluous expressions that do not relate to this bug?
Thanks in advance, Christian
On Tue, Jul 24, 2018 at 11:16 AM Sebastian Zimmer <sebastian.zimmer@uni-koeln.de> wrote:
Hi again,
I have compared the output of the latest BaseX 9.1 with 8.6.7 and it looks like a regression. Whereas in 8.6.7 the query is optimized twice with ft:search, in 9.1 it is optimized only once. See attached for both outputs.
Best, Sebastian
On 23.07.2018 at 11:55, Sebastian Zimmer wrote:
The full-text index of the database is enabled and the compiling info section of the query states
- apply full-text index for { $string_0 } using language 'English'
The optimized query looks to me as if the index is applied only once via ft:search, but not in both cases.
Best, Sebastian
basex-talk@mailman.uni-konstanz.de