Hi Chris
I did look at the Query Info in the GUI and it seems you are correct. The complex query I created is not utilizing the full-text index.
Details here: https://gist.github.com/4438473
The gist contains query info for the complex search that I created and for a simple one that just executes the 'contains text' clause.
The compile statement for the simple query includes the following line: "applying full-text index". The complex query does not…
So what to do now? How do I figure out why BaseX is not using the full-text index as the initial step in limiting the results of my search?
Thanks
David
From: "basex-talk-request@mailman.uni-konstanz.de" <basex-talk-request@mailman.uni-konstanz.de>
Reply-To: "basex-talk@mailman.uni-konstanz.de" <basex-talk@mailman.uni-konstanz.de>
Date: Tue, 1 Jan 2013 06:00:02 -0500
To: "basex-talk@mailman.uni-konstanz.de" <basex-talk@mailman.uni-konstanz.de>
Subject: BaseX-Talk Digest, Vol 37, Issue 1

Message: 1
Date: Mon, 31 Dec 2012 20:56:01 +0100
From: Christian Grün <christian.gruen@gmail.com>
To: David Stuebe <DStuebe@asascience.com>
Cc: Kyle Wilcox <KWilcox@asascience.com>, "basex-talk@mailman.uni-konstanz.de" <basex-talk@mailman.uni-konstanz.de>
Subject: Re: [basex-talk] Initial xquery for full text and geospatial search of ISO 19115 docs
Message-ID: <CAP94bnPneuWguMGL-k2ZMw6gCD4WQNc82KBR5es8h+BrZ1fs2Q@mail.gmail.com>
Content-Type: text/plain; charset=windows-1252
Hi David,
thanks for the insight into your project. It may be that the full-text index is not utilized by your XQuery expression. Did you have a look at the query info (e.g. via GUI, InfoView, or -V on command line)? Did you manage to write simpler queries that are processed faster?
Best, Christian ___________________________
On Mon, Dec 31, 2012 at 7:31 PM, David Stuebe <DStuebe@asascience.com> wrote:
Hi BaseX Folks
I have written a simple-minded XQuery script which can be used in a POST request to search ISO 19115 metadata documents. As I am a newbie to XQuery and BaseX, I expect there is much that I could do to improve performance, but currently searches take up to 15 seconds. The server hardware is not blazing fast and we are running inside Tomcat, but that is certainly not acceptable for a database which is still quite small. Initially it was fine, responding in less than a second, but when we expanded from fewer than 100 docs to about 35000 the time grew at least linearly. I am hoping this is due to my poor XQuery programming or a setting on the server.
Here is the query we are running using a post request with the declared variables at the bottom filled in by the UI: https://github.com/asascience-open/glos_catalog/blob/master/queries/full_sea...
Here are some of the ISO xml documents that we have in our basex DB: https://github.com/asascience-open/glos_catalog/tree/master/ISOs
It is used by this site to provide geospatial metadata search: http://explorer.glos.us/ Fill in a text value like "water" or "temperature" in the search box in the top right.
The server is here: http://64.9.200.113:8080/BaseX73/ User Name: user Password: glos ACL: read only
I am hoping it is as simple as a setting on the BaseX server. Here is the current server info:
info database
Database Properties
  Name: glos
  Size: 422 MB
  Nodes: 17045388
  Documents: 34922
  Binaries: 0
  Timestamp: 27.12.2012 00:58:03

Resource Properties
  Timestamp: 22.12.2012 01:43:05
  Encoding: UTF-8
  Whitespace Chopping: ON

Indexes
  Up-to-date: true
  Text Index: ON
  Attribute Index: ON
  Full-Text Index: ON
The index info is available here: https://github.com/asascience-open/glos_catalog/blob/master/queries/glos_ind...
David Stuebe
Scientist & Software Engineer, RPS ASA
55 Village Square Drive South Kingstown, RI 02879-8248
Tel: +1 (401) 789-6224
Email: David.Stuebe@rpsgroup.com
www: asascience.com | rpsgroup.com
A member of the RPS Group plc
_______________________________________________
BaseX-Talk mailing list
BaseX-Talk@mailman.uni-konstanz.de
https://mailman.uni-konstanz.de/mailman/listinfo/basex-talk
Hi David,
due to the complexity of XQuery, it’s often a big challenge for the query optimizer to find out if an index can be used or not. This is the reason why some XML databases don’t even try to rewrite queries for index access.
What you can always do is: directly address the available index structures via the Full-Text and Index modules [1,2]. This will give you the guarantee that the index is indeed utilized.
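As a rough sketch of what such direct index access could look like: the example below assumes a database named "db" (a placeholder) that has a full-text index, and calls ft:search from the Full-Text Module. The function returns the matching text nodes, so a parent step is used to get the enclosing elements. The terms "A" and "B" are placeholders as well.

```xquery
(: Sketch only: "db", "A" and "B" are placeholder values. :)
(: ft:search looks up the terms in the full-text index of the named database
   and returns the text nodes that match. :)
for $text in ft:search("db", ("A", "B"))
(: Step up to the element that contains the matching text node. :)
return $text/..
```

Because the index is addressed explicitly, this does not depend on the optimizer recognizing a "contains text" expression.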
I'm sure there are also ways to rewrite your existing query in order to ensure index access, but I’d first need some more time to get into this. As a general note, you can help the optimizer by directly specifying the addressed database in your query. The following three examples may illustrate this. Queries A and B will be optimized for index access, while Query C won’t (well, at least not in BaseX ≤ 7.5):
Query A

  db:open("db")//*[text() contains text { "A","B" }]

Query B

  let $terms := ("A", "B")
  for $x in db:open("db")//*
  where $x/text() contains text { $terms }
  return $x

Query C

  declare function local:x($db, $terms) {
    db:open($db)//*[text() contains text { $terms }]
  };
  let $terms := ("A", "B")
  let $db := "db"
  return local:x($db, $terms)
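Query C could presumably be reworked so that the index is used even though the database name arrives as a function argument, by calling ft:search inside the function instead of relying on the optimizer. This is a minimal sketch under that assumption, not a verified rewrite: "db" and the terms are placeholders, and the type annotations are additions for clarity.

```xquery
(: Sketch only: a variant of Query C that addresses the full-text index
   explicitly via ft:search. "db", "A" and "B" are placeholder values. :)
declare function local:x($db as xs:string, $terms as xs:string*) as node()* {
  (: ft:search returns matching text nodes; ".." selects their parent elements. :)
  ft:search($db, $terms)/..
};

let $terms := ("A", "B")
let $db := "db"
return local:x($db, $terms)
```

The result should correspond to the elements selected by Query A, although differences in matching options between ft:search and "contains text" are possible.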
Best, Christian
[1] http://docs.basex.org/wiki/Full-Text_Module
[2] http://docs.basex.org/wiki/Index_Module
Check out our Wiki article on full-text processing to find out more:
http://docs.basex.org/wiki/Full-Text