Hello again,
I notice in the current sources for QueryProcessor.java that executing a query appears to be a three-stage process: parse, compile, and execute. Is it now possible to preparse and/or precompile a query before using it? If so, this raises a couple of important questions:
- Does executing the query "destroy" it (will subsequent executions return the same results)?
- What can be done to the query after parsing or compiling? Can I rerun a query after changing the context? What about after adjusting the variable bindings? I notice that doing either of these in QueryProcessor does not reset the parse or compile flag...
The last time I looked at this, neither of the operations above was possible, making preparsing and precompiling kind of pointless for embedded use. If this works now, it's very exciting and could save a lot of cycles in cases where a complex query needs to be repeatedly executed against different contexts or with different variables.
Thanks,
Dave
Hi Dave,
This is still work in progress. One challenge (among numerous others) is that the compilation of a particular query depends on the availability of index structures and statistics of a particular database. It would be more than reasonable, though, to at least bundle all static/logical compilation steps that do not depend on the existing data (as an example, index access rewritings would have to be moved into an extra post-processing compilation step). Precompilation will get even more interesting as our module library grows.
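The split described above (static steps bundled together, index access rewritings moved into a post-processing step) could look roughly like the following Java sketch. It is not BaseX code: DbInfo, StaticPlan, BoundPlan, TwoStageCompiler and the listed rewriting names are all hypothetical and only illustrate the idea that one cached static result can be finished separately for each database.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/** Hypothetical stand-in for database metadata; not a BaseX class. */
final class DbInfo {
  final String name;
  final boolean hasTextIndex;

  DbInfo(String name, boolean hasTextIndex) {
    this.name = name;
    this.hasTextIndex = hasTextIndex;
  }
}

/** Output of the static stage: depends on the query string only. */
final class StaticPlan {
  final String query;
  final List<String> rewritings;

  StaticPlan(String query, List<String> rewritings) {
    this.query = query;
    this.rewritings = Collections.unmodifiableList(rewritings);
  }
}

/** Output of the data-dependent stage: valid for one database only. */
final class BoundPlan {
  final StaticPlan base;
  final String accessPath;

  BoundPlan(StaticPlan base, String accessPath) {
    this.base = base;
    this.accessPath = accessPath;
  }
}

final class TwoStageCompiler {
  /** Stage 1: data-independent steps; the result can be cached and reused. */
  static StaticPlan compileStatic(String query) {
    // The step names are placeholders for whatever static rewritings exist.
    return new StaticPlan(query, Arrays.asList("parse", "constant folding"));
  }

  /** Stage 2: post-processing that consults the target database. */
  static BoundPlan bindTo(StaticPlan plan, DbInfo db) {
    String access = db.hasTextIndex ? "index lookup" : "sequential scan";
    return new BoundPlan(plan, access);
  }

  public static void main(String[] args) {
    StaticPlan plan = compileStatic("//item[text() = 'x']");
    // One static plan, two databases, two different access paths.
    System.out.println(bindTo(plan, new DbInfo("indexed", true)).accessPath);
    System.out.println(bindTo(plan, new DbInfo("plain", false)).accessPath);
  }
}
```

In this sketch the StaticPlan is immutable, so the second stage never has to undo anything when the same query is run against another database.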
Hope this helps (a little), Christian
On Tue, Dec 20, 2011 at 10:35 PM, Dave Glick dglick@dracorp.com wrote:
BaseX-Talk mailing list BaseX-Talk@mailman.uni-konstanz.de https://mailman.uni-konstanz.de/mailman/listinfo/basex-talk
Christian,
Thanks - it does help. I was mainly wondering if it was safe to create a QueryProcessor instance and then repeatedly call execute() after making changes to context and/or variables without having to call parse() and/or compile() again, and it sounds like the answer is no (at least not yet).
Dave
-----Original Message----- From: Christian Grün [mailto:christian.gruen@gmail.com] Sent: Tuesday, December 20, 2011 4:59 PM To: Dave Glick Cc: BaseX Subject: Re: [basex-talk] Precompiling Queries
BaseX-Talk mailing list BaseX-Talk@mailman.uni-konstanz.de https://mailman.uni-konstanz.de/mailman/listinfo/basex-talk
After looking at this a little more, I have to wonder if the QueryProcessor class is safe for external use. It contains private flags to indicate whether the QueryContext has been parsed or compiled, but these flags don't get reset on events that might invalidate the previous compilation such as changing the context or updating the database - the only case I can find where they do get reset is if the actual expression changes. Because the parse() and compile() methods in the QueryProcessor class check the corresponding flag, it's actually impossible to re-compile the QueryContext from the QueryProcessor (without changing the expression).
A problem arises because you can use the QueryProcessor class to adjust aspects of the query such as the initial context or variable bindings. Based on what you said below and my understanding of it, this sounds like a bug - or at the very least an API problem because it means it's very easy to misuse the QueryProcessor class and evaluate queries without re-parsing or re-compiling when it's actually needed. Digging deeper, it appears as though the methods that allow these changes aren't even really used within BaseX - QueryProcessor.context() is never called and QueryProcessor.bind() is only used from the QueryListener in the server. This suggests that these methods were added in an attempt to create a public API and perhaps aren't tested that much or haven't been revisited.
My suggestion would be one of the following:
- Verify that under all conditions calling QueryProcessor.context(), QueryProcessor.bind(), etc. on a previously evaluated query and then evaluating again does not cause problems (though again, based on your explanation it sounds like they should).
- Remove or internalize the QueryProcessor.context(), QueryProcessor.bind(), etc. methods so that API users can't put their QueryProcessor into an invalid state.
- Cause QueryProcessor.context(), QueryProcessor.bind(), etc. to reset the parsed and/or compiled flags so that re-evaluating the query also re-parses and re-compiles it.
- Though this still won't address the question of what happens if the database itself changes between evaluations - could that also invalidate the previous compilation? If so, maybe also cache the database timestamp and compare on each evaluation?
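To make the flag-resetting and timestamp ideas concrete, here is a minimal Java sketch. FlaggedProcessor is a toy model, not the real QueryProcessor: its fields, the compileCount counter and the dbTimestamp parameter are assumptions made for illustration. It also assumes that a changed binding or context only invalidates compilation, not parsing, which may not hold for the real implementation.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy model of a processor with parse/compile flags; not the BaseX class. */
final class FlaggedProcessor {
  private final String query;
  private final Map<String, Object> bindings = new HashMap<>();
  private Object context;
  private boolean parsed;
  private boolean compiled;
  private long compiledAt = -1;  // database timestamp seen at compile time
  int compileCount;              // exposed only so the sketch can be checked

  FlaggedProcessor(String query) {
    this.query = query;
  }

  /** Binding a variable invalidates the earlier compilation. */
  void bind(String name, Object value) {
    bindings.put(name, value);
    compiled = false;
  }

  /** Changing the context invalidates the earlier compilation. */
  void context(Object value) {
    context = value;
    compiled = false;
  }

  /** Evaluates the query, redoing only the stages that are stale. */
  String execute(long dbTimestamp) {
    if (!parsed) {
      parsed = true;  // assumed to depend on the query text only
    }
    // Recompile after bind()/context() or when the database has changed.
    if (!compiled || dbTimestamp != compiledAt) {
      compiled = true;
      compiledAt = dbTimestamp;
      compileCount++;
    }
    return query + " | context=" + context + " | bindings=" + bindings;
  }
}
```

With this shape, repeated calls to execute() on an unchanged processor and database reuse the compilation, while any call to bind() or context(), or a new database timestamp, forces exactly one recompilation on the next evaluation.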
I hope that all made sense. I suspect I'm one of the few users actually interfacing directly with the code so hopefully this kind of feedback is helpful.
Thanks!
Dave
Hi Dave,
I completely agree that the QueryProcessor class has not been tailored (..yet?..) to provide a clean API for external users. Your feedback is indeed very helpful. Some general remarks:
After looking at this a little more, I have to wonder if the QueryProcessor class is safe for external use. It contains private flags to indicate whether the QueryContext has been parsed or compiled, but these flags don't get reset on events that might invalidate the previous compilation such as changing the context or updating the database - the only case I can find where they do get reset is if the actual expression changes.
Currently, those flags are mainly used to prevent duplicate parsing/compilation of a query. They are only invalidated by the query() method, which is only called by the XQJ API. In this case, it would be more correct to really invalidate the query context instead of just initializing the two flags. The query() method might disappear as soon as we have an external XQJ implementation that will render ours obsolete.
Because the parse() and compile() methods in the QueryProcessor class check the corresponding flag, it's actually impossible to re-compile the QueryContext from the QueryProcessor (without changing the expression).
The main reason why it's currently impossible to recompile an expression is that once an expression is compiled, it won't be possible to reset the expression to its initial state. For example, a query that has been rewritten to use an index cannot be reset to its initial expression tree. One advantage of the current approach is that the parsing/compilation steps are very fast; they'll surely get slower if we decide to create an additional compiled tree representation and preserve the original tree. Still, a clean separation would probably be the way to go in the future.
Digging deeper, it appears as though the methods that allow these changes aren't even really used within BaseX - QueryProcessor.context() is never called and QueryProcessor.bind() is only used from the QueryListener in the server. This suggests that these methods were added in an attempt to create a public API and perhaps aren't tested that much or haven't been revisited.
The context() method is currently used by the XQJ and the W3T APIs (located in the basex-api and basex-tests repositories); bind() is used in some other places. It's true that those methods are mainly used by the APIs (indeed I believe that BaseX doesn't have *any* unused methods except for some that are defined in the Performance class..).
My suggestion would be one of the following:
- Verify that under all conditions calling QueryProcessor.context(), QueryProcessor.bind(), etc. on a previously evaluated query and then evaluating again does not cause problems (though again, based on your explanation it sounds like they should).
As indicated above, I think the main challenge here is to introduce two separate representations for parsed and compiled expressions. The related question is: what should the compiled representation look like? Should it include alternative rewritings for index access, which can be chosen at runtime if appropriate? Sounds exciting, but ambitious as well.
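One possible shape for such a design, sketched in Java under heavy assumptions: the parsed form is kept unmodified, and the compiled form holds both access strategies and picks one per evaluation. DataSource, ParsedFilter, CompiledFilter and ListSource are hypothetical names, not BaseX classes, and the "index" here is only simulated so that the choice can be observed.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical runtime view of a database; not a BaseX class. */
interface DataSource {
  boolean hasIndex();
  List<String> indexLookup(String value);
  List<String> scan(String value);
}

/** Parsed form: never modified, so it can be compiled again at any time. */
final class ParsedFilter {
  final String value;

  ParsedFilter(String value) {
    this.value = value;
  }
}

/** Compiled form: keeps both strategies and picks one per evaluation. */
final class CompiledFilter {
  private final ParsedFilter source;

  CompiledFilter(ParsedFilter source) {
    this.source = source;
  }

  List<String> evaluate(DataSource data) {
    return data.hasIndex() ? data.indexLookup(source.value) : data.scan(source.value);
  }
}

/** In-memory source that records which strategy was chosen. */
final class ListSource implements DataSource {
  private final List<String> items;
  private final boolean indexed;
  final List<String> log = new ArrayList<>();

  ListSource(List<String> items, boolean indexed) {
    this.items = items;
    this.indexed = indexed;
  }

  public boolean hasIndex() {
    return indexed;
  }

  public List<String> indexLookup(String value) {
    log.add("index");
    return filter(value);  // a real index would avoid the full pass
  }

  public List<String> scan(String value) {
    log.add("scan");
    return filter(value);
  }

  private List<String> filter(String value) {
    List<String> hits = new ArrayList<>();
    for (String item : items) {
      if (item.equals(value)) hits.add(item);
    }
    return hits;
  }
}
```

Because CompiledFilter only reads from ParsedFilter, the same compiled object can be evaluated against databases with and without an index, at the cost of a runtime check per evaluation.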
Once more.. Hope this helps ;) Christian