Dear BaseX team,

perhaps I could hear your opinion on the following idea.

(1) The function xquery:parse is already immensely useful, as it allows one to *validate* XPath expressions, for example those used in configuration data. (A practical example: the XPath expressions used in JMeter [1] test plans to extract message data and define test assertions.)

(2) Given the XSDs of a Web Service (or other application), it is possible to determine all valid data paths (e.g. /a/b/c is valid, /a/b/C is not valid).

(3) If there were a reliable way to map the output of xquery:parse to the implied data paths, then one could use xquery:parse to validate XPath expressions not only for syntactic correctness (which is already very much!), but also for consistency with application XSDs. This would be extremely useful, because XSDs evolve, and checking whether configuration-based XPath expressions must be adapted is otherwise very difficult to achieve in any systematic way.

(4) Looking at the xquery:parse output, it certainly seems feasible to write such a mapping (xquery:parse output => data paths); the problem I see is that the format is not guaranteed to be stable, as it is not a standard. (And it is probably not documented.) If BaseX provided such functionality as an additional extension function (e.g. xquery:data-paths($query as xs:string)), you would simply be heroes.

Cheers,
Hans-Jürgen

[1] http://jmeter.apache.org/

PS: Illustrative example.

let $query := "/descendant::OutboundFlight[Status = 'Booked'][1]/FlightDeparture/DepartureTime"
return xquery:parse($query)
=>
<MainModule updating="false">
  <QueryPlan compiled="false">
    <CachedPath>
      <Root/>
      <IterPosStep axis="descendant" test="OutboundFlight">
        <CmpG op="=">
          <CachedPath>
            <IterStep axis="child" test="Status"/>
          </CachedPath>
          <Str value="Booked" type="xs:string"/>
        </CmpG>
        <Int value="1" type="xs:integer"/>
      </IterPosStep>
      <IterStep axis="child" test="FlightDeparture"/>
      <IterStep axis="child" test="DepartureTime"/>
    </CachedPath>
  </QueryPlan>
</MainModule>

let $query := "/descendant::OutboundFlight[FlightDeparture/DepartureDate][1]/FlightDeparture/DepartureDate"
return xquery:data-paths($query)
=>
//OutboundFlight/FlightDeparture/DepartureTime,
//OutboundFlight/Status
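Points (2) and (3) can be illustrated with a minimal sketch in plain XQuery. It assumes that the valid data paths have already been derived from the XSDs and are available as a list of strings; the variable $valid-paths and the function local:is-valid are hypothetical names, and the list simply reuses the /a/b/c example from above. Paths starting with // are treated as "some valid path ends with this tail", which is a simplification.

```xquery
xquery version "3.0";

(: Hypothetical list of schema-derived paths; a real list would be
   generated from the XSDs of the application. :)
declare variable $valid-paths := ('/a', '/a/b', '/a/b/c');

(: Returns true if the given data path is consistent with $valid-paths.
   Assumption: only child steps, optionally preceded by a leading '//'. :)
declare function local:is-valid($path as xs:string) as xs:boolean {
  if (starts-with($path, '//')) then
    (: '//b/c' becomes '/b/c' and must be the tail of some valid path :)
    let $tail := substring($path, 2)
    return some $v in $valid-paths satisfies ends-with($v, $tail)
  else
    $path = $valid-paths
};

for $p in ('/a/b/c', '/a/b/C', '//b/c')
return $p || ': ' || local:is-valid($p)
```

With the list above, /a/b/c and //b/c are reported as valid and /a/b/C as invalid.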
Hi Hans-Jürgen,
Sounds like an interesting idea. As the output of xquery:parse is XML, the xquery:data-paths function could probably be written in XQuery itself?
Best,
Christian
On Sat, Feb 20, 2016 at 12:24 AM, Hans-Juergen Rennau hrennau@yahoo.de wrote:
[...]
Certainly, Christian! In principle, then, the functionality could be supplied by application code. But, as I said, the problem with a user-supplied (XQuery) implementation of the function would be unpredictable changes of the input (= xquery:parse output) in future versions of BaseX, as well as the guesswork made necessary by the lack of a description or schema of the input.

Cheers,
Hans-Jürgen
Christian Grün christian.gruen@gmail.com wrote on Saturday, 20 February 2016, 0:43:
[...]
Hi Hans-Jürgen,
One reason why I would favor having an XQuery implementation first is that it would allow us to define the semantics behind this function. Many questions are currently unclear to me, so it is mostly conceptual questions that would need to be solved before such a function could be realized.
I had a closer look at your examples:
let $query := "/descendant::OutboundFlight[Status = 'Booked'][1]/FlightDeparture/DepartureTime"
let $query := "/descendant::OutboundFlight[FlightDeparture/DepartureDate][1]/FlightDeparture/DepartureDate" return xquery:data-paths($query) => //OutboundFlight/FlightDeparture/DepartureTime, //OutboundFlight/Status
Probably the result of the second query refers to your first example, right?
To be strict, "/descendant::OutboundFlight" is not identical to "//OutboundFlight"; but it is equivalent. So I would have expected to get the following paths as output:
/descendant::OutboundFlight/child::FlightDeparture/child::DepartureTime, /descendant::OutboundFlight/child::Status
"//OF[1]" could lead to:
/descendant-or-self::node()/child::OF
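To make the distinction concrete, here is a minimal sketch in standard XQuery showing why the two forms stop being interchangeable once a positional predicate is involved. The sample document is invented for illustration.

```xquery
xquery version "3.0";

(: Invented sample: two parents, each containing OF elements. :)
let $doc := document {
  <r>
    <a><OF id="1"/><OF id="2"/></a>
    <b><OF id="3"/></b>
  </r>
}
return (
  (: //OF[1] is /descendant-or-self::node()/child::OF[1]:
     the first OF child of every parent, here ids 1 and 3 :)
  count($doc//OF[1]),
  (: /descendant::OF[1] is the first OF in document order, here id 1 :)
  count($doc/descendant::OF[1])
)
```

The query returns the two counts 2 and 1. Without the predicate both forms select the same three elements, so reporting the unabbreviated axes keeps the extracted paths unambiguous.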
Next, we would need to sort out if and how to handle axes other than descendant-or-self, descendant and child. Moreover, as XQuery allows all kinds of expressions nearly everywhere, something like "A[B = 'c']" or "A[B]/C" is a rather special (albeit common) query pattern that could be written in many different ways. The most obvious patterns could probably be nailed down and realized pretty quickly, but there would be a lot of potential for optimizing the query output and considering corner cases, and this is usually something that costs a lot of time.
As I assume that you have some specific patterns in mind that would be helpful for you, would you be interested in providing an initial solution in XQuery that we could adopt as a Java function later on?
Christian
Thank you very much for your thoughts, Christian!

First, yes: you are right, the second outcome referred to the first XPath expression:

/descendant::OutboundFlight[Status = 'Booked'][1]/FlightDeparture/DepartureTime
=>
//OutboundFlight/FlightDeparture/DepartureTime,
//OutboundFlight/Status
But now to those conceptual problems. How about the following picture?

(0) The goal: to map XQuery expressions to assertions about the implied structure (and possibly value constraints) of the input data consumed by the expression. These implications might be "validated" against XML schemas describing the input data, enabling the detection of errors in the expression.

(1) As a first iteration, one might map the expression to all sequences of axis steps (considering all axes). The resulting path fragments might be absolute or relative paths, depending on whether the first axis step is absolute or relative.

Example of an absolute path:

/a//b[c]/ancestor::d
=>
root()/child::a/descendant-or-self::node()/child::b/child::c,
root()/child::a/descendant-or-self::node()/child::b/ancestor::d

Examples of relative paths:

$doc/a//b[c]/ancestor::d
=>
child::a/descendant-or-self::node()/child::b/child::c,
child::a/descendant-or-self::node()/child::b/ancestor::d

doc("dictionary.xml")/a//b[c]/ancestor::d
=>
child::a/descendant-or-self::node()/child::b/child::c,
child::a/descendant-or-self::node()/child::b/ancestor::d

(2) Alas, the described navigation paths need not apply to a single document type. For example:

/a/b[d = doc("dictionary.xml")/x/y]
=>
root()/child::a/child::b/child::d,
child::x/child::y

~ ~ ~

In principle, one could advance further and further, iteratively adding more capabilities of inference. For example, at a second stage the "path extractor function" might backtrack variable references, enabling the following inference:

let $a := /x
return $a/y
=>
root()/child::x/child::y

But the task would be open-ended, perhaps even exceeding the complexity of an XQuery processor - the task of resolving XQuery expressions to a set of inferences, rather than to the expression value. In fact, I think that such a "query => inferences" mapper would be of great usefulness - imagine how much quality control would be enabled! And any single error detected (including those detected by tools which do not claim to achieve any completeness of checking) means an instance of real *value*. But the amount and complex nature of the work required would probably scare off anybody, appearing disproportionate.
So - is the first idea, at second thought, worthless because leading towards sheerly unlimited amounts of effort? Or does it make sense to define strict limits of the usefulness, to be attained with very limited effort? For example:

(a) extracting all axis step sequences, as stated above
(b) extracting basic information about data access operations (initial context item accessed y/n?; doc() calls with an argument resolvable to a literal URI; doc() calls with unresolved arguments).

In rather typical cases of XPath and XQuery usage, only a single data source is accessed, and already this modest set of information would suffice to assert that all axis step sequences must be valid against the document type of the (single) data source. In other words, in many scenarios, the information could be used to detect query errors.
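Option (a) can be sketched in a few lines of XQuery. The following is a minimal sketch, not a definitive implementation: it assumes the plan format shown in the first message of this thread (the element names CachedPath and Root and the attributes axis and test are taken from that output, which is not a documented or stable format), and local:walk is a hypothetical helper name. The plan is embedded as a literal, so the sketch does not depend on xquery:parse itself.

```xquery
xquery version "3.0";

(: Path expression copied from the example plan earlier in this thread.
   Assumption: element and attribute names as in that output. :)
declare variable $plan :=
  <CachedPath>
    <Root/>
    <IterPosStep axis="descendant" test="OutboundFlight">
      <CmpG op="=">
        <CachedPath><IterStep axis="child" test="Status"/></CachedPath>
        <Str value="Booked" type="xs:string"/>
      </CmpG>
      <Int value="1" type="xs:integer"/>
    </IterPosStep>
    <IterStep axis="child" test="FlightDeparture"/>
    <IterStep axis="child" test="DepartureTime"/>
  </CachedPath>;

(: Walks a sequence of steps, extending $prefix step by step. Paths that
   occur inside the predicates of a step are continued from that step. :)
declare function local:walk(
  $steps  as element()*,
  $prefix as xs:string
) as xs:string* {
  if (empty($steps)) then $prefix
  else
    let $step  := head($steps)
    let $label := $step/@axis || '::' || $step/@test
    let $here  :=
      if ($step/self::Root) then 'root()'
      else if ($prefix = '') then $label
      else $prefix || '/' || $label
    return (
      (: nested paths whose nearest enclosing step is the current step :)
      for $nested in $step//CachedPath[ancestor::*[@axis][1] is $step]
      return local:walk($nested/*, $here),
      (: continue with the remaining steps of the current path :)
      local:walk(tail($steps), $here)
    )
};

local:walk($plan/*, '')
```

For the plan above this yields root()/descendant::OutboundFlight/child::Status followed by root()/descendant::OutboundFlight/child::FlightDeparture/child::DepartureTime. Function calls such as doc(), variable references and other expression types are deliberately ignored here.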
What is attractive about this approach is its unlimited extensibility. When the ambition to provide *all* possible inferences is given up, and any inference is understood as a chance to catch errors, then the approach of starting with something and adding to it along the way makes some sense. The approach is not equivalent, but related to the concept of a "schema-aware XQuery processor", or am I wrong? My feeling is that all XQuery implementors have turned away from that possibility due to a disproportion of effort and benefit. Nevertheless, it may be very useful to provide such a "data path extractor" (= minimal mapping of "query => inferences").
What do you think?
I would be prepared to embark upon an XQuery implementation of a data path extractor, provided that you do not come to the conclusion that it would be of very little value.

Cheers,
Hans-Jürgen
Christian Grün christian.gruen@gmail.com wrote on Saturday, 20 February 2016, 8:52:
[...]
Hi Hans-Jürgen,
I’ll start from the end of your mail:
> I would be prepared to embark upon an XQuery implementation of a data path extractor, provided that you do not come to the conclusion that it would be of very little value.
Awesome. The result will surely be interesting for others in the community as well.
> The approach is not equivalent, but related to the concept of a "schema-aware XQuery processor", or am I wrong?
What a schema-aware processor mostly does is add type information to the processed nodes, and this information needs to be handled at runtime. However, schema information can indeed be helpful at parse or compile time as well, as it allows for more optimizations. In BaseX, we use our database statistics and the name and path indexes for similar optimizations.
> My feeling is that all XQuery implementors have turned away from that possibility due to a disproportion of effort and benefit.
I can’t speak for other implementations, but it would surely have cost us too much time to make BaseX schema-aware. Saxon does an excellent job at evaluating schema information. It might be worth checking out its query plans to get a feeling of what’s possible if schema info is available.
> So - is the first idea, at second thought, worthless because leading towards sheerly unlimited amounts of effort?
Absolutely not ;) I would say that the value/merit of an idea has generally nothing to do with the effort related to making it happen.
> let $a := /x return $a/y => root()/child::x/child::y
> But the task would be open-ended, perhaps even exceeding the complexity of an XQuery processor - the task of resolving XQuery expressions to a set of inferences, rather than to the expression value.
Some optimizations like this are already taking place in BaseX. If you run the query above for a document that does not contain x or y elements, the resulting query plan will be an empty sequence. However, what we currently don’t do in BaseX is to pass on path information to variables. For example, look at the following input and queries:
* Input: <x><y/></x>
* Query 1: xquery:parse('/x/x', map { 'compile': true(), 'plan': true() })
* Query 2: xquery:parse('let $x := /x return $x/x', map { 'compile': true(), 'plan': true() })
Query 1 will currently be rewritten to an empty sequence, but Query 2 won’t. The good thing is that a compiled query plan in BaseX will already have dropped out those paths that can be statically detected as being useless.
> But the task would be open-ended, perhaps even exceeding the complexity of an XQuery processor - the task of resolving XQuery expressions to a set of inferences, rather than to the expression value.
From an algorithmic point of view, you can do everything with XQuery that you can do with Java. Creating the data paths with XQuery should even be more elegant, because you can directly work down the XML query plan. But I agree it can be a challenge, because XQuery is probably not one of the easiest languages (however, you usually don't regret the time you have spent getting to know it better ;).
Christian
Fine, Christian, I shall see what I can achieve. My interest is keen, as the validation of expressions embedded in configurations against XSDs has potentially enormous value in the domain where I work. In fact, I had earlier begun to write a little XQuery parser myself for just this purpose (and you can imagine that I have not yet got very far along that way), but using the BaseX query plans is a shining alternative, provided I can rely on the stability of the format (minor, documented changes from time to time are acceptable, too). My first impression of the query plan format is that it is indeed well suited for such analysis, conveying the query structure in as concise and readable a way as one could wish for.

When questions arise, I shall bug you offline, as such questions are probably too specific to be of general interest.

And thank you for the explanations - yes, schema awareness is indeed something rather different, though overlapping.

Finally: I think you are right in saying that the value of an idea is not related to the effort of its implementation. Perhaps there is something like a threshold: once it is passed, once the conceived value has slipped beyond that threshold, it acquires a peculiar independence. Many years ago, looking at distant church towers in Brugge, I suddenly had a sensation that there was something infinite about them: the effort of piling up and fitting together those stones was huge, but still finite - and now there they stand, and again and again and again human eyes look at them and drink the beauty which turns in our minds into joy. Inexhaustible.

Good night.
Hans-Jürgen
Christian Grün christian.gruen@gmail.com wrote on Saturday, 20 February 2016, 14:39:
[...]
basex-talk@mailman.uni-konstanz.de