Thank you very much for your thoughts, Christian! First, yes: you are right, the second outcome referred to the first XPath expression: /descendant::OutboundFlight[Status = 'Booked'][1]/FlightDeparture/DepartureTime => //OutboundFlight/FlightDeparture/DepartureTime, //OutboundFlight/Status
But now to those conceptual problems. How about the following picture?

(0) The goal: to map XQuery expressions to assertions about the implied structure (and possibly value constraints) of the input data consumed by the expression. These implications might be "validated" against XML schemas describing the input data, enabling the detection of errors in the expression.

(1) As a first iteration, one might map the expression to all sequences of axis steps (considering all axes). The resulting path fragments are absolute or relative paths, depending on whether the first axis step is absolute or relative.

Example of an absolute path:

/a//b[c]/ancestor::d
=> root()/child::a/descendant-or-self::node()/child::b/child::c,
   root()/child::a/descendant-or-self::node()/child::b/ancestor::d

Examples of relative paths:

$doc/a//b[c]/ancestor::d
=> child::a/descendant-or-self::node()/child::b/child::c,
   child::a/descendant-or-self::node()/child::b/ancestor::d

doc("dictionary.xml")/a//b[c]/ancestor::d
=> child::a/descendant-or-self::node()/child::b/child::c,
   child::a/descendant-or-self::node()/child::b/ancestor::d

(2) Alas, the described navigation paths need not all apply to a single document type. For example:

/a/b[d = doc("dictionary.xml")/x/y]
=> root()/child::a/child::b/child::d,
   child::x/child::y

~ ~ ~

In principle, one could advance further and further, iteratively adding more inference capabilities. For example, at a second stage the "path extractor function" might backtrack variable references, enabling the following inference:

let $a := /x
return $a/y
=> root()/child::x/child::y

But the task would be open-ended, perhaps even exceeding the complexity of an XQuery processor: the task of resolving XQuery expressions to a set of inferences, rather than to the expression value. In fact, I think that such a "query => inferences" mapper would be very useful. Imagine how much quality control it would enable!
And any single error detected (including those detected by tools which do not claim any completeness of checking) is an instance of real *value*. But the amount and complexity of the work required would probably scare off anybody, as the effort appears disproportionate.
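To make iteration (1) a bit more concrete, here is a rough sketch of the mapping from an expression to its axis step sequences. It is written in Python rather than XQuery, purely for illustration, and it rests on strong assumptions: the function name extract_paths is hypothetical, only plain path expressions are handled (name tests, explicit axes, "//", and predicates consisting of a path, a number, a string literal, or a simple comparison of these), and anything else, such as variable references or function calls like doc(), raises an error instead of being resolved.

```python
import re

# One step of the supported subset: optional "axis::" followed by a node test.
STEP_RE = re.compile(
    r"(?:(?P<axis>[a-z-]+)::)?(?P<test>(?:node|text)\(\)|\*|[A-Za-z_][\w.-]*)"
)
# Comparison operators used to split a predicate into its operands.
COMPARISON_RE = re.compile(r"\s*(?:!=|<=|>=|=|<|>)\s*")


def _closing_bracket(expr, start):
    """Index of the ']' matching the '[' at position `start`."""
    depth = 0
    for pos in range(start, len(expr)):
        if expr[pos] == "[":
            depth += 1
        elif expr[pos] == "]":
            depth -= 1
            if depth == 0:
                return pos
    raise ValueError("unbalanced predicate in: " + expr)


def _walk(expr, prefix, results):
    """Append to `results` every axis step sequence found in `expr`."""
    steps = list(prefix)
    pos = 0
    while pos < len(expr):
        if expr.startswith("//", pos):
            # "//" abbreviates "/descendant-or-self::node()/"
            steps.append("descendant-or-self::node()")
            pos += 2
        elif expr[pos] == "/":
            pos += 1
        elif pos > 0:
            raise ValueError("unsupported syntax at: " + expr[pos:])
        match = STEP_RE.match(expr, pos)
        if match is None:
            raise ValueError("unsupported syntax at: " + expr[pos:])
        # A step without an explicit axis uses the child axis.
        steps.append((match.group("axis") or "child") + "::" + match.group("test"))
        pos = match.end()
        while pos < len(expr) and expr[pos] == "[":
            end = _closing_bracket(expr, pos)
            for operand in COMPARISON_RE.split(expr[pos + 1:end].strip()):
                if operand.startswith("/"):
                    _walk(operand, ["root()"], results)
                elif STEP_RE.match(operand):
                    # A relative path in a predicate continues the current path.
                    _walk(operand, steps, results)
                # Numbers and string literals are not paths and are skipped.
            pos = end + 1
    results.append(steps)


def extract_paths(expr):
    """Return all axis step sequences of `expr`, in unabbreviated syntax."""
    expr = expr.strip()
    results = []
    _walk(expr, ["root()"] if expr.startswith("/") else [], results)
    return ["/".join(steps) for steps in results]


for path in extract_paths("/a//b[c]/ancestor::d"):
    print(path)
# root()/child::a/descendant-or-self::node()/child::b/child::c
# root()/child::a/descendant-or-self::node()/child::b/ancestor::d
```

The sketch splits predicates naively on comparison operators, so string literals containing such operators would confuse it. It is meant to show the shape of the mapping, not to be a parser.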
So: is the first idea, on second thought, worthless because it leads towards virtually unlimited amounts of effort? Or does it make sense to define strict limits on the usefulness to be attained with very limited effort? For example:

(a) extracting all axis step sequences, as stated above
(b) extracting basic information about data access operations (initial context item accessed y/n?; doc() calls with an argument resolvable to a literal URI; doc() calls with unresolved arguments)

In rather typical cases of XPath and XQuery usage, only a single data source is accessed, and this modest set of information would already suffice to assert that all axis step sequences must be valid against the document type of the (single) data source. In other words, in many scenarios the information could be used to detect query errors.
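For the single-source case, the check of an axis step sequence against the document type could look roughly like the following Python sketch. Everything in it is an assumption made for illustration: the function name check_path, the "#document" pseudo-name, and above all the toy schema, which is a plain mapping from element names to allowed child names with a made-up "Bookings" root (a real tool would derive this from an XML schema). Only the child, descendant and descendant-or-self axes are evaluated. For any other axis the function returns None, meaning "no inference".

```python
DOCUMENT = "#document"  # pseudo-name for the document node (an assumption)


def reachable(schema, names):
    """Return `names` plus every element name reachable below them."""
    seen, todo = set(names), list(names)
    while todo:
        for child in schema.get(todo.pop(), ()):
            if child not in seen:
                seen.add(child)
                todo.append(child)
    return seen


def children(schema, names):
    """Return all element names allowed as children of any name in `names`."""
    found = set()
    for name in names:
        found |= set(schema.get(name, ()))
    return found


def check_path(path, schema):
    """True/False if the path can/cannot match; None if an axis is unsupported."""
    # A relative path may start anywhere, so begin with every known node.
    context = reachable(schema, {DOCUMENT})
    for step in path.split("/"):
        if step == "root()":
            context = {DOCUMENT}
            continue
        axis, _, test = step.partition("::")
        if axis == "child":
            candidates = children(schema, context)
        elif axis == "descendant":
            candidates = reachable(schema, children(schema, context))
        elif axis == "descendant-or-self":
            candidates = reachable(schema, context)
        else:
            return None  # axis not covered by this sketch: no inference
        context = candidates if test in ("*", "node()") else candidates & {test}
        if not context:
            return False  # no element allowed by the schema matches this step
    return True


# Hypothetical document type; the nesting is invented for this example.
schema = {
    DOCUMENT: {"Bookings"},
    "Bookings": {"OutboundFlight"},
    "OutboundFlight": {"Status", "FlightDeparture"},
    "FlightDeparture": {"DepartureTime", "DepartureDate"},
}

print(check_path("root()/descendant::OutboundFlight/child::Status", schema))
# True
print(check_path("root()/descendant::OutboundFlight/child::DepartureTime", schema))
# False
```

A False result is exactly the kind of query error this modest set of information can expose: the second path skips the FlightDeparture level, so it can never select anything in a document of this type.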
What is attractive about this approach is the unlimited extensibility. When the ambition to provide *all* possible inferences is given up, and any inference is understood as a chance to catch errors, then the approach of starting with something and adding to it along the way makes some sense. The approach is not equivalent to, but related to, the concept of a "schema-aware XQuery processor", or am I wrong? My feeling is that all XQuery implementors have turned away from that possibility because effort and benefit are out of proportion. Nevertheless, it may be very useful to provide such a "data path extractor" (= minimal mapping of "query => inferences").
What do you think?
I would be prepared to embark upon an XQuery implementation of a data path extractor, provided that you do not come to the conclusion that it would be of very little use.

Cheers,
Hans-Jürgen
Christian Grün christian.gruen@gmail.com wrote at 8:52 on Saturday, 20 February 2016:
Hi Hans-Jürgen,
One reason why I would favor having an XQuery implementation first is that it would allow us to define the semantics behind this function. There are currently lots of questions that are unclear to me, so it is mostly conceptual questions that would need to be solved before such a function could be realized.
I had a closer look at your examples:
let $query := "/descendant::OutboundFlight[Status = 'Booked'][1]/FlightDeparture/DepartureTime"
let $query := "/descendant::OutboundFlight[FlightDeparture/DepartureDate][1]/FlightDeparture/DepartureDate"
return xquery:data-paths($query)
=> //OutboundFlight/FlightDeparture/DepartureTime, //OutboundFlight/Status
Probably the result of the second query refers to your first example, right?
To be strict, "/descendant::OutboundFlight" is not identical to "//OutboundFlight"; but it is equivalent. So I would have expected to get the following paths as output:
/descendant::OutboundFlight/child::FlightDeparture/child::DepartureTime, /descendant::OutboundFlight/child::Status
"//OF[1]" could lead to:
/descendant-or-self::node()/child::OF
Next, we would need to sort out if/how to handle axes other than descendant-or-self, descendant and child. Moreover, as XQuery allows us to have all kinds of expressions nearly everywhere, something like "A[B = 'c']" or "A[B]/C" are rather special (albeit common) query patterns that could be written in many different ways. The most obvious patterns could probably be nailed down and realized pretty quickly, but there would be a lot of potential for optimizing the query output and considering corner cases, and this is usually something that costs a lot of time.
As I assume that you have some specific patterns in mind that would be helpful for you, would you be interested in providing an initial solution in XQuery that we could adopt as a Java function later on?
Christian