Hi Hans-Jürgen,
I’ll start from the end of your mail:
> I would be prepared to embark upon an XQuery implementation of a data path
> extractor, provided that you do not come to the conclusion that it would be
> of very little.
Awesome. The result will surely be interesting for others in the
community as well.
> The approach is not equivalent, but related to the concept of a
> "schema-aware XQuery processor", or am I wrong?
What a schema-aware processor mostly does is add type information
to the processed nodes, and this info needs to be handled at runtime.
However, schema information can indeed be helpful at parse or compile
time as well, as it allows for more optimizations. In BaseX, we use
our database statistics and the name and path indexes for similar
optimizations.
> My feeling is that all
> XQuery implementors have turned away from that possibility due to a
> disproportion of effort and benefit.
I can’t speak for other implementations, but it would surely have cost
us too much time to make BaseX schema-aware. Saxon does an excellent
job at evaluating schema information. It might be worth checking out
its query plans to get a feeling for what’s possible if schema info is
available.
> So - is the first idea, at second thought, worthless because leading towards
> sheerly unlimited amounts of effort?
Absolutely not ;) I would say that the value/merit of an idea
generally has nothing to do with the effort required to make it happen.
> let $a := /x
> return $a/y
> =>
> root()/child::x/child::y
>
> But the task would be open-ended, perhaps even exceeding the complexity of
> an XQuery processor - the task of resolving XQuery expressions to a set of
> inferences, rather than to the expression value.
Some optimizations like this are already taking place in BaseX. If you
run the query above for a document that does not contain x or y
elements, the resulting query plan will be an empty sequence. However,
what we currently don’t do in BaseX is to pass on path information to
variables. For example, look at the following input and queries:
* Input: <x><y/></x>
* Query 1: xquery:parse('/x/x', map { 'compile': true(), 'plan': true() })
* Query 2: xquery:parse('let $x := /x return $x/x', map { 'compile':
true(), 'plan': true() })
Query 1 will currently be rewritten to an empty sequence, but Query 2
won’t. The good thing is that a compiled query plan in BaseX will
already have dropped those paths that can be statically detected as
useless.
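
To make the idea behind this rewrite a bit more tangible, here is a
minimal sketch in plain XQuery. It is not what BaseX does internally
(we consult the path index at compile time); it merely collects the
distinct element paths of the input and checks whether a candidate
path occurs among them. The function name local:paths and the
string-based path comparison are assumptions made for this example:

```xquery
(: Illustration only: a naive path summary, not the BaseX path index. :)
declare variable $input := document { <x><y/></x> };

(: Returns all distinct element paths that occur in the document. :)
declare function local:paths($doc as document-node()) as xs:string* {
  distinct-values(
    for $e in $doc//*
    return '/' || string-join($e/ancestor-or-self::*/name(), '/')
  )
};

(: A path that is not part of the summary cannot yield any results. :)
for $candidate in ('/x/y', '/x/x')
return $candidate || ': ' || ($candidate = local:paths($input))
```

For the input above, the summary only contains /x and /x/y, so the
check succeeds for /x/y and fails for /x/x.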
> But the task would be open-ended, perhaps even exceeding the complexity of
> an XQuery processor - the task of resolving XQuery expressions to a set of
> inferences, rather than to the expression value.
From an algorithmic point of view, you can do everything with XQuery
that you can do with Java. Creating the data paths with XQuery should
be even more elegant, because you can directly work down the XML
query plan. But I agree it can be a challenge, because XQuery is
probably not one of the easiest languages (however, you usually don’t
regret the time you have spent getting to know it better ;).
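
As a rough sketch of what working down a plan could look like: the
following query turns a small, hand-written plan fragment into a data
path string. Please note that the element and attribute names (Path,
Root, Step, axis, test) are invented for this example. The real plan
returned by xquery:parse looks different and varies between versions,
so the typeswitch cases would need to be adapted:

```xquery
(: Hypothetical plan fragment; the names do not match the real BaseX plan. :)
declare variable $plan :=
  <Path>
    <Root/>
    <Step axis="child" test="x"/>
    <Step axis="child" test="y"/>
  </Path>;

(: Serializes the steps of a single path expression as a data path. :)
declare function local:data-path($path as element(Path)) as xs:string {
  string-join(
    (
      for $node in $path/*
      return typeswitch($node)
        case element(Root) return 'root()'
        case element(Step) return $node/@axis || '::' || $node/@test
        default return '?'  (: unknown plan node: flag it, don't guess :)
    ),
    '/'
  )
};

local:data-path($plan)
```

For the fragment above, the result should be root()/child::x/child::y,
which is the form you have used in your example. Resolving variable
references to their bound paths would be the next, more demanding step.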
Christian