Re: [basex-talk] Joining a varying number of information sources (with XQuery)?

18 Apr 2022


On Mon, Apr 18, 2022 at 10:04:42AM +0200, Markus Elfring scripsit:
...
...
...
See also:
https://www.w3.org/TR/xquery-31/#id-joins
You can do join _operations_,
I would appreciate further clarification for the distinction
which you present here.
"Join" functionally means "get information from multiple sources and
group it into a conceptually single result."
In an XML context, "source" can be "another part of the tree structure
of this document", "another document" (= something with a different
document uri property), or "something we got by an extension library"
such as using the functions file:read(), db:open(), and so on.
This makes "source of information" too complex to define a standard
operation you can call a join; having multiple sources of information is
a built-in assumption of the language because the get-me-that functions
are, like all the other functions, part of path expressions.  On the
unwelcome side, it's not a standard operation so you have to do your own
debugging; on the welcome side it's flexible and general.
(Path expressions are made up of one or more steps, and each step is
either a postfix expression, such as a function call, or an axis step.)
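As a minimal sketch of that point: assume two made-up in-memory fragments, $people and $addresses (neither is from this thread), standing in for two sources.  Either could just as well come from doc(), db:open(), or another part of the same tree.  A join in plain XQuery might then look something like this:

```xquery
xquery version "3.1";

(: Hypothetical source 1: people identified by an id attribute. :)
declare variable $people :=
  <people>
    <person id="p1"><name>Ada</name></person>
    <person id="p2"><name>Grace</name></person>
  </people>;

(: Hypothetical source 2: addresses that point back at a person. :)
declare variable $addresses :=
  <addresses>
    <address ref="p1"><city>London</city></address>
    <address ref="p2"><city>Arlington</city></address>
  </addresses>;

(: The "join": find the address that refers to each person, then
   build one new element holding information from both sources. :)
for $p in $people/person
let $a := $addresses/address[@ref = $p/@id]
return
  <resident id="{$p/@id}">{ $p/name, $a/city }</resident>
```

There is no join keyword anywhere in that; the predicate finds the matching address and the element constructor does the putting-together.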
...
...
but you aren't doing them on tables
(unless you did extra work to represent the tables hierarchically)
Some “tables” can be transformed into XQuery sequences, can't they?
Tables can be modeled as either lists of lists or as a grid.  If you
want to use lists of lists, you can make one row of the table a sequence
but you can't have a sequence of sequences -- it will turn into a single
sequence -- so you can't have a whole table that way.
It's doable using some combination of XML markup, maps, or arrays, but
any table needs some specific representation.
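A small sketch of the difference, using made-up values: nested sequences collapse into one flat sequence, while an XQuery 3.1 array of arrays keeps the rows apart.

```xquery
xquery version "3.1";

(: Nested sequences flatten: this is one sequence of four items. :)
let $flat := ((1, 2), (3, 4))

(: An array of arrays keeps each row distinct. :)
let $table := [ [1, 2], [3, 4] ]

return (
  count($flat),        (: 4 :)
  array:size($table),  (: 2, the number of rows :)
  $table(2)(1)         (: 3, row 2, column 1 :)
)
```

Arrays are only one possible representation here; XML markup or maps would do as well, depending on how the table gets used later.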
...
...
and there's absolutely no need for the keywords because the existing
more general mechanisms work fine.
I see further development challenges in this area for the safe and
convenient application of join conditions (or constraints).
XML has sort of three layers of reliability.
The innermost is XML structure; if it parses, you know it's made out of
nodes and the rules the nodes use.  ("Well-formed document" = this is
actually XML and can be parsed.)
Then you get "valid"; the well-formed document conforms to some schema.
(If there is a schema! there doesn't have to be and there might not be.)
This doesn't always help; it's quite possible that the schema allows
things that don't make sense (arbitrarily deep nesting of lists) that
you have to treat as an error condition for the processing.
Then you get "regular"; no amount of checking that the value is properly
an xs:date and that the date of birth is less than or equal to the present
date and greater than some past date will save you from a clerical error
that put today's date in the date of birth field two weeks ago.  To be
sure the processing makes sense, you have to have regular inputs, and
that's hard.
Production code needs to look at "valid" and "regular" as distinct steps
before it does stuff like joins.  How depends on what you're doing.
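One possible sketch of such a pre-join check, assuming a made-up $records fragment and an arbitrary plausibility floor $earliest (both are illustrative, not from any real schema).  It tests that the value is lexically an xs:date and that it falls in a sane range; as noted above, nothing like this can catch a plausible-looking clerical error.

```xquery
xquery version "3.1";

(: Hypothetical input records. :)
declare variable $records :=
  <records>
    <person id="a"><dob>1980-02-29</dob></person>
    <person id="b"><dob>not a date</dob></person>
    <person id="c"><dob>1850-01-01</dob></person>
  </records>;

(: Hypothetical lower bound for a believable date of birth. :)
declare variable $earliest := xs:date('1900-01-01');

for $p in $records/person
let $dob := $p/dob
(: Step 1: is the value an xs:date at all? :)
let $is-date := $dob castable as xs:date
(: Step 2: only cast once step 1 has passed.  "if" is used rather than
   "and" because the operands of "and" may be evaluated in any order. :)
let $in-range :=
  if ($is-date)
  then xs:date($dob) ge $earliest and xs:date($dob) le current-date()
  else false()
return
  <check id="{$p/@id}" is-date="{$is-date}" in-range="{$in-range}"/>
```

Only records that pass both tests would go on to the join; what happens to the rest depends on what you're doing.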
...
I guess that you prefer to refer to them as “predicates within steps” so far
(according to path expressions).
https://www.w3.org/TR/xquery-31/#id-predicate
Predicates are part of path expressions, but still are not a join,
because a predicate only filters the sequence produced by the postfix
expression (such as a function call) or axis step that makes up this
step of the path expression.
Finding the thing you want to join and performing the join -- producing
a result that contains this thing and some other things -- are
conceptually distinct, or at least I think so.
//unusual_element ! concat(base-uri(.),path(.))
is close to the minimum case of a join; we take an element and stick the
base URI of its containing document and the path to it in that
document together.  No predicates in the expression.  It could also be written
//unusual_element/concat(base-uri(.),path(.))
(because unusual_element is an element and thus a node and can go on the
left of the path operator / which only works on nodes)
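A short sketch of why that distinction matters, using made-up numbers rather than elements: the ! operator accepts atomic values on its left, where / does not.

```xquery
xquery version "3.1";

(: ! accepts any items on the left, including atomic values. :)
(1, 2, 3) ! (. * 2)   (: 2, 4, 6 :)

(: By contrast, (1, 2, 3)/(. * 2) raises type error XPTY0019,
   because the left-hand side of / must evaluate to nodes. :)
```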
This join could be made more specific by putting a predicate on the
first axis step in the XPath expression to test for associated id
attribute values made up of eleven or twelve digits:
//unusual_element[matches(@id,'^\p{Nd}{11,12}$')]/concat(base-uri(.),path(.))
The predicate constrains the path expression.  It doesn't, strictly,
join; it's not putting anything with anything else in some new
arrangement or representation.
...
...
Use your functions to create maps where the keys come from that id
element's string value.
Customised data structures can be created together with XQuery maps
and arrays. But I find that a join operation would be needed beforehand,
based on available identification data.
I think this needs an example because I don't quite understand your
meaning here.
...
...
You could decide to skip the for clause and use
return $interesting_stuff1 !
my_fun:do_something(.,$interesting_stuff2(.),$interesting_stuff3(.))
instead.
What do you think about working without an extra identification
sequence variable?
Formally, ! is the "simple map operator"; each item in the sequence
returned by the expression on the left becomes, in turn, the context
item for the expression on the right.  (The context item is written as
a dot, ".", by ancient XPath tradition.)
Less formally, "do this thing on the right to everything on
the left"; it works the same way the minimal case of a for clause does,
but it's XPath, not XQuery, so you can do it in an XPath expression.
And it's less typing than the XPath for.
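To make that concrete, here is a sketch under assumptions: $people, $cities and $by-id are hypothetical stand-ins for the $interesting_stuff variables quoted earlier.  One map is built with keys taken from each id element's string value, and the same lookup is then written once with a for clause and once with !.

```xquery
xquery version "3.1";

(: Hypothetical source with an id child element per person. :)
declare variable $people :=
  <people>
    <person><id>p1</id><name>Ada</name></person>
    <person><id>p2</id><name>Grace</name></person>
  </people>;

(: Hypothetical second source, already in map form. :)
declare variable $cities := map { 'p1': 'London', 'p2': 'Arlington' };

(: Build a map whose keys are the string values of the id elements. :)
let $by-id := map:merge($people/person ! map:entry(string(id), .))

(: The ids in document order. :)
let $ids := $people/person/id ! string(.)

(: FLWOR form, with an explicit range variable. :)
let $with-for :=
  for $id in $ids
  return concat($id, ': ', $by-id($id)/name, ', ', $cities($id))

(: Simple map form: the context item (.) plays the role of $id. :)
let $with-bang :=
  $ids ! concat(., ': ', $by-id(.)/name, ', ', $cities(.))

return ($with-bang, deep-equal($with-for, $with-bang))
(: "p1: Ada, London", "p2: Grace, Arlington", true :)
```

The two forms give the same result here; the for clause earns its keep once you need where, order by, or more than one range variable.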
...
...
...
Will any further comparisons evolve for the provided functionality?
Don't think so.  I find the trick with XQuery is to not fight with
it about being some other language.
Internalizing the sequence concept takes work; …
Would you like to extend programming interfaces for the management of
relationships with various entities?
I need an example here, because I'm not following what you mean.
-- 
Graydon Saunders  | graydonish@gmail.com
Þæs oferéode, ðisses swá mæg.
-- Deor  ("That passed, so may this.")
