Joining a varying number of information sources (with XQuery)?


Markus Elfring

16 Apr 2022

8:16 a.m.

Hello,

It is supported to specify a fixed number of information sources for a for clause. https://www.w3.org/TR/xquery-31/#id-xquery-for-clause

I imagine that the selection of for bindings can occasionally vary so much and might become so big that it would be needed to construct FLWOR expressions by more sophisticated data structures. I guess that the specification of consistent join conditions (and related constraints) would become more challenging. Can any programming interfaces help further to work with such use cases? https://docs.basex.org/wiki/XQuery_Module

Regards, Markus


Graydon

16 Apr

10:42 a.m.

On Sat, Apr 16, 2022 at 02:16:23PM +0200, Markus Elfring scripsit:

...

It is supported to specify a fixed number of information sources for a for clause. https://www.w3.org/TR/xquery-31/#id-xquery-for-clause

I imagine that the selection of for bindings can occasionally vary so much and might become so big that it would be needed to construct FLWOR expressions by more sophisticated data structures.

A for clause will take any sequence as the binding sequence, and the order of clauses in a FLWOR expression comes down to "ends with a return clause" for most practical purposes.

let $interesting as item()* := some expression
for $this in $interesting ....

Works as a pattern.

So does

let $interesting....
where some test
for $this in $interesting ....

Generally speaking, the way to cope with size is by using references. That's either indexes or by writing queries that find the interesting node once and thereafter use references. It's not immediately obvious that this is what maps are for, but if you construct a map and the value is a DB node, the map key for that value functions as a pointer to that value. You can then iterate on the map keys.

In BaseX, the performance benefit from doing this can be a couple-three orders of magnitude.
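The "map key as a pointer" idea can be sketched as follows. This is an illustration only: the inline $docs data and the element names (item, id, title) are invented, and in BaseX the nodes would normally come from a database rather than from constructors.

```xquery
declare namespace map = "http://www.w3.org/2005/xpath-functions/map";

(: Invented sample data standing in for nodes found in a database. :)
let $docs as element(item)* := (
  <item><id>a1</id><title>First</title></item>,
  <item><id>b2</id><title>Second</title></item>
)
(: Find each interesting node once and key it by its id value. :)
let $by-id as map(xs:string, element(item)) := map:merge(
  for $item in $docs
  return map:entry(string($item/id), $item)
)
(: Iterate on the keys; each lookup hands back the original node, not a copy. :)
for $key in map:keys($by-id)
order by $key
return $key || ': ' || $by-id($key)/title
```

With this sample data the query returns the two strings "a1: First" and "b2: Second".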

-- Graydon Saunders | graydonish@gmail.com Þæs oferéode, ðisses swá mæg. -- Deor ("That passed, so may this.")

Markus Elfring

11:27 a.m.

...

...
I imagine that the selection of for bindings can occasionally vary so much and might become so big that it would be needed to construct FLWOR expressions by more sophisticated data structures.

A for clause will take any sequence as the binding sequence,

How do you think about to take another look at the development challenges according to growing numbers of binding sequences and corresponding join conditions?

...

and the order of clauses in a FLWOR expression comes down to "ends with a return clause" for most practical purposes.

Thanks for your feedback.

Would you like to share any ideas for further extensions of the involved programming interfaces?

Regards, Markus

Graydon

12:13 p.m.

On Sat, Apr 16, 2022 at 05:27:02PM +0200, Markus Elfring scripsit:

...

...
...
I imagine that the selection of for bindings can occasionally vary so much and might become so big that it would be needed to construct FLWOR expressions by more sophisticated data structures.

A for clause will take any sequence as the binding sequence,

How do you think about to take another look at the development challenges according to growing numbers of binding sequences and corresponding join conditions?

I think there's usually four steps:

1. have I got the right data? (meaning, "can this data tell me what I want to know?")

2. can I write an expression that I believe will find all that data?

This is where the question of joining comes in, and the answer is generally "don't write complex XPath". The optimizer does a LOT better with several simple expressions where you take the union of the results later. Just like you would generally want to use distinct-values() before sort(), you generally want to do all your abstracting before you do any joining.

"Is there a simpler expression?" is a useful habitual question. (NOT "more elegant"!)

3. write a rough draft as a sequence of let clauses binding variables to FLWOR expressions. One variable, one item of information. (that is, thing that causes change in this context.)

4. when 3 works, rewrite it as proper functions in a module. (which might mean deciding 3 did the abstraction in a sub-optimal way. This process iterates.)

Make the functions strict; type everything as narrowly as possible, figure out where the distrust of inputs goes and make sure it's uniform, use try-catch, etc. to return errors as specifically as possible.
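A minimal sketch of what "strict" can look like, assuming a hypothetical person/born structure. The function name local:birth-date and the error code local:BADDATE are made up for the example; a real module would use its own namespace instead of local.

```xquery
declare namespace err = "http://www.w3.org/2005/xqt-errors";

(: Hypothetical helper: narrow parameter and return types, one specific error. :)
declare function local:birth-date($person as element(person)) as xs:date {
  let $raw as xs:string := normalize-space($person/born)
  return
    if ($raw castable as xs:date)
    then xs:date($raw)
    else error(xs:QName('local:BADDATE'), 'Not a date: "' || $raw || '"', $person)
};

(: Invented sample input; the second person has an unusable value. :)
let $people as element(person)+ := (
  <person><born>1990-05-17</born></person>,
  <person><born>yesterday</born></person>
)
for $p in $people
return
  try { string(local:birth-date($p)) }
  catch local:BADDATE { 'rejected: ' || $err:description }
```

The distrust of the input lives in one place (the castable test), and the caller catches exactly the error it knows how to handle; anything else still propagates.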

...

Would you like to share any ideas for further extensions of the involved programming interfaces?

Have you got a specific problem? Without that, it's difficult to say anything useful.

Declarative languages don't do interfaces in the sense I think you might mean; it's all functions, all the way down.

My general design take is that information causes change, and you're trying to identify what would be information if you had it, and in the code identify whether or not you do have it this time. Everything else is picking how and in what order to do any abstraction.

XQuery isn't hard but it is different. The abstract ideas of a lot of imperative best practices carry over ("test!"), but the "enough effort can write FORTRAN in any language" truism applies. The difference in utility and effort between fluent XQuery and the FORTRANized version is large, and the effort of getting fluent is well repaid.


Markus Elfring

3:10 p.m.

...

...
How do you think about to take another look at the development challenges according to growing numbers of binding sequences and corresponding join conditions?

I think there's usually four steps:

Did you stumble on bigger numbers of binding sequences for which you would like to join some information?

...

...
Would you like to share any ideas for further extensions of the involved programming interfaces?

Have you got a specific problem?

I am looking for another bit of clarification according to a general data processing task like joining information from several sources (when their size and number would become remarkable).

...

Declarative languages don't do interfaces in the sense I think you might mean; it's all functions, all the way down.

Software libraries (or modules) are provided accordingly. I imagine that corresponding adjustments will become more helpful.

Regards, Markus

Graydon

8:54 p.m.

On Sat, Apr 16, 2022 at 09:10:16PM +0200, Markus Elfring scripsit:

...

...
...
How do you think about to take another look at the development challenges according to growing numbers of binding sequences and corresponding join conditions?

I think there's usually four steps:

Did you stumble on bigger numbers of binding sequences for which you would like to join some information?

Generally the pattern for that is:

1 process each XPath expression into a sequence of maps; ideally there's a common function you pass a sequence of nodes, but

declare function local:mapify($found as node()*) as map(*)* { ... stuff happens.... };

can have per-expression variants if necessary.

2 now you've got multiple sequences of maps; make it one sequence using the comma operator or by processing all your XPath expressions as inputs to something that gets the node sequence from the various XPath for you.

For purposes of this example, call that combined sequence of all the sequences of maps $everything which has type map(*)*.

3 Abstract:

let $together as map(*) := map:merge(
  for $key in ($everything ! map:keys(.)) => distinct-values()
  return map:entry($key, $everything[map:contains(., $key)] ! .($key))
)

(syntax warning; I typed that, I didn't run it.)

So you wind up with one map referencing everything your XPath expressions found.

This won't inherently keep information like where the node came from in the map, but it's still the original node by reference; something like base-uri() can tell you where the node originates if you need to know that.

If all you want is the result from the mapify function, you've got all of them and can do any subsequent processing that's appropriate.
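The three steps can be tried out with something like the following. It is a sketch under assumptions: this local:mapify keys every node by an @id attribute, which the original description leaves open, and the two inline sources are invented.

```xquery
declare namespace map = "http://www.w3.org/2005/xpath-functions/map";

(: Hypothetical mapify: one single-entry map per node, keyed by its @id. :)
declare function local:mapify($found as node()*) as map(*)* {
  for $node in $found
  return map:entry(string($node/@id), $node)
};

(: Invented sample sources. :)
let $source-a := <a><rec id="1">alpha</rec><rec id="2">beta</rec></a>
let $source-b := <b><note id="2">second</note><note id="3">third</note></b>
(: Step 2: one combined sequence of maps, built with the comma operator. :)
let $everything as map(*)* :=
  (local:mapify($source-a/rec), local:mapify($source-b/note))
(: Step 3: one map per distinct key, holding every node found for that key. :)
let $together as map(*) := map:merge(
  for $key in distinct-values($everything ! map:keys(.))
  return map:entry($key, $everything[map:contains(., $key)] ! map:get(., $key))
)
for $key in map:keys($together)
order by $key
return $key || ' -> ' || string-join($together($key) ! name(.), ', ')
```

For this data the result is "1 -> rec", "2 -> rec, note" and "3 -> note": key 2 was found in both sources, so its value is a sequence of two nodes.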

...

...
...
Would you like to share any ideas for further extensions of the involved programming interfaces?

Have you got a specific problem?

I am looking for another bit of clarification according to a general data processing task like joining information from several sources (when their size and number would become remarkable).

First question with BaseX is "do I already have the database, or am I creating it?"

It's hard to have so much data you need to get clever. Generally, creating multiple databases will solve most scale problems.


Markus Elfring

17 Apr

2:05 a.m.

...

...
Did you stumble on bigger numbers of binding sequences for which you would like to join some information?

Generally the pattern for that is:

1 process each XPath expression into a sequence of maps; ideally there's a common function you pass a sequence of nodes, but …

Where did you mention join conditions for selected record sets in your approach for an algorithm description?

Regards, Markus

Graydon

11:22 a.m.

On Sun, Apr 17, 2022 at 08:05:51AM +0200, Markus Elfring scripsit:

...

...
...
Did you stumble on bigger numbers of binding sequences for which you would like to join some information?

Generally the pattern for that is:

1 process each XPath expression into a sequence of maps; ideally there's a common function you pass a sequence of nodes, but …

Where did you mention join conditions for selected record sets in your approach for an algorithm description?

There are neither join conditions nor record sets in XQuery. I've been supposing you're interested in how you do something conceptually similar.

Predicates filter; you can't add anything to a sequence with a predicate, you can only remove. ("these things, except if the predicate is true")

XQuery and SQL are not similar languages; they're both query languages, but SQL is built on set theory while XQuery is built on graph theory (XPath) and the idea of a tuple stream processor (FLWOR expressions). The underlying math for XQuery is younger. For example, the data structure under maps (finger trees) was first published _as math_ in 2006. You can't use either to understand the other one.

What you might be looking for is the union operator, which you can write as "union" or (customarily and more frequently) as | which people often expect to mean OR and doesn't.

You can union arbitrarily many XPath expressions together. What I described was an approach to not do that; process the results of each of the expressions, and then process each expression's results together by reference. You were concerned about large volumes of data and generally speaking "maps as soon as possible" is a good general rule for handling lots of input efficiently. (Maps are also a good way to abstract data from multiple database instances together with a uniform set of references.)
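A small illustration of what | does and does not do, using an invented inline document: the union returns nodes without duplicates and in document order, whereas the comma operator simply concatenates the operands in the order written.

```xquery
(: Invented sample document. :)
let $doc := <doc><a id="1"/><b id="2"/><a id="3"/></doc>
return (
  (: union: document order, duplicates removed :)
  string-join(($doc/a | $doc/b) ! string(@id), ' '),
  (: comma: operand order, nothing removed :)
  string-join(($doc/a, $doc/b) ! string(@id), ' ')
)
```

This returns "1 2 3" for the union and "1 3 2" for the comma-separated sequence.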


Markus Elfring

3 p.m.

...

There are neither join conditions nor record sets in XQuery.

I suggest to compare this view to the situation before the key word “JOIN” was added to the SQL standard. https://en.wikipedia.org/wiki/Join_(SQL)

...

I've been supposing you're interested in how you do something conceptually similar.

Predicates filter; you can't add anything to a sequence with a predicate, you can only remove. ("these things, except if the predicate is true")

How do you think about the following XQuery script sketch?

let $interesting_stuff1 as item()* := my_fn:get_data("some expression"),
    $interesting_stuff2 as item()* := my_fn:determine_further_data(),
    $interesting_stuff3 as item()* := my_fn:evaluate_another_expression()
for $this1 in $interesting_stuff1,
    $this2 in $interesting_stuff2,
    $this3 in $interesting_stuff3
where $this1/id = $this2/id and $this2/id = $this3/id
return do_something($this1/id, $this2/description, $this3/comment)

...

XQuery and SQL are not similar languages; they're both query languages, but SQL is built on set theory while XQuery is built on graph theory (XPath) and the idea of a tuple stream processor (FLWOR expressions). The underlying math for XQuery is younger. For example, the data structure under maps (finger trees) was first published _as math_ in 2006. You can't use either to understand the other one.

Will any further comparisons evolve for the provided functionality?

Regards, Markus

Graydon

5:08 p.m.

On Sun, Apr 17, 2022 at 09:00:10PM +0200, Markus Elfring scripsit:

...

...
There are neither join conditions nor record sets in XQuery.

I suggest to compare this view to the situation before the key word “JOIN” was added to the SQL standard. https://en.wikipedia.org/wiki/Join_(SQL)

See also: https://www.w3.org/TR/xquery-31/#id-joins

You can do join _operations_, but you aren't doing them on tables (unless you did extra work to represent the tables hierarchically) and there's absolutely no need for the keywords because the existing more general mechanisms work fine.
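As a hedged illustration of a join operation written with the general mechanisms (for and where, no JOIN keyword), here is a two-source sketch. The part and price elements are invented sample data.

```xquery
(: Invented sample data: two sources that share an id value. :)
let $parts := (
  <part><id>p1</id><description>Bolt</description></part>,
  <part><id>p2</id><description>Nut</description></part>
)
let $prices := (
  <price><id>p2</id><amount>0.10</amount></price>,
  <price><id>p3</id><amount>4.00</amount></price>
)
(: Two bindings in one for clause produce every pairing; where keeps the matches. :)
for $part in $parts, $price in $prices
where $part/id = $price/id
return <offer id="{$part/id}">{ $part/description, $price/amount }</offer>
```

Only p2 occurs in both sources, so a single offer element comes back, holding the description from one source and the amount from the other.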

...

How do you think about the following XQuery script sketch?

let $interesting_stuff1 as item()* := my_fn:get_data("some expression"),
    $interesting_stuff2 as item()* := my_fn:determine_further_data(),
    $interesting_stuff3 as item()* := my_fn:evaluate_another_expression()
for $this1 in $interesting_stuff1,
    $this2 in $interesting_stuff2,
    $this3 in $interesting_stuff3
where $this1/id = $this2/id and $this2/id = $this3/id
return do_something($this1/id, $this2/description, $this3/comment)

You're asking the optimizer to turn something O(N^2) into something efficient and you don't have to. All of this keys on an id element's string value and you already know that.

Use your functions to create maps where the keys come from that id element's string value.

(: usual caveat; this is typing, it hasn't been run :)

(: this is a sequence of unique id values :)
let $interesting_stuff1 as xs:string+ := my_fun:get_data("some expression")
(: this maps id values to description elements :)
let $interesting_stuff2 as map(xs:string,element(description)) := my_fun:determine_further_data()
(: maps id values to comment elements :)
let $interesting_stuff3 as map(xs:string,element(comment)) := my_fun:evaluate_another_expression()

(: bind to the sequence of id values :)
for $id in $interesting_stuff1
return
  (: run the function per-id :)
  my_fun:do_something($id,$interesting_stuff2($id),$interesting_stuff3($id))

You could decide to skip the for clause and use

return $interesting_stuff1 ! my_fun:do_something(.,$interesting_stuff2(.),$interesting_stuff3(.))

instead.
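A self-contained variant of the map-lookup approach, for trying it out. The ids, descriptions and comments are invented, and the string concatenation stands in for whatever my_fun:do_something would really do.

```xquery
(: Invented sample data: a sequence of ids and two maps keyed by id. :)
let $ids as xs:string+ := ('p1', 'p2')
let $descriptions as map(xs:string, element(description)) := map {
  'p1': <description>Bolt</description>,
  'p2': <description>Nut</description>
}
let $comments as map(xs:string, element(comment)) := map {
  'p2': <comment>Out of stock</comment>
}
(: Each id is looked up directly; no pairing of every item with every other item. :)
return $ids ! (
  . || ': ' || $descriptions(.) || ' (' || ($comments(.), 'no comment')[1] || ')'
)
```

This yields "p1: Bolt (no comment)" and "p2: Nut (Out of stock)". A missing key just gives the empty sequence, so the fallback in ($comments(.), 'no comment')[1] covers ids that have no comment.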

You could do something similar (and conceptually simpler, but maybe not better in practical terms) as:

(: all of the expressions need to return elements with id element descendants where the element has a meaningful string value :)
for $data in (db:open('one')/expression_one,db:open('two')/expression_two,db:open('three')/expression_three)
(: get the first descendant id element and take its string value :)
let $id as xs:string? := $data/descendant::id[1]/string()
(: if there's no $id value, stop processing this member of the binding sequence :)
where $id
(: re-order the entire tuple stream into one tuple per distinct $id value, with however many $data variables have that id value associated with it :)
group by $id
(: after the group by clause, a reference to $data is a reference to a sequence of $data bindings, everything where this specific id value was found as the first descendant id element's string property of the elements in the binding sequence of the for clause :)
return my_fun:do_something($data)

This approach assumes my_fun:do_something() knows what it's looking for, how to filter that out of the elements it gets passed, and how to order it in what it returns. Because it has the actual nodes, it can tell where they came from if it needs to. This approach can work better with messy data where 'two' might have comments and 'three' might have descriptions and you'll need to either use both or add logic about which one is preferred. (Or your functions to create the maps can be a bit smarter. Depends on the data.)
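The group by version does not care how many sources feed the binding sequence, which may be the closest fit to "a varying number of information sources". A runnable sketch with invented inline data in place of the db:open() calls, and a constructed joined element in place of my_fun:do_something():

```xquery
(: $sources stands in for any number of inputs; the data is invented. :)
let $sources as element()* := (
  <one><row><id>a</id><description>Bolt</description></row></one>,
  <two>
    <row><id>a</id><comment>Checked</comment></row>
    <row><id>b</id><comment>New</comment></row>
  </two>,
  <three><row><id>b</id><description>Nut</description></row></three>
)
for $data in $sources/row
let $id as xs:string? := $data/descendant::id[1]/string()
where $id
group by $id
order by $id
(: After group by, $data is every row that carried this id value. :)
return
  <joined id="{$id}" sources="{$data ! name(..)}">{
    $data ! (* except id)
  }</joined>
```

Adding a fourth or fifth source means adding one more item to $sources; the FLWOR expression itself stays the same.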

...

...
XQuery and SQL are not similar languages; they're both query languages, but SQL is built on set theory while XQuery is built on graph theory (XPath) and the idea of a tuple stream processor (FLWOR expressions). The underlying math for XQuery is younger. For example, the data structure under maps (finger trees) was first published _as math_ in 2006. You can't use either to understand the other one.

Will any further comparisons evolve for the provided functionality?

Don't think so. I find the trick with XQuery is to not fight with it about being some other language.

Internalizing the sequence concept takes work; internalizing the "this one, and all of them, at the same time" tuple stream processor concept takes more work. Once you've done that, you've got an extremely powerful and general tool.

(Rather like using git, I might put this as not trying to outsmart the capable people who designed XQuery. I'm going to lose if I do that.)


Markus Elfring

18 Apr

4:04 a.m.

...

...
See also: https://www.w3.org/TR/xquery-31/#id-joins

You can do join _operations_,

I would appreciate further clarification for the distinction which you present here.

...

but you aren't doing them on tables (unless you did extra work to represent the tables hierarchically)

Some “tables” can be transformed into XQuery sequences, can't they?

...

and there's absolutely no need for the keywords because the existing more general mechanisms work fine.

I see further development challenges in this area for the safe and convenient application of join conditions (or constraints).

I guess that you prefer to refer to them as “predicates within steps” so far (according to path expressions). https://www.w3.org/TR/xquery-31/#id-predicate

...

Use your functions to create maps where the keys come from that id element's string value.

Customised data structures can be created together with XQuery maps and arrays. But I find that a join operation would be needed before based on available identification data.

...

(: bind to the sequence of id values :) for $id in $interesting_stuff1 return (: run the function per-id :) my_fun:do_something($id,$interesting_stuff2($id),$interesting_stuff3($id))

You could decide to skip the for clause and use

return $interesting_stuff1 ! my_fun:do_something(.,$interesting_stuff2(.),$interesting_stuff3(.))

instead.

How do you think about to work without an extra identification sequence variable?

...

...
Will any further comparisons evolve for the provided functionality?

Don't think so. I find the trick with XQuery is to not fight with it about being some other language.

Internalizing the sequence concept takes work; …

Would you like to extend programming interfaces for the management of relationships with various entities?

Regards, Markus

Jonathan Robie

2:01 p.m.

...

Customised data structures can be created together with XQuery maps and arrays. But I find that a join operation would be needed before based on available identification data.

Could you please write up a few use cases with data to show where you are running into difficulty?

I am working with fairly complex linguistic datasets and joining them every which way with XQuery. The designers of XQuery included Don Chamberlin, one of the inventors of SQL, and Jim Melton, the chair of the SQL committee. We were looking at joining and combining diverse datasets as a key use case for XQuery. It's possible that we missed something, but it would be extremely helpful to have a concrete use case to consider, with real data.

Jonathan

On Mon, Apr 18, 2022 at 4:04 AM Markus Elfring Markus.Elfring@web.de wrote:

...

...
...
See also: https://www.w3.org/TR/xquery-31/#id-joins

You can do join _operations_,

I would appreciate further clarification for the distinction which you present here.

...
but you aren't doing them on tables (unless you did extra work to represent the tables hierarchically)

Some “tables” can be transformed into XQuery sequences, can't they?

...
and there's absolutely no need for the keywords because the existing more general mechanisms work fine.

I see further development challenges in this area for the safe and convenient application of join conditions (or constraints).

I guess that you prefer to refer to them as “predicates within steps” so far (according to path expressions). https://www.w3.org/TR/xquery-31/#id-predicate

...
Use your functions to create maps where the keys come from that id element's string value.

Customised data structures can be created together with XQuery maps and arrays. But I find that a join operation would be needed before based on available identification data.

...
(: bind to the sequence of id values :) for $id in $interesting_stuff1 return (: run the function per-id :)

my_fun:do_something($id,$interesting_stuff2($id),$interesting_stuff3($id))

...
You could decide to skip the for clause and use

return $interesting_stuff1 ! my_fun:do_something(.,$interesting_stuff2(.),$interesting_stuff3(.))

instead.

How do you think about to work without an extra identification sequence variable?

...
...
Will any further comparisons evolve for the provided functionality?

Don't think so. I find the trick with XQuery is to not fight with it about being some other language.

Internalizing the sequence concept takes work; …

Would you like to extend programming interfaces for the management of relationships with various entities?

Regards, Markus

Markus Elfring

3:54 p.m.

...

We were looking at joining and combining diverse datasets as a key use case for XQuery.

This technical requirement is generally fine.

...

It's possible that we missed something,

Do preferences matter if combinations of information sources are performed with a key word like “JOIN” (or not)?

...

but it would be extremely helpful to have a concrete use case to consider, with real data.

Would you like to check the influence of numbers according to joinable items once more on data processing efforts?

Regards, Markus

Jonathan Robie

4:01 p.m.

Responses inline ...

On Mon, Apr 18, 2022 at 3:54 PM Markus Elfring Markus.Elfring@web.de wrote:

...

...
We were looking at joining and combining diverse datasets as a key use case for XQuery.

This technical requirement is generally fine.

...
It's possible that we missed something,

Do preferences matter if combinations of information sources are performed with a key word like “JOIN” (or not)?

That's just syntax. And people clearly have different preferences for syntax. It's easy for that to turn into bike-shedding ( https://en.wikipedia.org/wiki/Law_of_triviality).

...

but it would be extremely helpful to have a concrete use case to consider, with real data.

Would you like to check the influence of numbers according to joinable items once more on data processing efforts?

Are you asking about how well XQuery implementations perform compared to SQL databases? For a particular kind of query? If not, what are you asking?

Jonathan

Markus Elfring

4:42 p.m.

...

Would you like to check the influence of numbers according to joinable items once more on data processing efforts?

Are you asking about how well XQuery implementations perform compared to SQL databases?

Not directly.

...

For a particular kind of query?

Yes.

I propose once more to take another look at specification efforts for join conditions (or constraints).

Regards, Markus

Graydon

5:35 p.m.

On Mon, Apr 18, 2022 at 10:42:28PM +0200, Markus Elfring scripsit:

...

I propose once more to take another look at specification efforts for join conditions (or constraints).

What specific problem is it that you're trying to solve? What thing do you want to do that you do not believe you can do in XQuery?


Markus Elfring

19 Apr

4:46 a.m.

...

...
I propose once more to take another look at specification efforts for join conditions (or constraints).

What specific problem is it that you're trying to solve? What thing do you want to do that you do not believe you can do in XQuery?

I imagine that join parameters can be passed to customised functions so that for clauses would dynamically be constructed for further data processing. Will a function like “xquery:eval” be called finally?

Regards, Markus

Christian Grün

5:01 a.m.

Hi Markus,

Could you please get specific and provide real use cases and examples?

Thanks, Christian

On Tue, Apr 19, 2022 at 10:46 AM Markus Elfring Markus.Elfring@web.de wrote:

...

...
...
I propose once more to take another look at specification efforts for join conditions (or constraints).

What specific problem is it that you're trying to solve? What thing do you want to do that you do not believe you can do in XQuery?

I imagine that join parameters can be passed to customised functions so that for clauses would dynamically be constructed for further data processing. Will a function like “xquery:eval” be called finally?

Regards, Markus

Markus Elfring

5:23 a.m.

...

Could you please get specific and provide real use cases and examples?

An example application was published already with the XQuery 3.1 specification for the combination of information from three documents. https://www.w3.org/TR/2017/REC-xquery-31-20170321/#id-joins

Would you occasionally like to join significantly more entities (with for clauses according to known relationships)?

Regards, Markus

Christian Grün

5:29 a.m.

...

An example application was published already with the XQuery 3.1 specification for the combination of information from three documents. https://www.w3.org/TR/2017/REC-xquery-31-20170321/#id-joins

I have seen this link. I was asking for your personal use cases or examples, and I think this is what others have been asking you for as well.

Markus Elfring

5:40 a.m.

...

I was asking for your personal use cases or examples,

My use cases can be similar to the example application from the XQuery specification.

...

and I think this is what others have been asking you for as well.

I am trying to point further development possibilities out according to varying and growing numbers of entities which would be referenced in for clauses.

Regards, Markus

Graydon

18 Apr

2:09 p.m.

On Mon, Apr 18, 2022 at 10:04:42AM +0200, Markus Elfring scripsit:

...

...
...
See also: https://www.w3.org/TR/xquery-31/#id-joins

You can do join _operations_,

I would appreciate further clarification for the distinction which you present here.

"Join" functionally means "get information from multiple sources and group it into a conceptually single result."

In an XML context, "source" can be "another part of the tree structure of this document", "another document" (= something with a different document uri property), or "something we got by an extension library" such as using the functions file:read(), db:open(), and so on.

This makes "source of information" too complex to define a standard operation you can call a join; having multiple sources of information is a built-in assumption of the language because the get-me-that functions are, like all the other functions, part of path expressions. On the unwelcome side, it's not a standard operation so you have to do your own debugging; on the welcome side it's flexible and general.

(Path expressions are made up of one or more steps, and those are either postfix expressions, such as function calls, or axis steps.)

...

...
but you aren't doing them on tables (unless you did extra work to represent the tables hierarchically)

Some “tables” can be transformed into XQuery sequences, can't they?

Tables can be modeled as either lists of lists or as a grid. If you want to use lists of lists, you can make one row of the table a sequence but you can't have a sequence of sequences -- it will turn into a single sequence -- so you can't have a whole table that way.

It's doable using some combination of XML markup, maps, or arrays, but any table needs some specific representation.
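The flattening is easy to see in a few lines. This sketch uses invented values and picks arrays as the specific representation; XML markup or maps would work as well.

```xquery
declare namespace array = "http://www.w3.org/2005/xpath-functions/array";

(: Nested sequences collapse into one flat sequence. :)
let $flat := ((1, 'Bolt'), (2, 'Nut'))
(: An array of arrays keeps the rows apart. :)
let $table as array(*) := [ [1, 'Bolt'], [2, 'Nut'] ]
return (
  count($flat),                             (: 4 items, no row boundaries left :)
  array:size($table),                       (: 2 rows :)
  $table(2)(2),                             (: second row, second cell: 'Nut' :)
  for $row in $table?* return $row(1)       (: first column: 1, 2 :)
)
```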

...

...
and there's absolutely no need for the keywords because the existing more general mechanisms work fine.

I see further development challenges in this area for the safe and convenient application of join conditions (or constraints).

XML has sort of three layers of reliability.

The innermost is XML structure; if it parses, you know it's made out of nodes and the rules the nodes use. ("Well-formed document" = this is actually XML and can be parsed.)

Then you get "valid"; the well-formed document conforms to some schema. (If there is a schema! there doesn't have to be and there might not be.) This doesn't always help; it's quite possible that the schema allows things that don't make sense (arbitrarily deep nesting of lists) that you have to treat as an error condition for the processing.

Then you get "regular"; no amount of checking that the value is properly an xs:date and that the date of birth is less than or equal to the present date and greater than some past date will save you from a clerical error that put today's date in the date of birth field two weeks ago. To be sure the processing makes sense, you have to have regular inputs, and that's hard.

Production code needs to look at "valid" and "regular" as distinct steps before it does stuff like joins. How depends on what you're doing.
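The date-of-birth checks mentioned above might look roughly like this. It is a sketch: the function name and the 120-year window are assumptions, and passing it says nothing about whether the value is "regular" in the sense described.

```xquery
(: Hypothetical check; the 120-year plausibility window is an assumption. :)
declare function local:plausible-birth-date($value as xs:string) as xs:boolean {
  if ($value castable as xs:date) then
    let $date := xs:date($value)
    return $date le current-date()
       and $date ge current-date() - xs:yearMonthDuration('P120Y')
  else false()
};

(: Invented test values: not a date, plausible, in the future. :)
for $raw in ('1985-02-30', '1985-03-01', '2999-01-01')
return $raw || ': ' || local:plausible-birth-date($raw)
```

The castable test sits in an if rather than in an "and" chain because XQuery does not promise the order in which the operands of "and" are evaluated, so xs:date() could otherwise still raise an error on a bad value.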

...

I guess that you prefer to refer to them as “predicates within steps” so far (according to path expressions). https://www.w3.org/TR/xquery-31/#id-predicate

Predicates are part of path expressions, but still are not a join, because predicates reduce the sequence produced by either the postfix expression (a function call) or an axis step in this step of the path expression.

Finding the thing you want to join and performing the join -- producing a result that contains this thing and some other things -- are conceptually distinct, or at least I think so.

//unusual_element ! concat(base-uri(.),path(.))

Is close to the minimum case of a join; we take an element and stick the document uri of its containing document and the path to it in that document together. No predicates in the expression. It could be written

//unusual_element/concat(base-uri(.),path(.))

(because unusual_element is an element and thus a node and can go on the left of the path operator / which only works on nodes)

This join could be made more specific by putting a predicate on the first axis step in the XPath expression to test for associated id attribute values made up of eleven or twelve digits:

//unusual_element[matches(@id,'^\p{Nd}{11,12}$')]/concat(base-uri(.),path(.))

The predicate constrains the path expression. It doesn't, strictly, join; it's not putting anything with anything else in some new arrangement or representation.

...

...
Use your functions to create maps where the keys come from that id element's string value.

Customised data structures can be created together with XQuery maps and arrays. But I find that a join operation would be needed before based on available identification data.

I think this needs an example because I don't quite understand your meaning here.

...

...
You could decide to skip the for clause and use

return $interesting_stuff1 ! my_fun:do_something(.,$interesting_stuff2(.),$interesting_stuff3(.))

instead.

What do you think about working without an extra identification sequence variable?

Formally, ! is the "simple mapping operator"; everything in the sequence returned by the expression on the left becomes the context item in the expression on the right. (The context item is written as dot — . — by ancient XPath tradition.)

Less formally, "do this thing on the right to everything on the left"; it works the same way the minimal case of a for clause does, but it's XPath, not XQuery, so you can do it in an XPath expression. And it's less typing than the XPath for.
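
A minimal sketch of the two forms side by side, assuming a small inline sequence (the element name item and its id attribute are made up for illustration). Both expressions should yield the same three strings, and the last line shows that ! also accepts atomic values on the left, which the path operator / does not:

```xquery
(: Hypothetical sample data; "item" and "id" are assumptions. :)
let $items := (<item id="a"/>, <item id="b"/>, <item id="c"/>)
return (
  (: simple map: each item becomes the context item on the right :)
  $items ! string(@id),
  (: the minimal for clause does the same but needs a named variable :)
  for $i in $items return string($i/@id),
  (: atomic values are fine on the left of ! :)
  (1, 2, 3) ! (. * 2)
)
```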

...

...
...
Will any further comparisons evolve for the provided functionality?

Don't think so. I find the trick with XQuery is to not fight with it about being some other language.

Internalizing the sequence concept takes work; …

Would you like to extend programming interfaces for the management of relationships with various entities?

I need an example here, because I'm not following what you mean.

-- Graydon Saunders | graydonish@gmail.com Þæs oferéode, ðisses swá mæg. -- Deor ("That passed, so may this.")

Markus Elfring

4:33 p.m.

...

This makes "source of information" too complex to define a standard operation you can call a join; …

I find that this view will need further clarification.

...

...
I guess that you prefer to refer to them as “predicates within steps” so far (according to path expressions). https://www.w3.org/TR/xquery-31/#id-predicate

Predicates are part of path expressions,

Such functionality can be helpful.

...

but still are not a join,

Under which circumstances would you interpret the specification of special conditions (or constraints) as a join operation?

...

because predicates reduce the sequence produced by either the postfix expression (a function call, for example) or an axis step in this step of the path expression.

Finding the thing you want to join and performing the join -- producing a result that contains this thing and some other things -- are conceptually distinct, or at least I think so.

Will this view trigger further considerations?

...

...
Would you like to extend programming interfaces for the management of relationships with various entities?

I need an example here, because I'm not following what you mean.

I guess that you are used to some approaches for handling joinable items. Do you care about the number of items involved here?

Regards, Markus

Graydon

5:29 p.m.

On Mon, Apr 18, 2022 at 10:33:19PM +0200, Markus Elfring scripsit:

...

...
This makes "source of information" too complex to define a standard operation you can call a join; …

I find that this view will need further clarification.

To join something, you need at least two expressions which find the things to be joined, some kind of rule for how to perform the join, and a destination for the result of the join.

If you pass in expressions, you've replicated the language after removing an opportunity to perform static analysis. This won't work better.

If you say, no, no, pass in the _results_ of the expressions, you have to handle (at least) arbitrary sequences, and sequences are not required to have all members of the same type: a sequence can properly contain document nodes, functions, and xs:dayTimeDuration members in some arbitrary order and amount. So you still wind up replicating much of the language in the rule for performing the join.

If you're dealing with XML you have to deal with all of XDM all the time. (Or worse! XDM is the best or at least the most successful abstraction of XML.)

XDM = XPath and XQuery Data Model, https://www.w3.org/TR/xpath-datamodel-31/

[predicates aren't joins]

...

...
but still are not a join,

Under which circumstances would you interpret the specification of special conditions (or constraints) as a join operation?

A join requires the results of two expressions to be combined.

(db:open('thing1'),db:open('thing2'))/descendant::patient-identifier[local:check-range($interesting,.)]

can be interpreted as performing a join; you can rewrite it as

(db:open('thing1')/descendant::patient-identifier[local:check-range($interesting,.)],db:open('thing2')/descendant::patient-identifier[local:check-range($interesting,.)])

but the predicate isn't doing the joining, the predicate is narrowing the selection of the XPath expressions. In this case, the comma operator is doing the joining.
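
A self-contained sketch of the same shape, with assumptions stated up front: two inline document nodes stand in for the db:open() calls, and the body of local:check-range is invented here (a plain membership test), since the thread never defines it. The comma builds the combined sequence; the predicate only narrows what the path step selects from it.

```xquery
(: Assumed helper: true when the identifier's string value is in $interesting. :)
declare function local:check-range(
  $interesting as xs:string*,
  $id as element()
) as xs:boolean {
  string($id) = $interesting
};

(: Hypothetical stand-ins for db:open('thing1') and db:open('thing2'). :)
let $thing1 := document {
  <records>
    <patient-identifier>p1</patient-identifier>
    <patient-identifier>p9</patient-identifier>
  </records>
}
let $thing2 := document {
  <records>
    <patient-identifier>p2</patient-identifier>
  </records>
}
let $interesting := ('p1', 'p2')
return
  (: comma combines the two sources; the predicate filters the step :)
  ($thing1, $thing2)/descendant::patient-identifier[local:check-range($interesting, .)]
```

With this data the expression selects the p1 and p2 elements and drops p9.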

...

...
Finding the thing you want to join and performing the join -- producing a result that contains this thing and some other things -- are conceptually distinct, or at least I think so.

Will this view trigger further considerations?

Don't see why.

You haven't provided a use case, on the one hand, and I've done this a lot, on the other; the view arises from experience.

...

...
...
Would you like to extend programming interfaces for the management of relationships with various entities?

I need an example here, because I'm not following what you mean.

I guess that you are used to some approaches for handling joinable items. Do you care about the number of items involved here?

No.

Any expression returns a sequence; any sequence can have arbitrarily many members. Writing XQuery while remembering that this is true is generally easier than writing XQuery that is trying to be specific about how many of something will be returned by the expression. It's completely possible and sometimes necessary to count things; more than one GUID attached to the same patient is something to check for, for example. But the general case is easier when you presume you're handling a sequence with an arbitrary number of members.
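
As a hedged illustration of the "sometimes necessary to count things" case, here is a sketch that reports patients with more than one GUID. The record element and its patient and guid attributes are hypothetical; the point is only that the grouped variable is handled as a sequence of arbitrary length and counted where it matters.

```xquery
(: Hypothetical records; element and attribute names are assumptions. :)
let $records := (
  <record patient="p1" guid="g-001"/>,
  <record patient="p2" guid="g-002"/>,
  <record patient="p1" guid="g-003"/>
)
for $r in $records
(: after grouping, $r holds every record of one patient :)
group by $patient := string($r/@patient)
let $guids := distinct-values($r/@guid)
(: only report patients that carry more than one distinct GUID :)
where count($guids) gt 1
return <duplicate patient="{$patient}" guids="{$guids}"/>
```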

-- Graydon Saunders | graydonish@gmail.com Þæs oferéode, ðisses swá mæg. -- Deor ("That passed, so may this.")

Markus Elfring

19 Apr 19 Apr

4:32 a.m.

...

To join something, you minimally need at least two expressions which find the things to be joined,

I am still trying to clarify the development possibilities for larger numbers of entities in combination with for clauses. (I hope that this meaning of “join” can be distinguished clearly from applications of a function like “string-join”.)

...

some kind of rule for how to perform the join, and a destination for the result of the join.

This is usual.

...

[predicates aren't joins]

How does this feedback fit with other documentation that describes the role of predicates in join operations?

...

A join requires the results of two expressions to be combined.

(db:open('thing1'),db:open('thing2'))/descendant::patient-identifier[local:check-range($interesting,.)]

can be interpreted as performing a join; you can rewrite it as

(db:open('thing1')/descendant::patient-identifier[local:check-range($interesting,.)],db:open('thing2')/descendant::patient-identifier[local:check-range($interesting,.)])

but the predicate isn't doing the joining, the predicate is narrowing the selection of the XPath expressions.

This can be a desirable effect.

...

In this case, the comma operator is doing the joining.

This example refers to another variant of a join operation for the construction of a sequence. (Such an XQuery code fragment does not use for clauses.)

...

...
Do you care about the number of items involved here?

No.

Would you like to adjust this view according to the usage of for clauses? https://www.w3.org/TR/2017/REC-xquery-31-20170321/#id-joins

Regards, Markus

Jonathan Robie

5:45 a.m.

Hi Markus,

Have you read the XQuery specifications? This section on joins, written by one of the inventors of SQL, may be a helpful starting point:

https://www.w3.org/TR/xquery-31/#id-joins

The use case documents may also be helpful:

https://www.w3.org/TR/xquery-use-cases/

https://www.w3.org/TR/xquery-30-use-cases/

https://www.w3.org/TR/xquery-31-requirements/

Please read up on joins in these places, I think it will help establish common ground for this conversation.

Jonathan

Markus Elfring

6:04 a.m.

...

Have you read the XQuery specifications?

Yes.

...

This section on joins, written by one of the inventors of SQL, may be a helpful starting point:

https://www.w3.org/TR/xquery-31/#id-joins

It seems that I am running into communication difficulties in this application area.

Regards, Markus

Jonathan Robie

12:42 p.m.

Here's one thing you may be asking - do you want to know how to specify a join for n sources? If so, maybe this example will help:

declare variable $eng := (<d n="1">one</d>, <d n="2">two</d>, <d n="3">three</d>, <d n="4">four</d>);
declare variable $deu := (<d n="1">eins</d>, <d n="2">zwei</d>, <d n="3">drei</d>, <d n="4">vier</d>);
declare variable $ukr := (<d n="1">один</d>, <d n="2">два</d>, <d n="3">три</d>, <d n="4">чотири</d>);
declare variable $heb := (<d n="1">אחד</d>, <d n="2">שתיים</d>, <d n="3">שלוש</d>, <d n="4">ארבע</d>);

for $e in $eng
for $d in $deu
for $u in $ukr
for $h in $heb
where $e/@n = $d/@n
  and $e/@n = $u/@n
  and $e/@n = $h/@n
return <wf>{ $e, $d, $u, $h }</wf>

This returns:

<wf>
  <d n="1">one</d>
  <d n="1">eins</d>
  <d n="1">один</d>
  <d n="1">אחד</d>
</wf>
<wf>
  <d n="2">two</d>
  <d n="2">zwei</d>
  <d n="2">два</d>
  <d n="2">שתיים</d>
</wf>
<wf>
  <d n="3">three</d>
  <d n="3">drei</d>
  <d n="3">три</d>
  <d n="3">שלוש</d>
</wf>
<wf>
  <d n="4">four</d>
  <d n="4">vier</d>
  <d n="4">чотири</d>
  <d n="4">ארבע</d>
</wf>

Does that answer your question? I routinely do queries that join multiple sources.

Jonathan

Liam R. E. Quin

1:21 p.m.

On Tue, 2022-04-19 at 12:42 -0400, Jonathan Robie wrote:

...

Here's one thing you may be asking - do you want to know how to specify a join for n sources?

I think the question was, what if n is large or dynamic.

But then we fall back to needing a use case because the best strategy depends on circumstance and people.

XQuery and XSLT are much more dynamic languages than SQL, in the way they feel. You're not really doing a SQL-style join so much as programming with relationships, which is more like working with an ER diagram i suppose, but at a micro level. Yes, the underlying XQuery engine is doing things like joins and making tuple-streams, and that's pretty fundamental, but it's implicit and pervasive.

let $id := $here/@id return /doc/people/person[@postcode = $id]

might be thought of as doing a join on the id attribute of person elements even though it's not in a FLWOR expression.
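
A runnable sketch of that idea, under assumptions: the document below is invented, $here is taken to be a place element whose id holds a postcode, and the path starts from a variable rather than the context root so that the query stands alone.

```xquery
(: Hypothetical document; all element and attribute names are assumptions. :)
let $doc := document {
  <doc>
    <people>
      <person name="Ada" postcode="K1"/>
      <person name="Bo" postcode="K2"/>
    </people>
    <places>
      <place id="K1"/>
    </places>
  </doc>
}
let $here := $doc/doc/places/place[1]
let $id := $here/@id
(: the general comparison relates person elements to the place: an implicit join :)
return $doc/doc/people/person[@postcode = $id]
```

With this data only the person named Ada is returned.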

So part of moving to XQuery (as with XSLT) is a change in the way we think of data and how we think of processing data. And i think that's the hardest part for many people. It can start as simply as understanding why "for" in XQuery (and "for-each" in XSLT) isn't a loop. There might be some mileage in a document, "XQuery for SQL programmers" :)

liam

-- Liam Quin, https://www.delightfulcomputing.com/ Available for XML/Document/Information Architecture/XSLT/ XSL/XQuery/Web/Text Processing/A11Y training, work & consulting. Barefoot Web-slave, antique illustrations: http://www.fromoldbooks.org

Markus Elfring

1:51 p.m.

...

Here's one thing you may be asking - do you want to know how to specify a join for n sources?

Yes. ‒ Your enquiry points in a direction for which I am also looking for further solution ideas.

...

for $e in $eng for $d in $deu for $u in $ukr for $h in $heb where $e/@n = $d/@n and $e/@n = $u/@n and $e/@n = $h/@n return <wf>{ $e, $d, $u, $h }</wf>

Will any improvements become relevant for the specification of such join conditions?

Regards, Markus

Graydon

2:48 p.m.

On Tue, Apr 19, 2022 at 07:51:34PM +0200, Markus Elfring scripsit:

...

...
Here's one thing you may be asking - do you want to know how to specify a join for n sources?

Yes. ‒ Your enquiry points into a direction for which I am looking also for further solution ideas.

If you look at Jonathan's example, you may recognize that the type of the variables is element()+, that is, a sequence of one or more elements.

You might then get the idea that instead of all those equals signs, you can treat it as a grouping problem:

let $range as xs:string* :=
  ($eng,$deu,$ukr,$heb)/@n/string()
  => distinct-values()
  => sort() (: the sort() is compulsive neatness, it is not required :)

for $index in $range
return <wf>{($eng,$deu,$ukr,$heb)[@n eq $index]}</wf>

Once you recognize the grouping problem, you can go "wait, isn't there a clause for that?"

for $ref in ($eng,$deu,$ukr,$heb)
let $key as xs:string := $ref/@n/string()
group by $key
return <wf>{$ref}</wf>

Note that this will work whether or not any particular language has that value of n, and you can of course define a function to go create your sequences.

It isn't always a grouping problem, but the base pattern -- treat sequences as sequences, and all shall be well -- holds broadly. That's why there's no impetus to define a specific join operator; you'd have to stuff the entire language into it to get equivalent functionality, and we already have the entire language.
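
For the case where the number of sources is not known in advance, one possible sketch (an assumption, not something taken from Jonathan's example) is to hand the sources over in a map and group across all of its values. The map keys and the shortened word lists are made up; the ukr entry deliberately lacks n="2" to show that a missing value does not break the grouping.

```xquery
(: Assumption: sources arrive in a map keyed by a language code. :)
let $sources := map {
  'eng': (<d n="1">one</d>, <d n="2">two</d>),
  'deu': (<d n="1">eins</d>, <d n="2">zwei</d>),
  'ukr': (<d n="1">один</d>)
}
(: ?* flattens all map values into one sequence, however many keys exist :)
for $d in $sources?*
group by $n := string($d/@n)
order by $n
return <wf n="{$n}">{$d}</wf>
```

Adding or removing a source only changes the map; the FLWOR expression stays the same. The order of the d elements inside each wf depends on the map's key order, which is implementation-dependent.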

-- Graydon Saunders | graydonish@gmail.com Þæs oferéode, ðisses swá mæg. -- Deor ("That passed, so may this.")

Liam R. E. Quin

16 Apr 16 Apr

10:29 p.m.

On Sat, 2022-04-16 at 14:16 +0200, Markus Elfring wrote:

...

Hello,

It is supported to specify a fixed number of information sources for a for clause. https://www.w3.org/TR/xquery-31/#id-xquery-for-clause

Get the "issn" attribute from every document in a sequence of arbitrary length:

for $source in $sequence-of-URIs return doc($source)/*/@*:issn

...

So you can have an arbitrary number.

Do you have a specific use case? It's software, it can do anything [1] :-)

liam

[1] and if it can't, Christian will fix it in the next snapshot :) :)

Markus Elfring

17 Apr 17 Apr

3 a.m.

...

Get the "issn" attribute from every document in a sequence of arbitrary length:

for $source in $sequence-of-URIs return doc($source)/*/@*:issn

So you can have an arbitrary number.

It seems that your ideas are heading in a different direction from the general data processing area I am trying to clarify here.

...

Do you have a specific use case?

The corresponding understanding is still evolving, isn't it?

The structured query language supports key words like the following. https://en.wikipedia.org/wiki/SQL_syntax#Queries

* … JOIN … ON …
* … NATURAL JOIN …

I got the impression that required join conditions need to be specified as predicates in the where clause of the programming language “XQuery” instead. https://www.w3.org/TR/xquery-31/#id-where

Have you ever specified join conditions for combining more than three (or ten) binding sequences (or “tables”) in a for clause?

Regards, Markus

basex-talk@mailman.uni-konstanz.de

participants (5)

Christian Grün
Graydon
Jonathan Robie
Liam R. E. Quin
Markus Elfring