Re: [basex-talk] Joining a varying number of information sources (with XQuery)?

17 Apr 2022


      On Sun, Apr 17, 2022 at 09:00:10PM +0200, Markus Elfring scripsit:
...
...
There are neither join conditions nor record sets in XQuery.
I suggest to compare this view to the situation before the key word “JOIN”
was added to the SQL standard.
https://en.wikipedia.org/wiki/Join_(SQL)
See also:
https://www.w3.org/TR/xquery-31/#id-joins
You can do join _operations_, but you aren't doing them on tables
(unless you did extra work to represent the tables hierarchically) and
there's absolutely no need for the keywords because the existing more
general mechanisms work fine.
...
How do you think about the following XQuery script sketch?
let $interesting_stuff1 as item()* := my_fn:get_data("some expression"),
    $interesting_stuff2 as item()* := my_fn:determine_further_data(),
    $interesting_stuff3 as item()* := my_fn:evaluate_another_expression()
for $this1 in $interesting_stuff1,
    $this2 in $interesting_stuff2,
    $this3 in $interesting_stuff3
where $this1/id = $this2/id and $this2/id = $this3/id
return do_something($this1/id, $this2/description, $this3/comment)
You're asking the optimizer to turn something O(N^2) into something
efficient and you don't have to.  All of this keys on an id element's
string value and you already know that.
Use your functions to create maps where the keys come from that id
element's string value.
(: usual caveat; this is typing, it hasn't been run :)
(: this is a sequence of unique id values :)
let $interesting_stuff1 as xs:string+ := my_fun:get_data("some expression")
(: this maps id values to description elements :)
let $interesting_stuff2 as map(xs:string,element(description))
    := my_fun:determine_further_data()
(: maps id values to comment elements :)
let $interesting_stuff3 as map(xs:string,element(comment))
    := my_fun:evaluate_another_expression()
(: bind to the sequence of id values :)
for $id in $interesting_stuff1
return
    (: run the function per-id :)
    my_fun:do_something($id,$interesting_stuff2($id),$interesting_stuff3($id))
You could decide to skip the for clause and use
return $interesting_stuff1 !
my_fun:do_something(.,$interesting_stuff2(.),$interesting_stuff3(.))
instead.
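For what it's worth, here is a minimal sketch of how those maps could be built, since the snippet above just assumes the functions hand them back. The sample data, the element names (items, rows, row) and the variable names are all made up for illustration; only map:merge and map:entry are standard XQuery 3.1 functions. Same caveat as above applies: typed, not run.

```xquery
xquery version "3.1";

(: Hypothetical sample data; real sources would come from a database. :)
declare variable $items :=
  <items>
    <item><id>a</id></item>
    <item><id>b</id></item>
  </items>;
declare variable $descriptions :=
  <rows>
    <row><id>a</id><description>first</description></row>
    <row><id>b</id><description>second</description></row>
  </rows>;
declare variable $comments :=
  <rows>
    <row><id>a</id><comment>checked</comment></row>
  </rows>;

(: sequence of unique id string values :)
let $ids as xs:string* := distinct-values($items/item/id ! string())
(: map each id string value to its description element :)
let $description-by-id as map(xs:string, element(description)) :=
  map:merge($descriptions/row ! map:entry(string(id), description))
(: map each id string value to its comment element :)
let $comment-by-id as map(xs:string, element(comment)) :=
  map:merge($comments/row ! map:entry(string(id), comment))
for $id in $ids
return
  (: a lookup for a missing key returns the empty sequence :)
  <joined id="{$id}"
          description="{$description-by-id($id)}"
          comment="{$comment-by-id($id)}"/>
```

Each lookup is a single map access, so the work per id does not depend on how large the other sources are. Note that id "b" has no comment in this sample data, so that lookup comes back empty rather than raising an error.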
You could do something similar (and conceptually simpler, but maybe not
better in practical terms) as:
(: all of the expressions need to return elements with id element
descendants where the element has a meaningful string value :)
for $data in (
    db:open('one')/expression_one,
    db:open('two')/expression_two,
    db:open('three')/expression_three
)
(: get the first descendant id element and take its string value :)
let $id as xs:string? := $data/descendant::id[1]/string()
(: if there's no $id value, stop processing this member of the binding sequence :)
where $id
(: re-order the entire tuple stream into one tuple per distinct $id value, with however many $data variables have that id value associated with it :)
group by $id
(: after the group by clause, a reference to $data is a reference to the
   sequence of all $data bindings for which this specific id value was found
   as the string value of the first descendant id element :)
return my_fun:do_something($data)
This approach assumes my_fun:do_something() knows what it's looking for,
how to filter that out of the elements it gets passed, and how to order
it in what it returns. Because it has the actual nodes, it can tell
where they came from if it needs to.  This approach can work better
with messy data where 'two' might have comments and 'three' might have
descriptions and you'll need to either use both or add logic about which
one is preferred. (Or your functions that create the maps can be a bit
smarter.  Depends on the data.)
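A self-contained sketch of the group by version, with inline elements standing in for the three database expressions, might look like the following. The sample data and the group element in the return clause are assumptions for illustration; the return clause stands in for whatever my_fun:do_something() would actually do with the grouped nodes. Again, typed and not run.

```xquery
xquery version "3.1";

(: Hypothetical sample data standing in for the three database expressions. :)
declare variable $one :=
  (<thing><id>a</id></thing>, <thing><id>b</id></thing>);
declare variable $two :=
  (<row><id>a</id><description>first</description></row>);
declare variable $three :=
  (<note><id>a</id><comment>checked</comment></note>,
   <note><comment>no id here</comment></note>);

for $data in ($one, $two, $three)
(: string value of the first descendant id element, if there is one :)
let $id as xs:string? := $data/descendant::id[1]/string()
(: drop members of the binding sequence that have no usable id :)
where $id
(: one tuple per distinct id value; $data becomes a sequence from here on :)
group by $id
order by $id
return
  <group id="{$id}" members="{count($data)}">{
    (: pick whatever descriptions and comments the grouped nodes carry :)
    $data/(description | comment)
  }</group>
```

With this sample data the note without an id is filtered out by the where clause, and the group for "a" collects nodes from all three sources while the group for "b" holds only the one element from the first source.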
...
...
XQuery and SQL are not similar languages; they're both query languages,
but SQL is built on set theory while XQuery is built on graph theory
(XPath) and the idea of a tuple stream processor (FLWOR expressions).
The underlying math for XQuery is younger. For example, the data
structure under maps (finger trees) was first published _as math_ in
2006.  You can't use either to understand the other one.
Will any further comparisons evolve for the provided functionality?
Don't think so.  I find the trick with XQuery is to not fight with it
about being some other language.
Internalizing the sequence concept takes work; internalizing the "this
one, and all of them, at the same time" tuple stream processor concept
takes more work. Once you've done that, you've got an extremely powerful
and general tool.
(Rather like using git, I might put this as not trying to outsmart the
capable people who designed XQuery.  I'm going to lose if I do that.)
-- 
Graydon Saunders  | graydonish@gmail.com
Þæs oferéode, ðisses swá mæg.
-- Deor  ("That passed, so may this.")
