On 14.02.2022 15:53, Ben Engbers wrote:
Hi,
I have a collection of 740 XML-documents which I want to flatten. The files all have the same structure:
<handeling id="h_1"> <datum date="d_1"/> <spreekbeurt><spreker>spreker_1</spreker></spreekbeurt> <spreekbeurt><spreker>spreker_3</spreker></spreekbeurt> </handeling>
<handeling id="h_2"> <datum date="d_2"/> <spreekbeurt><spreker>spreker_2</spreker></spreekbeurt> <spreekbeurt><spreker>spreker_1</spreker></spreekbeurt> <spreekbeurt><spreker>spreker_4</spreker></spreekbeurt> </handeling>
<handeling id="h_3"> <datum date="d_3"/> <spreekbeurt><spreker>spreker_2</spreker></spreekbeurt> <spreekbeurt><spreker>spreker_3</spreker></spreekbeurt> <spreekbeurt><spreker>spreker_2</spreker></spreekbeurt> <spreekbeurt><spreker>spreker_1</spreker></spreekbeurt> </handeling>
The following query gives this result:

import module namespace functx = "http://www.functx.com";

let $Blogs := collection("Blog")
let $Turns := collection("Blog")
for $Blog in collection("Blog"), $Turn in collection("Blog")
where $Turn//datum/@date = $Blog//datum/@date
order by $Blog//datum/@date
count $Count
let $Id := $Blog/handeling/@id
let $Datum := $Blog//datum/@date
let $Speaker := $Turn//spreker/text()
return ($Id, $Datum, $Speaker, $Count)

id="h_1" date="d_1" spreker_1 spreker_3 1
id="h_2" date="d_2" spreker_2 spreker_1 spreker_4 2
id="h_3" date="d_3" spreker_2 spreker_3 spreker_2 spreker_1 3

But what I eventually need is this (for clarity shown as a table):

1, id="h_1", date="d_1", 1, spreker_1
1, id="h_1", date="d_1", 2, spreker_3
2, id="h_2", date="d_2", 1, spreker_2
2, id="h_2", date="d_2", 2, spreker_1
2, id="h_2", date="d_2", 3, spreker_4
3, id="h_3", date="d_3", 1, spreker_2
3, id="h_3", date="d_3", 2, spreker_3
3, id="h_3", date="d_3", 3, spreker_2
3, id="h_3", date="d_3", 4, spreker_1

It seems you want to process each document to generate that first number (e.g. 1, 2, 3) and then, "inside" each document, process each "spreker" element.
Your example data doesn't suggest that you need a join between the two collections, as you process the same collection("Blog") on both sides anyway, so doesn't

for $Blog in collection("Blog")
order by $Blog//datum/@date
let $Id := $Blog/handeling/@id
let $Datum := $Blog//datum/@date
count $countOuter
  for $speaker at $pos in $Blog//spreker/string()
  return ($countOuter, $Id, $Datum, $pos, $speaker)

suffice, or, as a "table",

string-join(
  for $Blog in collection("Blog")
  order by $Blog//datum/@date
  let $Id := $Blog/handeling/@id
  let $Datum := $Blog//datum/@date
  count $countOuter
    for $speaker at $pos in $Blog//spreker/string()
    return string-join(($countOuter, $Id, $Datum, $pos, $speaker), ', ')
  , ' ')
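
To try the idea without the "Blog" database, here is a minimal self-contained sketch. The variable $docs and its two inline documents are made up for illustration and stand in for collection("Blog"). Using '&#10;' as the outer separator (one row per line) is my assumption about what a "table" should look like. The sketch also assumes an XQuery 3.1 processor such as BaseX, where string-join accepts integers and attribute nodes directly.

```xquery
(: Hypothetical inline stand-ins for the documents in collection("Blog") :)
let $docs := (
  document {
    <handeling id="h_2">
      <datum date="d_2"/>
      <spreekbeurt><spreker>spreker_2</spreker></spreekbeurt>
    </handeling>
  },
  document {
    <handeling id="h_1">
      <datum date="d_1"/>
      <spreekbeurt><spreker>spreker_1</spreker></spreekbeurt>
      <spreekbeurt><spreker>spreker_3</spreker></spreekbeurt>
    </handeling>
  }
)
return string-join(
  for $doc in $docs
  order by $doc//datum/@date
  (: count numbers the documents after sorting :)
  count $countOuter
    (: "at $pos" numbers the speakers within one document :)
    for $speaker at $pos in $doc//spreker/string()
    return string-join(
      ($countOuter, $doc/handeling/@id, $doc//datum/@date, $pos, $speaker),
      ', '),
  '&#10;')

(: Result:
   1, h_1, d_1, 1, spreker_1
   1, h_1, d_1, 2, spreker_3
   2, h_2, d_2, 1, spreker_2
:)
```

Note that string-join atomizes the attributes, so the rows contain h_1 rather than id="h_1". If the attribute names are needed in the output, they would have to be added explicitly, e.g. with concat('id="', $doc/handeling/@id, '"').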