Hi all -
An interesting XQuery question came up on reddit[1] over the weekend, and I'm curious about a couple of aspects of the problem.
To recreate, the sample data is at timecenters[2] in the employeeTemporalDataset.zip download (specifically, the `departments.xml` and the three compressed directories of XML in `employees.tar.gz`). I created a database from employees.tar.gz (thanks for letting us parse XML in archives!), added the `departments.xml`, and then `Optimized All`.
The initial query was
```
for $emp in /employees/employee[@tend = '9999-01-01']
let $curdept := $emp/deptno[@tend = '9999-01-01']
return
$emp/lastname || " " || $curdept || " " || /departments/department[deptno = $curdept]/deptname
```
and it is slow (~100 minutes, 5999201.28 ms). The original poster came back later with a modified query that is significantly faster[3], but I was mostly wondering about the whys of the slowness. I think (but am not sure) that this is a join, and since the initial `for` binding ($emp) is pretty big (~240K), the processor has to parse through the $emp result sequence and match to values in the /departments part of the database; is that a correct assumption?
Again, I'm assuming that in the second, faster, query, the processor has two sequences ($d and $e) and is able to pull the joined data together much more quickly?
Thanks in advance for any light you can shed on these questions.
Best,
Bridger
[3] improved query:
```
for $d in /departments/department
for $e in /employees/employee[deptno[@tend='9999-01-01'] = $d/deptno]
return
$e/lastname || " " || $d/deptno || " " || $d/deptname
```