Hi all -

An interesting XQuery question came up on reddit[1] over the weekend, and I'm curious about a couple of aspects of the problem.

To recreate, the sample data is at timecenters[2] in the employeeTemporalDataset.zip download (specifically, the `departments.xml` and the three compressed directories of XML in `employees.tar.gz`). I created a database from employees.tar.gz (thanks for letting us parse XML in archives!), added the `departments.xml`, and then `Optimized All`.

The initial query was

```

for $emp in /employees/employee[@tend = '9999-01-01']

let $curdept := $emp/deptno[@tend = '9999-01-01']

return

$emp/lastname || " " || $curdept || " " || /departments/department[deptno = $curdept]/deptname

```

and it is slow (~100 minutes, 5999201.28 ms). The original poster came back later with a modified query that is significantly faster[3], but I was mostly wondering about the whys of the slowness. I think (but am not sure) that this is a join, and since the initial `for` binding ($emp) is pretty big (~240K), the processor has to parse through the $emp result sequence and match to values in the /departments part of the database; is that a correct assumption?

Again, I'm assuming that in the second, faster, query, the processor has two sequences ($d and $e) and is able to pull the joined data together much more quickly?

Thanks in advance for any light you can shed on these questions.

Best,

Bridger

[1] https://www.reddit.com/r/xml/comments/c3mb86/simple_xquery_optimization/

[2] http://timecenter.cs.aau.dk/software.htm

[3] improved query:

```

for $d in /departments/department

for $e in /employees/employee[deptno[@tend='9999-01-01'] = $d/deptno]
return

$e/lastname || " " || $d/deptno || " " || $d/deptname
```