That's good to hear. My rewritten query was based on the query of your first post, and I already guessed that all the nested loops are not really wanted or required.

Looking forward to learning about your next insights,
Christian



Bill Osmond <bill@with.in> schrieb am Sa., 22. Aug. 2020, 16:31:
Great e-mail messages to wake up to! Thank you for the further explanation Liam, and Christian the examples you provided were considerably faster:

- my fastest was 70k ms
- your ddex.xq was 35kms
- your ddex2.xq was 10kms!

There is only one issue: both ddex.xq and ddex2.xq seem to return many more results than expected (cartesian product somewhere perhaps)

When I run the queries against a smaller database - one with just 6 of the DDEX documents, my query returns 70 results which matches the number of TrackReleases, but both ddex.xq and ddex2.xq return 303,134 results. It looks like a separate "copy" of the output is being created for every Party in the PartyList, when really there should be only one (specified by the PartyReference). But this is very promising - if it takes 10 seconds to return a massively expanded version of the data, then perhaps this will get to <1000ms!

On Sat, Aug 22, 2020 at 4:07 AM Christian Grün <christian.gruen@gmail.com> wrote:
Hi Bill,

Feel free to run the attached queries; maybe they give you a faster result.

Your use case was interesting. It gave me some additional ideas on how
to speed up queries (by reordering consecutive 'for' clauses that do
not change the result).

Cheers,
Christian


On Sat, Aug 22, 2020 at 6:10 AM Liam R. E. Quin <liam@fromoldbooks.org> wrote:
>
> On Fri, 2020-08-21 at 17:28 -0700, Bill Osmond wrote:
> > I'm beginning to think that perhaps my performance hopes were a bit
> > too
> > inflated, given the size and complexity of our database. After a
> > fresh
> > optimization, and with -Xms2g -Xmx10g, the following query takes
> > 1492ms:
>
> [...]
>
> First note - there are in fact no loops in your query. Although "for"
> is used to introduce a loop in many procedural languages, it does nto
> do so in XQuery (nor does for-each in XSLT).
>
> In fact, it's closer to what SQL people know as a join.
>
> It's making a stream of n-tuples, and then evaluating the inner
> expression for each tuple, so that
>
> for $a in (  'a', 'b', 'c')
>   for $b in (1 to 5)
>     return $a || '-' || $b
>
> produces 15 lines of output,
> a-1, a-2, 1-3, a-4, a-6, b-1, and so on.
>
> You can see the BaseX query plan for your query already moves your
> where clauses as i did by hand, because BaseX is awesome.
>
> To make the query fast, you either need to reduce the number of tuples,
> and henve the number of times the expressions are evaluated, or you
> need to reduce the cost of creating the tuples.
>
> Moving the where clauses was my attempt to reduce the number of tuples.
> Adding an index might reduce the cost of making the tuples, so i'd
> certainly try that.
>
> If the input document is sorted, you might be able to construct
> something recursively (e.g. with fold-left) or use grouping or
> windowing to process $parties in groups, which may help considerably.
>
> Without seeing the data, that's only a guess.
>
> Liam
>
> --
> Liam Quin, https://www.delightfulcomputing.com/
> Available for XML/Document/Information Architecture/XSLT/
> XSL/XQuery/Web/Text Processing/A11Y training, work & consulting.
> Barefoot Web-slave, antique illustrations:  http://www.fromoldbooks.org
>