We are testing basex to use with a huge database, and we have a performance problem.


At the moment we have two small databases with thousands of documents to test:


·         Authors, 2.000 documents, 2MBs aprox.

·         Books (3 per author) with 6.000 documents, 5MBs aprox.


We need join and group results, and this takes a long time:


Query: <result>{for $Author in (db:open('Authors')//businessEvent), $Book in (db:open('Books')//businessEvent) for $income in distinct-values($Author//author/country) where $Author//author/idauthor=$Book//book/idauthor group by $income return <line><label>{ $income }</label><data>count($Book )</data></line>}</result>


Compiling:

- rewriting where clause to predicate(s)

- pre-evaluating db:open("Authors")

- simplifying descendant-or-self step(s)

- pre-evaluating db:open("Books")

- simplifying descendant-or-self step(s)

- simplifying descendant-or-self step(s)

- swapping operands: descendant::*:book/*:idauthor = $Author/descendant::author/idauthor

- simplifying descendant-or-self step(s)

- simplifying descendant-or-self step(s)


Result: element result { for $Author in (document-node { "1" }, document-node { "2" }, ...)/descendant::*:businessEvent for $Book in (document-node { "1-1" }, document-node { "1-2" }, ...)/descendant::*:businessEvent[descendant::*:book/*:idauthor = $Author/descendant::author/idauthor] for $income in fn:distinct-values($Author/descendant::author/country) group by $income := $income return element line { element label { $income }, element data { "count($Book )" } } }


Timing:

 - Parsing:  1.01 ms

 - Compiling:  3.99 ms

 - Evaluating:  100725.38 ms

 - Printing:  0.37 ms

 - Total Time:  100730.77 ms


We have no idea why it’s taking so long time to show the results.


If we don’t group results it takes a long time too.


Regards,

Nuria