We are testing basex to use with a huge database, and we have a performance problem.
At the moment we have two small databases with thousands of documents to test:
· Authors, 2.000 documents, 2MBs aprox.
· Books (3 per author) with 6.000 documents, 5MBs aprox.
We need join and group results, and this takes a long time:
Query: <result>{for $Author in (db:open('Authors')//businessEvent), $Book in (db:open('Books')//businessEvent) for $income in distinct-values($Author//author/country) where $Author//author/idauthor=$Book//book/idauthor group by $income return <line><label>{ $income }</label><data>count($Book )</data></line>}</result>
Compiling:
- rewriting where clause to predicate(s)
- pre-evaluating db:open("Authors")
- simplifying descendant-or-self step(s)
- pre-evaluating db:open("Books")
- simplifying descendant-or-self step(s)
- simplifying descendant-or-self step(s)
- swapping operands: descendant::*:book/*:idauthor = $Author/descendant::author/idauthor
- simplifying descendant-or-self step(s)
- simplifying descendant-or-self step(s)
Result: element result { for $Author in (document-node { "1" }, document-node { "2" }, ...)/descendant::*:businessEvent for $Book in (document-node { "1-1" }, document-node { "1-2" }, ...)/descendant::*:businessEvent[descendant::*:book/*:idauthor = $Author/descendant::author/idauthor] for $income in fn:distinct-values($Author/descendant::author/country) group by $income := $income return element line { element label { $income }, element data { "count($Book )" } } }
Timing:
- Parsing: 1.01 ms
- Compiling: 3.99 ms
- Evaluating: 100725.38 ms
- Printing: 0.37 ms
- Total Time: 100730.77 ms
We have no idea why it’s taking so long time to show the results.
If we don’t group results it takes a long time too.
Regards,
Nuria