We are
testing basex to use with a huge database, and we have a performance problem.
At the
moment we have two small databases with thousands of documents to test:
·
Authors,
2.000 documents, 2MBs aprox.
·
Books
(3 per author) with 6.000 documents, 5MBs aprox.
We need
join and group results, and this takes a long time:
Query: <result>{for $Author in
(db:open('Authors')//businessEvent), $Book in (db:open('Books')//businessEvent)
for $income in distinct-values($Author//author/country) where
$Author//author/idauthor=$Book//book/idauthor group by $income return
<line><label>{ $income }</label><data>count($Book
)</data></line>}</result>
Compiling:
- rewriting where clause to predicate(s)
- pre-evaluating db:open("Authors")
- simplifying descendant-or-self step(s)
- pre-evaluating db:open("Books")
- simplifying descendant-or-self step(s)
- simplifying descendant-or-self step(s)
- swapping operands:
descendant::*:book/*:idauthor = $Author/descendant::author/idauthor
- simplifying descendant-or-self step(s)
- simplifying descendant-or-self step(s)
Result: element result { for $Author in
(document-node { "1" }, document-node { "2" },
...)/descendant::*:businessEvent for $Book in (document-node { "1-1"
}, document-node { "1-2" }, ...)/descendant::*:businessEvent[descendant::*:book/*:idauthor
= $Author/descendant::author/idauthor] for $income in
fn:distinct-values($Author/descendant::author/country) group by $income :=
$income return element line { element label { $income }, element data {
"count($Book )" } } }
Timing:
- Parsing: 1.01 ms
-
Compiling: 3.99 ms
-
Evaluating: 100725.38 ms
-
Printing: 0.37 ms
- Total
Time: 100730.77 ms
We have no
idea why it’s taking so long time to show the results.
If we don’t
group results it takes a long time too.
Regards,
Nuria