Hi,
if you are just interested in the count of "yes" or "no", you could also try the function index:facets("db", "flat").
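Applied to the database from the original post, the call might look like the sketch below. It assumes the database is named "collect", as in the quoted message, and that the database statistics are current; as far as I know they are only refreshed when the database is optimized, so the counts may be stale after updates. The shape of the result is not reproduced here.

```xquery
(: "collect" is the database name taken from the quoted post.
   "flat" asks for a flat listing of facets instead of the document tree. :)
index:facets("collect", "flat")
```

The counts for the values of <selected> should then be readable from the result without scanning all entries.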
-- Andreas
On 01.10.2012 at 16:45, Mahlow Cerstin wrote:
Hi,
I'm trying to understand why my BaseX application is slow.
I have a database looking like this:
<collection>
  <entry time="2012-03-04T17:43:29">
    <node>4119300</node>
    <query>[text() contains text ('Bank' ftand 'fallen') using stemming using language "de" distance at most 6 words ordered]</query>
    <person>marcel</person>
    <phraseme>Ad0032</phraseme>
    <selected>no</selected>
  </entry>
  <entry time="2012-03-04T17:43:29">
    <node>11150403</node>
    <query>[text() contains text ('Bank' ftand 'fallen') using stemming using language "de" distance at most 6 words ordered]</query>
    <person>marcel</person>
    <phraseme>Ad0032</phraseme>
    <selected>no</selected>
  </entry>
  <entry time="2012-03-04T17:43:29">
    <node>17335179</node>
    <query>[text() contains text ('Bank' ftand 'fallen') using stemming using language "de" distance at most 6 words ordered]</query>
    <person>marcel</person>
    <phraseme>Ad0032</phraseme>
    <selected>yes</selected>
  </entry>
</collection>
It consists of 97500 entries stored in the database "collect"; one third have "yes" as the value of <selected>, the other two thirds have "no". The number of entries will probably double over time.
I use a CGI script to produce an HTML page that first lists the total number of "yes" entries and the total number of distinct phrasemes, and then lists all phrasemes, sorted by <phraseme>, in a table with 4 columns (phraseme, distinct persons, number of entries with this phraseme where <selected> is "yes", link to another CGI-Perl script). After the table I also show the last timestamp of an entry with <selected> "yes".
I use this for monitoring purposes, to track how the actual BaseX search application is being used.
I put the relevant CGI code at the bottom. It is not that complex, but it takes 80 to 90 seconds, which is much too slow! Skipping the timestamp information does not improve the speed.
Do you have an idea how to improve this? Is the slow processing due to badly constructed XQueries, due to rendering as HTML table, due to server issues (I have a virtual server, but I don't know who else is using it for what)?
# totals: number of "yes" entries and number of distinct phrasemes
my $session = Session->new("localhost", 1984, "admin", "admin");
$session->execute("open collect");
my $evidencecount = $session->execute("xquery let \$results := //selected[text() = 'yes'] return <b>{count(\$results)}</b>");
my @phrasemes = sort split(/\s+/, $session->execute("xquery distinct-values(//entry/phraseme/text())"));
$session->close;
my $phrasemecount = $#phrasemes + 1;
print "<p> <b>$phrasemecount</b> accessed phrasemes with a total of $evidencecount hits</p>";
print "<table>";
print "<tr><th>Phraseme-ID</th> <th>Person</th><th>Count</th></tr>";

# one table row per phraseme ("\$" keeps Perl from interpolating XQuery variables)
my $query =<<EOF;
for \$phraseme in distinct-values(//entry/phraseme)
let \$nodes := //phraseme[text() = \$phraseme]
let \$count := count(\$nodes[../selected[text() = "yes"]])
let \$person := distinct-values(\$nodes/../person)
order by \$phraseme
return
<tr><td>({\$phraseme})</td> <td>{\$person}</td> <td>{\$count}</td> <td><a href="basex-show-phraseme.pl?phraseme={\$phraseme}">show</a></td></tr>
EOF

my $viewsession = Session->new("localhost", 1984, "admin", "admin");
$viewsession->execute("open collect");
my $xquery = $viewsession->query($query);
print $xquery->execute();
$xquery->close();
print "</table>";

# display last timestamp
my $timequery = <<EOF;
let \$i := //entry/@time
order by \$i/@time ascending
return
<p>Last access: {data(\$i[last()])}</p>
EOF

my $xtimequery = $viewsession->query($timequery);
print $xtimequery->execute();
$xtimequery->close();
$viewsession->close();
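One likely cost in the table query above is that //phraseme[text() = $phraseme] walks the whole database once per distinct phraseme. A possible alternative, offered only as a sketch, is to group the entries in a single pass with the XQuery 3.0 "group by" clause. The inline $entries sample is hypothetical and only mirrors the structure shown earlier; against the real database it would be replaced by something like db:open("collect")//entry. The timestamp line follows the prose description (latest entry with <selected> "yes") and uses max() instead of sorting. None of this has been measured on the 97500-entry database.

```xquery
xquery version "3.0";

(: Hypothetical inline sample with the same structure as the "collect" database. :)
let $entries := (
  <entry time="2012-03-04T17:43:29">
    <person>marcel</person><phraseme>Ad0032</phraseme><selected>no</selected>
  </entry>,
  <entry time="2012-03-05T09:10:00">
    <person>marcel</person><phraseme>Ad0032</phraseme><selected>yes</selected>
  </entry>,
  <entry time="2012-03-06T11:00:00">
    <person>anna</person><phraseme>Ad0040</phraseme><selected>yes</selected>
  </entry>
)
return (
  <table>{
    (: Single pass: after "group by", $entry holds all entries of one phraseme. :)
    for $entry in $entries
    let $id := string($entry/phraseme)
    group by $id
    order by $id
    return
      <tr>
        <td>{ $id }</td>
        <td>{ distinct-values($entry/person) }</td>
        <td>{ count($entry[selected = 'yes']) }</td>
      </tr>
  }</table>,
  (: Latest timestamp among the selected entries, without an explicit sort. :)
  <p>Last access: { max($entries[selected = 'yes']/xs:dateTime(@time)) }</p>
)
```

Sending this as one query through a single session would also avoid opening the database twice, though whether that matters here is a guess.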
-- Dr. phil. Cerstin Mahlow
Universität Basel Deutsches Seminar Nadelberg 4 4051 Basel Schweiz
Tel: +41 61 267 07 65 Fax: +41 61 267 34 40 Mail: cerstin.mahlow@unibas.ch Web: http://www.oldphras.net
BaseX-Talk mailing list BaseX-Talk@mailman.uni-konstanz.de https://mailman.uni-konstanz.de/mailman/listinfo/basex-talk