Hey Christian,

To follow up about why I think it may be a bug, if you look at the org.basex.http.RestXQModules class, it seems like many threads are blocking in the cache method, which is synchronized, while a thread is busy reparsing the files, though none of them changed. It makes sense that these results should be cached, I just can't imagine why the reparsing is happening more than once, even if the files haven't changed. The thread dumps show several threads blocked in the cache method, and the thread holding the lock is busy parsing the files.

Sorry for not being a bit clearer in my previous response,

Jeremy

On Thu, Dec 12, 2013 at 12:50 PM, Jeremy Moseley <jeremy@inkling.com> wrote:

Hey Christian,
It looks like a bug simply because it seems like all (perhaps only some?) threads are reparsing the RESTXQ files, even when the modified time has not changed, and other threads are blocked on the cache object that holds the results of the parse.

When I run the query by itself, average time is anywhere from 300 ms - 800 ms, but when I run concurrently, things slow way down. Below is the percentile speeds output by Apache Benchmark:
0% 9316
66% 10805
75% 11372
80% 11835
90% 13794
95% 15221
98% 16051
99% 16130
100% 16130 (longest request)

and this is the output from $ time curl "localhost:8984/xmldb/request" > /dev/null

% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
112 448 112 448 0 0 802 0 --:--:-- --:--:-- --:--:-- 802

real 0m0.567s
user 0m0.010s
sys 0m0.000s

Both Apache Benchmark, and the curl command are requesting the same URL.

The queries themselves involve looping over all href nodes in the documents, using the db:attribute function to lookup other nodes based on the value of this, and then formatting the information into JSON, and output to the client. I believe the queries are as fast as they are going to be, and a single query runs fast enough for our needs.

Any thoughts?

On Dec 12, 2013 12:33 PM, "Christian Grün" <christian.gruen@gmail.com> wrote:

> No files in the RESTXQ folder are changing at runtime, and from looking at
> the code it appears that the result of the parsing should be cached unless
> the files change anyways, but that does not appear to be the case, at least
> from my analysis of the thread dumps. Could this possibly be a bug?

I may need to have another look into your dumps, but maybe you can
tell me faster what you think is happening here, and why you think
it’s a bug?

> Should I be getting better throughput than 0.5 - 10 requests per second?
> Generally it is 0.5 - 3 requests per second for standard size projects
> (6000+ XML files), where we are performing queries for nodes by attribute
> index, and faster 10 - 15 requests per second for smaller projects, same
> type of queries. Can you think of a way to speed this up, perhaps moving the
> Table files directly into memory, to minimize disk accesses?

So you say that one query takes a maximum ~0.5 seconds to be
evaluated. Synchronization shouldn’t be a problem when all calls are
run sequentially, so I would be interested what performance you get
when you run all queries sequentially?

Regarding the queries: I guess I asked before, but maybe there’s a
chance to speed up single queries? How do your queries look like?

Christian