Hi Ying,
For example, the dataset I use is DBLP, and one of the example queries is:
for $x in db:open('dblp_2013')/dblp/article[child::pages and child::title] let $y := count($x/author) return concat($y, "/t", db:node-id($x))
Yes, I agree there is not much that can be done to speed up this query.
It is true that these queries are competing for the same resource. So you mean that each query causes a lot of disk I/O?
Exactly. All DBLP articles need to be parsed by this query, and more than 1 million result strings will be generated, resulting in a query time of approx. 0.005 ms per result.
Hope this helps, Christian
However, all these queries are read-only. Is there any room to improve this?
-- Regards
Shanshan
School of Computing National University of Singapore
Hi Christian,
Thanks for your reply.
First, I am a layman when it comes to XML databases and do not have much experience with them. I am just very curious why read-only queries compete for disk I/O. In relational database management systems, we can control the granularity of locks. Is it possible to allow multiple read threads on the same I/O for read-only queries?
Is there any way we can simply load the whole index by brute force and then run the queries?
Thanks,
On Mon, Mar 24, 2014 at 5:06 PM, Christian Grün christian.gruen@gmail.com wrote:
[...]
Hi Ying,
I am just very curious why read-only queries compete for disk I/O.
The reason is that your query requires a sequential scan of the full XML structure. If you run several requests in parallel, all requests will try to read the same blocks in a slightly delayed order. As a result, the read pointer on disk has to be repositioned again and again, and the resulting access pattern is close to random rather than sequential.
Is there any way we can simply load the whole index by brute force and then run the queries?
Due to the nature of your query, there is no way to benefit from index structures. Instead, I would suggest creating additional databases that contain all the information you will be frequently accessing.
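As a rough sketch of that suggestion (not something spelled out in this thread): the author counts could be computed once and stored in a small derived database, so that later requests scan far less data than the full DBLP document. The database name `dblp_2013_counts`, the path `counts.xml`, and the element and attribute names are made up for illustration. `db:create` is an updating function, so this would have to run as its own query:

```xquery
(: One-off query: precompute the values the original query returns.
   'dblp_2013_counts', 'counts.xml', <articles>, <article>, @id and
   @authors are hypothetical names chosen for this sketch. :)
db:create(
  'dblp_2013_counts',
  <articles>{
    for $x in db:open('dblp_2013')/dblp/article[child::pages and child::title]
    return <article id="{ db:node-id($x) }" authors="{ count($x/author) }"/>
  }</articles>,
  'counts.xml'
)
```

The frequent requests could then read the precomputed values instead of counting authors on every run:

```xquery
(: Same output format as the original query, read from the derived database. :)
for $a in db:open('dblp_2013_counts')/articles/article
return concat($a/@authors, "/t", $a/@id)
```

This second query still visits every entry, but in a much smaller database. The derived database would have to be rebuilt whenever `dblp_2013` changes.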
Christian
Hi Christian,
Thanks for your response.
On Mon, Mar 24, 2014 at 10:34 PM, Christian Grün christian.gruen@gmail.com wrote:
[...]
basex-talk@mailman.uni-konstanz.de