Re: [basex-talk] multi-threaded XQueries


Christian Grün

24 Mar 2014

5:06 a.m.

Hi Ying,

...

E.g., the dataset I use is DBLP; one of the example queries is

for $x in db:open('dblp_2013')/dblp/article[child::pages and child::title]
let $y := count($x/author)
return concat($y, "/t", db:node-id($x))

Yes, I agree there is not much that can be done to speed up this query.

...

It is true that these queries are competing for the same resource. So you mean that each query causes a lot of disk I/O?

Exactly. All DBLP articles need to be parsed by this query, and more than 1 million result strings will be generated, which works out to a query time of approx. 0.005 ms per result.

Hope this helps, Christian

...

However, all these queries are read-only. Is there any room to improve this?

...

-- Regards

Shanshan

School of Computing National University of Singapore


Ying Shanshan

24 Mar 2014

9:46 a.m.


Hi Christian,

Thanks for your reply.

First, I am new to XML databases and do not have much experience with them. I am curious why read-only queries compete for disk I/O. In relational database management systems, we can control the granularity of locks. Is it possible to allow multiple read threads on the same I/O for read-only queries?

Is there any way we can brute-force load the whole index and then run the queries?

Thanks,


-- Regards Shanshan School of Computing National University of Singapore

Christian Grün

10:34 a.m.


Hi Ying,

...

I am just very curious why for read-only queries, they will compete for disk I/O.

The reason is that your query requires a sequential scan of the full XML structure. If you run several requests in parallel, all requests will try to read the same blocks in a slightly delayed order. As a result, the pointer that reads blocks from disk will be moved back and forth again and again, and the resulting access pattern is somewhat random rather than sequential.
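The effect described above can be illustrated with a toy model. The following is only a sketch, not a description of how BaseX or the operating system actually schedules reads: it assumes a hypothetical disk of `$blocks` blocks and `$readers` parallel scans, each started one step later than the previous one, and it ignores caching entirely. It lists the order in which blocks are requested and sums the distance the read position has to travel.

```xquery
(: Toy model only: $blocks and $readers are hypothetical parameters. :)
declare variable $blocks := 8;    (: number of blocks in the scanned file :)
declare variable $readers := 3;   (: number of parallel sequential scans :)

(: At time step $t, reader $r (started $r steps late) requests block $t - $r. :)
let $accesses :=
  for $t in 0 to $blocks + $readers - 2
  for $r in 0 to $readers - 1
  let $b := $t - $r
  where $b ge 0 and $b lt $blocks
  return $b
(: Total distance travelled between consecutive block requests. :)
let $seek := sum(
  for $i in 2 to count($accesses)
  return abs($accesses[$i] - $accesses[$i - 1])
)
return (
  string-join(for $a in $accesses return string($a), ' '),
  $seek
)
```

With `$readers` set to 1, the requested blocks are simply 0 to 7 in order and the total movement is 7. With more readers, the same blocks are requested repeatedly in a back-and-forth order, so the total movement grows well beyond what the individual scans would need on their own.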

...

Is there any way we can brute-force load the whole index and then run the queries?

Due to the nature of your query, there is no way to benefit from index structures. Instead, I would suggest creating additional databases that contain all the information you will be frequently accessing.

Christian
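As an illustration of the suggestion to create additional databases, here is a minimal sketch of one possible layout. It is not taken from the thread: the database name `dblp_2013_summary`, the path `summary.xml`, and the `<entries>`/`<entry>` structure are all hypothetical. The idea is to precompute, once, the two values the example query returns (author count and node id) and store them in a much smaller database. The sketch uses `db:open`, as in the original query; newer BaseX releases name this function `db:get`.

```xquery
(: Run once. All names below except 'dblp_2013' are hypothetical. :)
db:create(
  'dblp_2013_summary',
  <entries>{
    for $x in db:open('dblp_2013')/dblp/article[pages and title]
    return <entry id="{ db:node-id($x) }" authors="{ count($x/author) }"/>
  }</entries>,
  'summary.xml'
)
```

Subsequent queries could then read the summary instead of scanning the full DBLP database:

```xquery
(: The "/t" separator is copied verbatim from the original query. :)
for $e in db:open('dblp_2013_summary')/entries/entry
return concat($e/@authors, "/t", $e/@id)
```

This still scans every entry, but over far less data. The stored node ids refer to nodes in `dblp_2013`, and the summary would need to be rebuilt whenever that database changes. Whether this pays off depends on how often the same information is requested.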

Ying Shanshan

9:02 p.m.


Hi Christian,

Thanks for your response.


-- Regards Shanshan School of Computing National University of Singapore


basex-talk@mailman.uni-konstanz.de
