Re: [basex-talk] basex trouble

Christian Grün

23 May 2018

3:54 p.m.

Hi Ветошкин (cc to the list),

Maybe we can find a way to speed up your queries; you can attach them to your next mail.

I guess that the parallel query execution leads to "random access patterns" on disk. You can enforce one query at a time by setting the PARALLEL option to 1 (see [1]).

Hope this helps, Christian

[1] http://docs.basex.org/wiki/Options#PARALLEL

On Wed, May 23, 2018 at 9:40 PM, Ветошкин Владимир en-trance@yandex.ru wrote:

...

Hi, Christian!

Thank you.

I have two scripts that query the same database. If I run them at different times, each completes in about 5 seconds. But if I run both scripts at the same time, it takes about 40 seconds.

If the scripts query different databases, it's fine: it takes about 5 seconds.

Is it possible to improve this situation?

I tried this code to load the database into main memory:

  db:open('db1') update {}

But it didn't help.

I hope you understand me :)

23.05.2018, 20:15, "Christian Grün" christian.gruen@gmail.com:

Hello Ветошкин,

Welcome to the list. Just send your question to this address.

Best, Christian

Ветошкин Владимир en-trance@yandex.ru wrote on Wed, 23 May 2018, 19:10:

Hello!

Sorry for my English. May I ask you some questions here, or do I have to post them on GitHub?

-- Best regards, Ветошкин Владимир Владимирович

Ветошкин Владимир

24 May 2018

3:01 a.m.

New subject: basex trouble

Hi, Christian! Thank you for your reply!

I attached the file.

I thought about "random access patterns", but all databases are on one disk. So why do two queries to different databases complete in about 5 seconds, while the same queries against one database take a long time?

> by setting the PARALLEL option to 1

It helped a bit. Now the second script waits for the first one to finish. But I would like these queries to run in parallel. Is that possible without the long wait? Maybe it is possible to load the database (or some nodes) into memory to avoid heavy load on the disk?

Christian Grün

6:11 a.m.

> I thought about "random access patterns" But all databases are on one disk.

I agree, this sounds confusing at first sight. Some more details: If the same database is opened by multiple queries, only one instance will be opened. The same data cursor is used to read data, and if the data is accessed in parallel, this cursor will be moved repeatedly, leading to adverse access patterns. This may not be so bad if different databases are accessed (in this case, we’ll have multiple cursors). But it’s also possible to write queries that show the opposite effect (slow parallel access on multiple databases, fast parallel access on single databases), so it also depends on the specific queries (whether sequential scans are performed, whether index access is possible, etc.) and your data sets.

In general, you benefit a lot if your queries take advantage of the available index structures (see [1] for more information). You can open the "Info View" panel in BaseX and look for hints on index rewritings. In your specific query, I noticed that $qxml and $qxmldoc point to the same node. Moreover, it helps to use "where" clauses as early as possible, so the first part of your query could possibly be rewritten to...

  for $qxml in db:open('000999')
  where $qxml/*:Envelope/*:Header/*:RoutingInf/*:EnvelopeID/text() = $envs
  let $qdoc := ...
  return $qxml

Hope this helps, Christian

PS: As far as I can judge, your English is fine ;)

[1] http://docs.basex.org/wiki/Indexes
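
For illustration, here is a minimal sketch of what the same lookup could look like with explicit index access, assuming the database '000999' has a text index and that the values to look up are plain strings. The envelope IDs below are hypothetical placeholders, and the path check simply mirrors the element names from the query above. In many cases the optimizer performs an equivalent rewrite on its own, which the "Info View" panel reports as "apply text index".

```xquery
(: Hypothetical envelope IDs; in the real query they come from elsewhere. :)
let $envs := ('ENV-0001', 'ENV-0002')
for $env in $envs
(: db:text returns the text nodes of database '000999' that equal $env,
   using the text index instead of scanning all documents. :)
for $text in db:text('000999', $env)
(: Keep only hits located at /Envelope/Header/RoutingInf/EnvelopeID. :)
where $text/parent::*:EnvelopeID/parent::*:RoutingInf
           /parent::*:Header/parent::*:Envelope/parent::document-node()
(: Walk up from the hit to the document that contains it. :)
let $qxml := root($text)
return $qxml
```

Note that this sketch may return the same document more than once if a document contains several of the requested IDs. On newer BaseX versions db:open has been renamed to db:get, so the surrounding query may need adjusting there.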

Ветошкин Владимир

7:47 a.m.

Thank you, Christian, for your kind words :)

I rewrote the queries as you suggested. It helped: the query now completes in about 3 seconds. In any case, it's better.

I also added a text index to this database. The "Info View" panel now shows two entries about the index:

- apply text index for $envs_49
- rewrite where clause(s)
- simplify gflwor
- pre-evaluate db:open("000999") to document-node() sequence
- apply text index for $invenvs_51
- rewrite where clause(s)
- simplify gflwor

I attached the file, which contains the full text from the panel.

Ветошкин Владимир

8:05 a.m.

Christian,

Is it possible to add an index only for "EnvelopeID", since only this field is in the where clause? Maybe it will speed up the queries.

Christian Grün

11:36 a.m.

> Is it possible to add index only for "EnvelopeID" since only this field is in where clause?

If you know that it will always be the EnvelopeID field that you want to look up in the index, you can limit the index entries to this field (e.g. by setting TEXTINCLUDE to "*:EnvelopeID" as an option [1], or doing so via the Database creation dialog).

But probably you were asking if the index rewriting can be restricted to this element at runtime? Can you forward us your current query (ideally a minimized version)?

[1] http://docs.basex.org/wiki/Indexes#Selective_Indexing
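
As a rough sketch of the first variant, the text index of an existing database could be rebuilt so that it only covers EnvelopeID elements. This assumes that db:optimize accepts the indexing options (here textindex and textinclude) in its options map in your BaseX version; the database name '000999' is taken from the query discussed earlier in the thread.

```xquery
(: Rebuild the index structures of the existing database '000999'.
   true() requests a full optimization; the options map restricts the
   text index to EnvelopeID elements in any namespace. :)
db:optimize('000999', true(), map {
  'textindex': true(),
  'textinclude': '*:EnvelopeID'
})
```

Since db:optimize is an updating function, it has to be run as a query of its own, before the queries that should benefit from the smaller index.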
