Copying databases between servers: pth.basex?

James Ball

18 Nov 2020

7:43 p.m.

Hello,

I’ve been debugging a strange issue on a system and I can’t quite understand what’s causing it.

I create some databases on my local laptop (macOS). They provide master data for a larger system so I then copy them to a cloud server (Linode/Amazon).

I recently updated some queries to use db:open with a path restriction to make them run faster.

Everything worked on the laptop but nothing was returned on the server for the same paths.

If I optimise(all) the database on the server it then works.

After the optimisation, on the server there is a file called pth.basex and a file called idp.basex.

On the Mac there is NO file in the database folder called pth.basex or idp.basex.

I’m not sure what pth.basex or idp.basex are, as they’re not mentioned on the Storage Layout page [1].

This happens when I copy the database files directly. I have had mixed results using backup zip files: sometimes it works and sometimes it fails.

Has anyone else observed this behaviour? What are the steps I should take to ensure reliable movement of databases? What are these mysterious files?

I’m happy to update the Wiki with any new information.

Many thanks, James

[1] https://docs.basex.org/wiki/Storage_Layout

Christian Grün

4 Dec 2020

7:35 a.m.

Hi James,

Finally some feedback:

Database paths are looked up differently on Windows/Mac and UNIX-based platforms: on the former, the lookup is case-insensitive; on the latter, case matters. As a result, path lookups may fail on UNIX/Linux systems (it shouldn’t happen in the other direction). Could you check whether your document paths (which you can list via db:list($db), for example) and your path strings match exactly?
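
The comparison suggested above could be sketched offline in Python. This is only an illustration: it assumes the document paths (for example, the output of db:list($db)) and the path strings used in the queries have already been exported as two plain lists. The name find_case_mismatches is hypothetical and not part of BaseX. It compares whole paths only, so directory-style path restrictions would need extra handling.

```python
def find_case_mismatches(document_paths, query_paths):
    """Return (query_path, stored_path) pairs that differ only by letter case.

    Hypothetical helper: both arguments are plain lists of strings that the
    caller has exported beforehand; nothing here talks to BaseX itself.
    """
    exact = set(document_paths)

    # Group the stored paths by their case-folded form.
    by_folded = {}
    for stored in document_paths:
        by_folded.setdefault(stored.casefold(), []).append(stored)

    mismatches = []
    for query in query_paths:
        if query in exact:
            continue  # exact match: works on case-sensitive systems too
        for stored in by_folded.get(query.casefold(), []):
            mismatches.append((query, stored))
    return mismatches


if __name__ == "__main__":
    stored = ["Docs/A.xml", "docs/b.xml"]
    used = ["docs/a.xml", "docs/b.xml", "missing.xml"]
    for query, match in find_case_mismatches(stored, used):
        print(f"{query!r} matches {match!r} only when case is ignored")
```

Any pair it reports is a path string that would resolve on a case-insensitive lookup but not on a case-sensitive one.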

The pth.basex file contains an index for document paths. It is created the first time a database path lookup is performed. The reason for the deferred index generation is that we have use cases in which millions of documents are stored in a database, and the document path is never requested.

The idp.basex file is created if incremental indexing (UPDINDEX) is enabled. "idp" stands for "id/pre mapping": It is used to quickly look up the pre value for a database node id.

Thanks for your offer to adopt this information in the Wiki. If my hints on what’s happening are too stingy, I’ll be glad to answer more questions.

Cheers, Christian

James Ball

12 Dec 2020

6:06 p.m.

Hi Christian,

That’s it! And it explains why I’ve not seen it before.

In the databases I’ve copied before where I use paths, all the paths are UUIDs and therefore all lowercase, so pth.basex is identical on both systems.

I realised the latest database I’ve been copying has mixed-case paths, so the Mac and UNIX versions of pth.basex differ. The UNIX system finds the path index but doesn’t find the documents at those paths, which can lead to some odd behaviour. Deleting pth.basex and letting BaseX recreate it on next use solves the issue, as does a full optimize.
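
For anyone copying database directories by hand, the workaround above can be folded into the copy step itself. The Python sketch below is a rough illustration under some assumptions: no BaseX process has the database open during the copy, the target directory does not exist yet, and the function name copy_database_without_path_index is made up for this example. It leaves out pth.basex so that the path index is rebuilt on the target system on first use.

```python
import shutil
from pathlib import Path


def copy_database_without_path_index(source_dir, target_dir):
    """Copy a database directory, leaving out the pth.basex path index.

    Hypothetical helper: assumes the database is not open in any BaseX
    process and that target_dir does not exist yet (copytree raises
    FileExistsError otherwise).
    """
    source = Path(source_dir)
    target = Path(target_dir)

    # ignore_patterns skips every file named pth.basex during the copy.
    shutil.copytree(source, target, ignore=shutil.ignore_patterns("pth.basex"))

    # Return the copied file names so the caller can check the result.
    return sorted(p.name for p in target.iterdir())
```

The source directory is left untouched; only the copy lacks the path index.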

I have added some information to the Wiki to explain the pth and idp files, including a warning about copying pth.basex between systems. Hopefully I’ve understood their functions correctly.

Thanks again for your help

James

Christian Grün

7:30 p.m.

…perfectly summarized, thanks!

basex-talk@mailman.uni-konstanz.de