Hello,
I’ve been debugging a strange issue on a system and I can’t quite understand what’s causing it.
I create some databases on my local laptop (macOS). They provide master data for a larger system so I then copy them to a cloud server (Linode/Amazon).
I recently updated some queries to use db:open with a path restriction to make them run faster.
Everything worked on the laptop but nothing was returned on the server for the same paths.
If I optimise(all) the database on the server it then works.
After the optimisation, on the server there is a file called pth.basex and a file called idp.basex.
On the Mac there is NO file in the database folder called pth.basex or idp.basex.
I’m not sure what pth.basex or idp.basex are, as they’re not referenced on the Storage Layout page [1].
This is copying the database files directly. I have had mixed results using backup zip files - sometimes it works and sometimes it fails.
Has anyone else observed this behaviour? What are the steps I should take to ensure reliable movement of databases? What are these mysterious files?
I’m happy to update the Wiki with any new information.
Many thanks, James
[1] https://docs.basex.org/wiki/Storage_Layout
Hi James,
Finally some feedback:
Database paths are looked up differently on Windows/Mac and UNIX-based platforms: on the former, the lookup is case-insensitive; on the latter, case matters. As a result, path lookups may fail on UNIX/Linux systems (it shouldn’t happen in the other direction). Could you check whether your document paths (which you can list e.g. via db:list($db)) and your path strings match exactly?
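The check suggested above can be sketched in a few lines of Python. This is only an illustration: the function name find_case_mismatches and the sample paths are hypothetical, and the list of stored paths is assumed to have been exported from db:list($db) beforehand.

```python
def find_case_mismatches(stored_paths, requested_paths):
    """Return (requested, stored) pairs that match only when case is ignored."""
    exact = set(stored_paths)
    # Map each case-folded path to the stored spellings that share it.
    folded = {}
    for path in stored_paths:
        folded.setdefault(path.casefold(), []).append(path)

    mismatches = []
    for wanted in requested_paths:
        if wanted in exact:
            continue  # exact match, so the lookup is safe on every platform
        for stored in folded.get(wanted.casefold(), []):
            mismatches.append((wanted, stored))
    return mismatches


# Hypothetical sample data: stored paths as listed by the database,
# requested paths as written in the queries.
stored = ["Products/Item-01.xml", "products/item-02.xml"]
requested = ["products/item-01.xml", "products/item-02.xml", "missing.xml"]
print(find_case_mismatches(stored, requested))
```

Any pair it reports is a path that would resolve under a case-insensitive lookup but not under a case-sensitive one.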
The pth.basex file contains an index for document paths. It is created when database path lookups are performed for the first time. The reason for the deferred index generation is that we have use cases in which millions of documents are stored in a database, and the document path is never requested.
The idp.basex file is created if incremental indexing (UPDINDEX) is enabled. "idp" stands for "id/pre mapping": It is used to quickly look up the pre value for a database node id.
Thanks for your offer to adopt this information in the Wiki. If my hints on what’s happening are too stingy, I’ll be glad to answer more questions.
Cheers, Christian
Hi Christian,
That’s it! And explains why I’ve not seen it before.
In the databases I’ve copied before where I use paths all the paths are UUIDs and so all lowercase - so pth.basex is identical on both systems.
I realised the latest database I’ve been copying has mixed-case paths, so the Mac and UNIX versions of pth.basex differ. The UNIX system finds the path index but doesn’t find the documents at those paths, which can lead to some odd behaviour. Deleting pth.basex and letting BaseX recreate it on next use solves the issue, as does a full optimize.
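A minimal sketch of that clean-up step, to be run on the target machine after copying the database files. The helper name remove_path_index is hypothetical, the database directory is whatever folder the files were copied into, and it is assumed that BaseX is not accessing the database while the file is removed.

```python
import tempfile
from pathlib import Path


def remove_path_index(db_dir):
    """Delete pth.basex from db_dir if present; return True if a file was removed."""
    index_file = Path(db_dir) / "pth.basex"
    if index_file.is_file():
        index_file.unlink()
        return True
    return False


# Demonstration on a throwaway directory standing in for a copied database.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "pth.basex").touch()
    print(remove_path_index(tmp))  # True: the copied index was removed
    print(remove_path_index(tmp))  # False: nothing left to remove
```

Only pth.basex is touched here; the other database files are left as copied.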
I have added some information to the Wiki to explain the pth and idp files - including a warning about copying pth.basex between systems. Hopefully I’ve understood their functions correctly.
Thanks again for your help
James
On 4 Dec 2020, at 12:35, Christian Grün christian.gruen@gmail.com wrote:
[...]
…perfectly summarized, thanks!
On Sun, 13 Dec 2020 at 00:06, James Ball basex-talk@jamesball.co.uk wrote:
[...]
basex-talk@mailman.uni-konstanz.de