bounced by the List, re-sent after subscription
--
 
Hello Christian and team,

I am from IT and helped Michael to analyze and reproduce the problem.
I would like to describe what we found and maybe you have some Ideas what we can optimize. Any advice will be helpful 🙂

The questions are:
  • why is the whole database read even for a "small" query ? (looks like a missing index or bad execution plan.. ?)
    • are there some tricks / best practices to optimize ?
  • why do we use 4K blocksize performing a seek() before every read even for sequential reads from the database file ?
    • can the blocksize be changed ? does it make sense ?
The UNIX related question (maybe it's worth opening a ticket with redhat)
  • why does the UNIX kernel enforce nfs reads (slow) for the combination: set read lock; seek; read; ... and uses the cache (is fast) if we omit seek() even if the file is locked?
Thank you for your comments in advance!
Andre


============================
Below are more details.
We performed a query (wget output about 150K) and saw in the strace the typical sequence:
  • open (..tbl.basex)
  • set read lock 
  • seek(); read() the whole file (650MB) by 4k blocks, seek() before every read (even for sequential read - where the position is already at the right place for the next read()); 
  • close().
We found that all recent OS (RHEL7,8; debian12) are reproducible bypassing the filesystem cache if the database files on a nfs mounted filesystem.
RHEL6 uses the filesystem cache.



======================= trace snippets
open(.. tbl.basex); fcntl(f, F_SETLK, {l_type=F_RDLCK}; seek(); read(f, 4k); seek(); read(.. 4k); close
  
 ..
openat(AT_FDCWD</home/drslaanstg/BaseX853_dev/basex/bin>, "/proj/mpi_la/sb_dev/sbof/tbl.basex", O_RDWR|O_CREAT, 0666) = 51

#-- set read lock on the whole file (l_len=0).
fcntl(51</proj/mpi_la/sb_dev/sbof/tbl.basex>, F_SETLK, {l_type=F_RDLCK, l_whence=SEEK_SET, l_start=0, l_len=0}) = 0

#-- lseek(), read(..,4K), lseek(), read(..,4k) - 
# reading 
..
13:09:30.050965 lseek(51</proj/mpi_la/sb_dev/sbof/tbl.basex>, 0, SEEK_SET) = 0 <0.000063>
13:09:30.051900 read(51</proj/mpi_la/sb_dev/sbof/tbl.basex>, "\0\0\0\0\0\0\0\0\0\0\314b\0\0\0\0001\0\1\0\0\0\0\1\0\0\314a\0\0\0\1"..., 4096) = 4096 <0.179836>
13:09:30.231893 lseek(51</proj/mpi_la/sb_dev/sbof/tbl.basex>, 4096, SEEK_SET) = 4096 <0.000017>
13:09:30.232818 read(51</proj/mpi_la/sb_dev/sbof/tbl.basex>, "\2\0\0\0\0\0\5\274\0\0\0\1\0\0\1\0\t\0004\0\0\0\0\5\0\0\0\2\0\0\1\1"..., 4096) = 4096 <0.000041>
13:09:30.232923 lseek(51</proj/mpi_la/sb_dev/sbof/tbl.basex>, 8192, SEEK_SET) = 8192 <0.000015>
13:09:30.232988 read(51</proj/mpi_la/sb_dev/sbof/tbl.basex>, "\2\0\0@\0\0\n\303\0\0\0\1\0\0\2\0\t\09\0\0\0\0\6\0\0\0\2\0\0\2\1"..., 4096) = 4096 <0.000016>
..
#jump to the end and back..
13:09:30.248343 lseek(51</proj/mpi_la/sb_dev/sbof/tbl.basex>, 192512, SEEK_SET) = 192512 <0.000015>
13:09:30.248416 read(51</proj/mpi_la/sb_dev/sbof/tbl.basex>, "\2\0\0@\0\0\250\n\0\0\0\1\0\0/\0\t\0A\0\0\0\0\25\0\0\0\2\0\0/\1"..., 4096) = 4096 <0.0
13:09:30.248488 lseek(51</proj/mpi_la/sb_dev/sbof/tbl.basex>, 586047488, SEEK_SET) = 586047488 <0.000015>
13:09:30.248567 read(51</proj/mpi_la/sb_dev/sbof/tbl.basex>, "\31\0j\0\0\0/\377\0\0\0(\0\0/\326\v\0\6\0\0\0 \217\0\0\0\0\0\0/\327"..., 4096) = 4096
13:09:30.249898 lseek(51</proj/mpi_la/sb_dev/sbof/tbl.basex>, 196608, SEEK_SET) = 196608 <0.000016>
13:09:30.250564 read(51</proj/mpi_la/sb_dev/sbof/tbl.basex>, "\2\0\0\0\0\0\253\254\0\0\0\1\0\0000\0\t\08\0\0\0\0\3\0\0\0\2\0\0000\1"..., 4096) = 409

close(51)

======== test program to repoduce the behavior.
#!/tool/pandora64/bin/python3-latest
# test emulating the bad performance when baseX opens a database for read
#
# both conditions must be true:
# 1. set a read lock on the file
# 2. seek before reading.
#
# without lock or without seek it is fast.
#
# rhel6 - fast (<1s)
# rhel7,8 and debian12 - slow (~10s)
#   nfs -o vers=3,nolock - has no effect
#
# agerbats (Andre Gerbatsch), 11/10/2023
#############

import fcntl

BS= 4096
FN= "/proj/sysadmin_scratch/agerbats/tbl.basex"

f = open(FN, "rb", buffering=0)

#-- set a read lock
fcntl.lockf(f, fcntl.LOCK_SH)
f.seek(0,2)
file_length=f.tell()

offset = 0
while  offset < file_length:

#-- seek to the next 4k position (it is semantically a noop)
    f.seek(offset)
    block=f.read(BS)
    offset+= BS

f.close()




--
Andre


Von: Otto, Michael (Layout Analysis) <Michael.Otto@globalfoundries.com>
Gesendet: Montag, 13. November 2023 18:19
An: Gerbatsch, Andre <Andre.Gerbatsch@globalfoundries.com>
Betreff: WG: [basex-talk] WG: JAVA Performance Issue under RH8 by using BASEX with NFS system
 


Von: Christian GrĂĽn <christian.gruen@gmail.com>
Gesendet: Montag, 13. November 2023 16:11
An: Otto, Michael (Layout Analysis) <Michael.Otto@globalfoundries.com>
Cc: basex-talk@mailman.uni-konstanz.de <basex-talk@mailman.uni-konstanz.de>
Betreff: Re: [basex-talk] WG: JAVA Performance Issue under RH8 by using BASEX with NFS system
 

🛑CAUTION: This email originated outside of the GF organization. Please consider the source / sender to ensure the content is safe before clicking on links or opening attachments. If you're unsure, please report this email as phishing so it can be analyzed.

…difficult to tell what might be the reason; we didn’t get similar reports so far. Most of the BaseX caching algorithms rely on the file system, but we don’t differ between Windows- and UNIX-based systems.

Does it make a difference if you change another JDK distribution?

Best,
Christian 


On Mon, Nov 13, 2023 at 12:32 PM Otto, Michael (Layout Analysis) <Michael.Otto@globalfoundries.com> wrote:



Von: Otto, Michael (Layout Analysis) <Michael.Otto@globalfoundries.com>
Gesendet: Sonntag, 12. November 2023 17:16
An: basex-talk-request@mailman.uni-konstanz.de <basex-talk-request@mailman.uni-konstanz.de>
Betreff: AW: JAVA Performance Issue under RH8 by using BASEX with NFS system
 
Hello.

Since switching basex (853) application from RH6 to RH8 I do see a performance issue.


internal tests:
RH6/NFS --> runtime < 1s
RH8/local storage on server --> runtime < 1s

RH8/NFS --> runtime > 15s

It seams that we see a caching issue in RH8 context (1st read of RH6 taes longer too).

I did several tests with several JAVA and BASEX(V10.7) versions (including latest one).
IT tried several mounting options for RH8 server to NFS too, but that did not solve that issue.


I that a known issue? I guess this is a caching issue of RH8 (actual only seen by using BaseX).
Is there any known solution for this?

BR

GLOBALFOUNDRIES Dresden Module One LLC & Co. KG:

 

Michael Otto

Tapeout Engn

FAB 1 TAPE OUT INTEGRATION

 

T +49 (0) 351 277-86013

F +49 (0) 351 277-86013

M/S: I21-MPI
michael.otto@globalfoundries.com

 

 


http://www.globalfoundries.com/

 

 

GLOBALFOUNDRIES Dresden Module One Limited Liability Company & Co. KG

Wilschdorfer Landstr. 101, 01109 Dresden, Deutschland

Registered Office/ Sitz Dresden I Registergericht Dresden HRA 5255

General Partner/ Vertretungsberechtigter Komplementär: GLOBALFOUNDRIES Dresden Module One LLC (Sitz Wilmington, Delaware, USA)

General Manager/ GeschäftsfĂĽhrer der GLOBALFOUNDRIES Dresden Module One LLC: Thomas Caulfield



Von: Otto, Michael (Layout Analysis) <Michael.Otto@globalfoundries.com>
Gesendet: Mittwoch, 8. November 2023 12:01
An: basex-talk-request@mailman.uni-konstanz.de <basex-talk-request@mailman.uni-konstanz.de>
Betreff: JAVA Performance Issue under RH8 by using BASEX with NFS system
 
Hello.

Since switching basex (853) application from RH6 to RH8 I do see a performance issue.


internal tests:
RH6/NFS --> runtime < 1s
RH8/local storage on server --> runtime < 1s

RH8/NFS --> runtime > 15s

It seams that we see a caching issue in RH8 context (1st read of RH6 taes longer too).

I did several tests with several JAVA and BASEX(V10.7) versions (including latest one).
IT tried several mounting options for RH8 server to NFS too, but that did not solve that issue.


I that a known issue? I guess this is a caching issue of RH8 (actual only seen by using BaseX).
Is there any known solution for this?

BR

GLOBALFOUNDRIES Dresden Module One LLC & Co. KG:

 

Michael Otto

Tapeout Engn

FAB 1 TAPE OUT INTEGRATION

 

T +49 (0) 351 277-86013

F +49 (0) 351 277-86013

M/S: I21-MPI
michael.otto@globalfoundries.com

 

 

http://www.globalfoundries.com/

 

 

GLOBALFOUNDRIES Dresden Module One Limited Liability Company & Co. KG

Wilschdorfer Landstr. 101, 01109 Dresden, Deutschland

Registered Office/ Sitz Dresden I Registergericht Dresden HRA 5255

General Partner/ Vertretungsberechtigter Komplementär: GLOBALFOUNDRIES Dresden Module One LLC (Sitz Wilmington, Delaware, USA)

General Manager/ Geschäftsführer der GLOBALFOUNDRIES Dresden Module One LLC: Thomas Caulfield