bounced by the List, re-sent after subscription --
Hello Christian and team,
I am from IT and helped Michael to analyze and reproduce the problem. I would like to describe what we found and maybe you have some Ideas what we can optimize. Any advice will be helpful 🙂
The questions are:
* why is the whole database read even for a "small" query ? (looks like a missing index or bad execution plan.. ?) * are there some tricks / best practices to optimize ? * why do we use 4K blocksize performing a seek() before every read even for sequential reads from the database file ? * can the blocksize be changed ? does it make sense ?
The UNIX related question (maybe it's worth opening a ticket with redhat)
* why does the UNIX kernel enforce nfs reads (slow) for the combination: set read lock; seek; read; ... and uses the cache (is fast) if we omit seek() even if the file is locked?
Thank you for your comments in advance! Andre
============================ Below are more details. We performed a query (wget output about 150K) and saw in the strace the typical sequence:
* open (..tbl.basex) * set read lock * seek(); read() the whole file (650MB) by 4k blocks, seek() before every read (even for sequential read - where the position is already at the right place for the next read()); * close().
We found that all recent OS (RHEL7,8; debian12) are reproducible bypassing the filesystem cache if the database files on a nfs mounted filesystem. RHEL6 uses the filesystem cache.
======================= trace snippets open(.. tbl.basex); fcntl(f, F_SETLK, {l_type=F_RDLCK}; seek(); read(f, 4k); seek(); read(.. 4k); close
.. openat(AT_FDCWD</home/drslaanstg/BaseX853_dev/basex/bin>, "/proj/mpi_la/sb_dev/sbof/tbl.basex", O_RDWR|O_CREAT, 0666) = 51
#-- set read lock on the whole file (l_len=0). fcntl(51</proj/mpi_la/sb_dev/sbof/tbl.basex>, F_SETLK, {l_type=F_RDLCK, l_whence=SEEK_SET, l_start=0, l_len=0}) = 0
#-- lseek(), read(..,4K), lseek(), read(..,4k) - # reading .. 13:09:30.050965 lseek(51</proj/mpi_la/sb_dev/sbof/tbl.basex>, 0, SEEK_SET) = 0 <0.000063> 13:09:30.051900 read(51</proj/mpi_la/sb_dev/sbof/tbl.basex>, "\0\0\0\0\0\0\0\0\0\0\314b\0\0\0\0001\0\1\0\0\0\0\1\0\0\314a\0\0\0\1"..., 4096) = 4096 <0.179836> 13:09:30.231893 lseek(51</proj/mpi_la/sb_dev/sbof/tbl.basex>, 4096, SEEK_SET) = 4096 <0.000017> 13:09:30.232818 read(51</proj/mpi_la/sb_dev/sbof/tbl.basex>, "\2\0\0\0\0\0\5\274\0\0\0\1\0\0\1\0\t\0004\0\0\0\0\5\0\0\0\2\0\0\1\1"..., 4096) = 4096 <0.000041> 13:09:30.232923 lseek(51</proj/mpi_la/sb_dev/sbof/tbl.basex>, 8192, SEEK_SET) = 8192 <0.000015> 13:09:30.232988 read(51</proj/mpi_la/sb_dev/sbof/tbl.basex>, "\2\0\0@\0\0\n\303\0\0\0\1\0\0\2\0\t\09\0\0\0\0\6\0\0\0\2\0\0\2\1"..., 4096) = 4096 <0.000016> .. #jump to the end and back.. 13:09:30.248343 lseek(51</proj/mpi_la/sb_dev/sbof/tbl.basex>, 192512, SEEK_SET) = 192512 <0.000015> 13:09:30.248416 read(51</proj/mpi_la/sb_dev/sbof/tbl.basex>, "\2\0\0@\0\0\250\n\0\0\0\1\0\0/\0\t\0A\0\0\0\0\25\0\0\0\2\0\0/\1"..., 4096) = 4096 <0.0 13:09:30.248488 lseek(51</proj/mpi_la/sb_dev/sbof/tbl.basex>, 586047488, SEEK_SET) = 586047488 <0.000015> 13:09:30.248567 read(51</proj/mpi_la/sb_dev/sbof/tbl.basex>, "\31\0j\0\0\0/\377\0\0\0(\0\0/\326\v\0\6\0\0\0 \217\0\0\0\0\0\0/\327"..., 4096) = 4096 13:09:30.249898 lseek(51</proj/mpi_la/sb_dev/sbof/tbl.basex>, 196608, SEEK_SET) = 196608 <0.000016> 13:09:30.250564 read(51</proj/mpi_la/sb_dev/sbof/tbl.basex>, "\2\0\0\0\0\0\253\254\0\0\0\1\0\0000\0\t\08\0\0\0\0\3\0\0\0\2\0\0000\1"..., 4096) = 409
close(51)
======== test program to repoduce the behavior. #!/tool/pandora64/bin/python3-latest # test emulating the bad performance when baseX opens a database for read # # both conditions must be true: # 1. set a read lock on the file # 2. seek before reading. # # without lock or without seek it is fast. # # rhel6 - fast (<1s) # rhel7,8 and debian12 - slow (~10s) # nfs -o vers=3,nolock - has no effect # # agerbats (Andre Gerbatsch), 11/10/2023 #############
import fcntl
BS= 4096 FN= "/proj/sysadmin_scratch/agerbats/tbl.basex"
f = open(FN, "rb", buffering=0)
#-- set a read lock fcntl.lockf(f, fcntl.LOCK_SH) f.seek(0,2) file_length=f.tell()
offset = 0 while offset < file_length:
#-- seek to the next 4k position (it is semantically a noop) f.seek(offset) block=f.read(BS) offset+= BS
f.close()
-- Andre
________________________________ Von: Otto, Michael (Layout Analysis) Michael.Otto@globalfoundries.com Gesendet: Montag, 13. November 2023 18:19 An: Gerbatsch, Andre Andre.Gerbatsch@globalfoundries.com Betreff: WG: [basex-talk] WG: JAVA Performance Issue under RH8 by using BASEX with NFS system
________________________________ Von: Christian GrĂĽn christian.gruen@gmail.com Gesendet: Montag, 13. November 2023 16:11 An: Otto, Michael (Layout Analysis) Michael.Otto@globalfoundries.com Cc: basex-talk@mailman.uni-konstanz.de basex-talk@mailman.uni-konstanz.de Betreff: Re: [basex-talk] WG: JAVA Performance Issue under RH8 by using BASEX with NFS system
🛑CAUTION: This email originated outside of the GF organization. Please consider the source / sender to ensure the content is safe before clicking on links or opening attachments. If you're unsure, please report this email as phishing so it can be analyzed.
…difficult to tell what might be the reason; we didn’t get similar reports so far. Most of the BaseX caching algorithms rely on the file system, but we don’t differ between Windows- and UNIX-based systems.
Does it make a difference if you change another JDK distribution?
Best, Christian
On Mon, Nov 13, 2023 at 12:32 PM Otto, Michael (Layout Analysis) <Michael.Otto@globalfoundries.commailto:Michael.Otto@globalfoundries.com> wrote:
________________________________ Von: Otto, Michael (Layout Analysis) <Michael.Otto@globalfoundries.commailto:Michael.Otto@globalfoundries.com> Gesendet: Sonntag, 12. November 2023 17:16 An: basex-talk-request@mailman.uni-konstanz.demailto:basex-talk-request@mailman.uni-konstanz.de <basex-talk-request@mailman.uni-konstanz.demailto:basex-talk-request@mailman.uni-konstanz.de> Betreff: AW: JAVA Performance Issue under RH8 by using BASEX with NFS system
Hello.
Since switching basex (853) application from RH6 to RH8 I do see a performance issue.
internal tests: RH6/NFS --> runtime < 1s RH8/local storage on server --> runtime < 1s
RH8/NFS --> runtime > 15s
It seams that we see a caching issue in RH8 context (1st read of RH6 taes longer too).
I did several tests with several JAVA and BASEX(V10.7) versions (including latest one). IT tried several mounting options for RH8 server to NFS too, but that did not solve that issue.
I that a known issue? I guess this is a caching issue of RH8 (actual only seen by using BaseX). Is there any known solution for this?
BR
GLOBALFOUNDRIES Dresden Module One LLC & Co. KG:
Michael Otto
Tapeout Engn
FAB 1 TAPE OUT INTEGRATION
T +49 (0) 351 277-86013
F +49 (0) 351 277-86013
M/S: I21-MPI michael.otto@globalfoundries.commailto:michael.otto@globalfoundries.com
http://www.globalfoundries.com/
GLOBALFOUNDRIES Dresden Module One Limited Liability Company & Co. KG
Wilschdorfer Landstr. 101, 01109 Dresden, Deutschland
Registered Office/ Sitz Dresden I Registergericht Dresden HRA 5255
General Partner/ Vertretungsberechtigter Komplementär: GLOBALFOUNDRIES Dresden Module One LLC (Sitz Wilmington, Delaware, USA)
General Manager/ Geschäftsführer der GLOBALFOUNDRIES Dresden Module One LLC: Thomas Caulfield
________________________________ Von: Otto, Michael (Layout Analysis) <Michael.Otto@globalfoundries.commailto:Michael.Otto@globalfoundries.com> Gesendet: Mittwoch, 8. November 2023 12:01 An: basex-talk-request@mailman.uni-konstanz.demailto:basex-talk-request@mailman.uni-konstanz.de <basex-talk-request@mailman.uni-konstanz.demailto:basex-talk-request@mailman.uni-konstanz.de> Betreff: JAVA Performance Issue under RH8 by using BASEX with NFS system
Hello.
Since switching basex (853) application from RH6 to RH8 I do see a performance issue.
internal tests: RH6/NFS --> runtime < 1s RH8/local storage on server --> runtime < 1s
RH8/NFS --> runtime > 15s
It seams that we see a caching issue in RH8 context (1st read of RH6 taes longer too).
I did several tests with several JAVA and BASEX(V10.7) versions (including latest one). IT tried several mounting options for RH8 server to NFS too, but that did not solve that issue.
I that a known issue? I guess this is a caching issue of RH8 (actual only seen by using BaseX). Is there any known solution for this?
BR
GLOBALFOUNDRIES Dresden Module One LLC & Co. KG:
Michael Otto
Tapeout Engn
FAB 1 TAPE OUT INTEGRATION
T +49 (0) 351 277-86013
F +49 (0) 351 277-86013
M/S: I21-MPI michael.otto@globalfoundries.commailto:michael.otto@globalfoundries.com
http://www.globalfoundries.com/
GLOBALFOUNDRIES Dresden Module One Limited Liability Company & Co. KG
Wilschdorfer Landstr. 101, 01109 Dresden, Deutschland
Registered Office/ Sitz Dresden I Registergericht Dresden HRA 5255
General Partner/ Vertretungsberechtigter Komplementär: GLOBALFOUNDRIES Dresden Module One LLC (Sitz Wilmington, Delaware, USA)
General Manager/ Geschäftsführer der GLOBALFOUNDRIES Dresden Module One LLC: Thomas Caulfield