Using BaseX 11.0 beta e461f98 on CentOS.

 

What I’m seeing on both Linux and macOS, for databases that appear never to finish optimizing, is that the various index files show the current time stamp, meaning they have been written to, whereas the time stamp on upd.basex reflects the time the optimization was started:

[eliot.kimber.adm@uswdlsolr03 _temp_lrk_washingtondc_link_records]$ ls -al
total 245416
drwxr-xr-x.  2 eliot.kimber.adm domain users      4096 Mar  3 15:24 .
drwxr-xr-x. 34 eliot.kimber.adm domain users     12288 Mar  3 17:17 ..
-rw-r--r--.  1 eliot.kimber.adm domain users  59763275 Mar  3 17:27 atv.basex
-rw-r--r--.  1 eliot.kimber.adm domain users   5988029 Mar  3 17:27 atvl.basex
-rw-r--r--.  1 eliot.kimber.adm domain users   2257240 Mar  3 17:27 atvr.basex
-rw-r--r--.  1 eliot.kimber.adm domain users       236 Mar  3 15:24 ftxx.basex
-rw-r--r--.  1 eliot.kimber.adm domain users    205425 Mar  3 15:24 ftxy.basex
-rw-r--r--.  1 eliot.kimber.adm domain users   1339905 Mar  3 15:24 ftxz.basex
-rw-r--r--.  1 eliot.kimber.adm domain users        15 Mar  3 15:24 idp.basex
-rw-r--r--.  1 eliot.kimber.adm domain users     14128 Mar  3 15:24 inf.basex
-rw-r--r--.  1 eliot.kimber.adm domain users        67 Mar  3 15:24 pth.basex
-rw-r--r--.  1 eliot.kimber.adm domain users        28 Mar  3 15:24 swl.basex
-rw-r--r--.  1 eliot.kimber.adm domain users  67334144 Mar  3 17:27 tbl.basex
-rw-r--r--.  1 eliot.kimber.adm domain users         9 Mar  3 15:24 tbli.basex
-rw-r--r--.  1 eliot.kimber.adm domain users 109991241 Mar  3 17:27 tokl.basex
-rw-r--r--.  1 eliot.kimber.adm domain users   2361275 Mar  3 17:27 tokr.basex
-rw-r--r--.  1 eliot.kimber.adm domain users   1442904 Mar  3 17:22 txt.basex
-rw-r--r--.  1 eliot.kimber.adm domain users    295757 Mar  3 15:24 txtl.basex
-rw-r--r--.  1 eliot.kimber.adm domain users    243050 Mar  3 15:05 txtr.basex
-rw-r--r--.  1 eliot.kimber.adm domain users         0 Mar  3 15:24 upd.basex

 

Here, 17:27 is the time I ran the ls command and 15:24 is the time the optimization request was submitted:

15:24:31.724  JOB:orch:job12_1709468383163  admin  REQUEST  0.00  NEW INITIALIZE WORKTREE washingtondc: dbadmin:optimizeDatabase('_temp_lrk_washingtondc_link_records')


Where dbadmin:optimizeDatabase() is:

declare updating function dbadmin:optimizeDatabase(
  $database as xs:string
) {
  try {
    if (db:exists($database))
    then (
      util:logToConsole('dbadmin:optimizeDatabase', ``[Optimizing database `{$database}`...]``),
      db:optimize($database, true(), $dbadmin:dbOptimizeOptions),
      util:logToConsole('dbadmin:optimizeDatabase', ``[Database `{$database}` optimized.]``)
    )
    else util:logToConsole('dbadmin:optimizeDatabase', ``[Database '`{$database}`' does not exist. Nothing to optimize.]``)
  } catch * {
    util:logToConsole(
      'dbadmin:optimizeDatabase',
      ``[Exception optimizing database '`{$database}`': `{$err:code}` - `{$err:description}`]``,
      'error')
  }
};

And $dbadmin:dbOptimizeOptions is:

declare variable $dbadmin:dbOptimizeOptions as map(*) :=
  (: Turn on all the indexes :)
  map {
    'attrindex' : true(),
    'tokenindex' : true(),
    'textindex' : true(),
    'ftindex' : true()
  };
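
In case it helps anyone reproduce the time stamp comparison from the directory listing, it can be scripted rather than read off ls output. This is only a rough sketch in Python: the function name files_written_since is my own invention, and it assumes that the time stamp on upd.basex marks the start of the optimization, which matches what I observed but which I have not confirmed against the BaseX source.

```python
import sys
from datetime import datetime
from pathlib import Path


def files_written_since(db_dir, marker="upd.basex"):
    """Return (name, mtime) pairs for *.basex files modified after the marker.

    Assumption: the marker file's mtime approximates the moment the
    optimization started, as in the directory listing in this message.
    """
    db_path = Path(db_dir)
    start = (db_path / marker).stat().st_mtime
    newer = []
    for path in db_path.glob("*.basex"):
        mtime = path.stat().st_mtime
        if mtime > start:
            newer.append((path.name, mtime))
    # Most recently written files first.
    return sorted(newer, key=lambda item: item[1], reverse=True)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python check_mtimes.py /path/to/data/<database-name>
    for name, mtime in files_written_since(sys.argv[1]):
        print(f"{datetime.fromtimestamp(mtime):%H:%M:%S}  {name}")
```

Running it a few times, several minutes apart, against the database directory should show whether the same files keep getting newer time stamps while the optimization is nominally in progress.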

 

Cheers,

 

E.

 

_____________________________________________

Eliot Kimber

Sr Staff Content Engineer

O: 512 554 9368

M: 512 554 9368

servicenow.com

LinkedIn | Twitter | YouTube | Facebook

 

From: Eliot Kimber <eliot.kimber@servicenow.com>
Date: Sunday, March 3, 2024 at 9:00 AM
To: basex-talk@mailman.uni-konstanz.de <basex-talk@mailman.uni-konstanz.de>
Subject: Possible lingering issues with database optimization in v11

I’m continuing to test my process for loading data, which depends on optimizing databases, some of which are pretty large (100+ MB with hundreds of thousands of elements). I’m testing on both macOS and Linux, using the 27 Feb build on macOS and the 29 Feb build on Linux (just a matter of when I downloaded them).

 

What I’m seeing is that when I test with a relatively small content set the optimization completes reliably and everything works as it should.

 

When I test with a realistically large data set, the optimization either takes a very long time (as much as an hour to complete) or never completes, with the server at 100% CPU utilization. It seems to be worse on macOS but it’s difficult to verify, partly because a test takes several hours to run.

 

I have the BaseX source code available locally, although I’m unable to compile with Maven due to internal Maven issues (we have a pretty locked-down Maven proxy and I don’t know Maven well enough to know how I might configure around that).

Is there anything I can do to diagnose this issue to at least confirm or deny that there are still deadlock issues with the optimization?

I assume that it should not take tens of minutes to optimize even a large database.

 

Here are the details for a typical database that has failed to optimize on macOS:

 

Thanks,

 

Eliot
