Hi James,
The issue I'm seeing is that the size of the index grows by approximately 1MB with every updating 'transaction' (snapshot?), even if there is no new data for the index. For example, if I have a database with 100,000 files and I replace one of those files (with itself, so there's no new data), the size of the index goes up by around 1MB. If I replace 1000 files in the same transaction (again with themselves), the size of the index again goes up by around 1MB. Dropping and recreating the index returns it to its original size. I have a current project where I'm expecting thousands of files, a few at a time, that need to be added or replaced. While testing, I completely ran out of disk space before I spotted what was happening.
I can confirm that this is a known issue of the UPDINDEX option. We haven't had time to dive into this yet (and it doesn't seem to cause trouble in every scenario we know of). I assume the reason is that obsolete ID lists in atvl.basex are not overwritten by newer data but are orphaned instead. Newly created ID lists are always appended to the end of this file, so the file size increases continuously.
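To illustrate the assumed mechanism, here is a toy model in Python. It is not the actual atvl.basex format; the class, its methods, and the 4-byte ID encoding are made up for illustration. It only shows how a file grows when updated ID lists are appended and the old ones are left behind.

```python
# Toy model of an append-only value file; NOT BaseX's real on-disk format.
class AppendOnlyIdLists:
    def __init__(self):
        self.data = bytearray()  # stands in for the file contents
        self.offsets = {}        # key -> (offset, length) of the live ID list

    def put(self, key, ids):
        # Encode each ID as 4 bytes (an arbitrary choice for this sketch).
        payload = b"".join(i.to_bytes(4, "big") for i in ids)
        # Always append; the previous list for this key stays in place, orphaned.
        self.offsets[key] = (len(self.data), len(payload))
        self.data += payload

    def live_bytes(self):
        # Bytes still referenced by the current offsets.
        return sum(length for _, length in self.offsets.values())

    def file_size(self):
        return len(self.data)


store = AppendOnlyIdLists()
store.put("title", [1, 2, 3])
size_before = store.file_size()
store.put("title", [1, 2, 3])  # "replace with itself": no new data
size_after = store.file_size()
```

In this model the live data stays the same size after the second `put`, while the file size doubles, which matches the kind of growth described above.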
One way out (until this has been fixed) is to optimize these databases at regular intervals.
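A minimal sketch of how this could be scripted, assuming the `basex` start script is on the PATH and the database is called `mydb` (a placeholder name). It uses the `-c` command-line option with the `OPEN` and `OPTIMIZE ALL` commands; the Python function names are hypothetical.

```python
import subprocess


def optimize_command(db_name, basex_bin="basex"):
    """Build the argument list for a full optimize of one database."""
    return [basex_bin, "-c", f"OPEN {db_name}; OPTIMIZE ALL"]


def optimize(db_name, basex_bin="basex"):
    """Run the optimize; raises CalledProcessError on a non-zero exit code."""
    subprocess.run(optimize_command(db_name, basex_bin), check=True)
```

Calling `optimize("mydb")` from a cron job or a similar scheduler would then rebuild the database and its index structures at whatever interval keeps the file size acceptable.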
I don't know the format of the index files, but I've looked at atvl.basex in a text editor. It looks like around 40k blank lines are added for each update to the index. I don't know that they are truly blank lines, but that's how they render in the editor.
This sounds surprising, but it could be an interesting hint. If you manage to compress this file to a reasonable size, feel free to send it to me.
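Before sending, it may be worth checking which bytes the end of the file really contains, since an editor may render zero bytes or other values as empty lines. A minimal Python sketch, with hypothetical function names and an arbitrary 1 MB window: the first function counts the byte values in the tail of a file, the second writes a gzip-compressed copy next to it.

```python
import gzip
import shutil
from collections import Counter


def tail_byte_counts(path, tail_size=1024 * 1024):
    """Count byte values in the last tail_size bytes of the file."""
    with open(path, "rb") as f:
        f.seek(0, 2)                       # jump to the end of the file
        size = f.tell()
        f.seek(max(0, size - tail_size))   # step back by at most tail_size
        return Counter(f.read())           # keys are byte values 0..255


def gzip_file(path):
    """Write a gzip-compressed copy next to the original and return its path."""
    target = path + ".gz"
    with open(path, "rb") as src, gzip.open(target, "wb") as dst:
        shutil.copyfileobj(src, dst)
    return target
```

If `tail_byte_counts("atvl.basex").most_common(3)` is dominated by the value 10 (line feed), the lines really are blank; a dominant 0 would point to zero padding instead.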
Best, Christian