Hi Lizzi,
Thanks for the information!
And thanks back for the details.
When using OPTIMIZE, it is not clear what caused the out-of-memory error. With individual CREATE INDEX statements, I ran into the out-of-memory error on the FULLTEXT index.
I see; in both cases, it must be the full-text index. As you have discovered in the documentation, we write partial index structures to disk once main memory is exhausted. This works very well for the text and attribute indexes (you can usually index 10 GB of data with 100 MB of main memory), but it is becoming increasingly clear that the corresponding merge algorithms need to be improved and better adapted to the full-text index.
Did you try selective indexing (i.e., limiting the indexed full-text nodes to the ones that will eventually be queried)?
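For illustration, a minimal sketch of what that could look like, assuming the FTINCLUDE option from the Selective Indexing part of the documentation ('abstract' is just a placeholder for the elements you actually query):

    SET FTINCLUDE abstract
    CREATE INDEX FULLTEXT

That way, only text nodes inside the listed elements end up in the full-text index, which can shrink it considerably.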
I have not tried incremental indexing with UPDINDEX or AUTOINDEX. My understanding from the documentation is that UPDINDEX does not update the full-text index, and that incremental indexing should be turned off to improve the speed of bulk imports.
Completely true; in your case, it doesn’t really help.
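(For reference, the bulk-import pattern you describe would be a sketch along these lines, with database name and path as placeholders:

    SET UPDINDEX false
    CREATE DB mydb /path/to/documents
    CREATE INDEX FULLTEXT

i.e., import everything first and build the indexes in one go at the end.)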
Today I benchmarked ADD vs. REPLACE and did not see much difference in speed.
Once REPLACE is called, additional metadata structures will be created that need to be maintained, so it could be that you will need to start from scratch with a new database.
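To make the difference concrete, a minimal sketch (both paths are placeholders): ADD simply appends a new document, while REPLACE first looks up an existing document under the given database path and swaps it:

    ADD TO docs/a.xml /local/a.xml
    REPLACE docs/a.xml /local/a.xml

It is the bookkeeping for that lookup that introduces the extra structures mentioned above.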
Today I found the section on Index Performance in the documentation (http://docs.basex.org/wiki/Indexes#Performance). This section mentions “If main memory runs out while creating a value index, the current index structures will be partially written to disk and eventually merged.” Does this mean that if running OPTIMIZE ALL ends with an out-of-memory error, running OPTIMIZE as many times as needed will eventually update all of the indexes?
Once you run out of memory, indexing will be interrupted and needs to be started again from scratch, so repeated OPTIMIZE runs will not incrementally complete the indexes.
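As a practical workaround, you can give the JVM more heap before optimizing; a sketch assuming the standard startup scripts (which pass the BASEX_JVM environment variable on to Java) and a placeholder database name:

    export BASEX_JVM="-Xmx6g"
    basex -c "OPEN mydb; OPTIMIZE ALL"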
Hope this helps,
Christian