Hello all,

For the first time I have encountered an "insert-intensive" workload: a lot of documents are stored into a database at a rather fast pace. With the computing resources I currently have, the process slows down rather quickly, since the indexes (created with the CREATE INDEX command) are invalidated immediately. I switched to the UPDINDEX option, which alleviated the slowdown, but it quickly ran my "small" system into disk space issues, since the data files grow at a much higher rate (5.8 GB after one day of continuous work). I can fix this by running the OPTIMIZE ALL command. I was not very comfortable with this command, so I stopped everything and did a cold run. The result was surprisingly good: the disk space was reduced to 1% of its previous size, and the run took little time relative to the size of the input data.

My first question is whether running the OPTIMIZE command regularly (maybe every hour or so), without stopping the flow of incoming documents, is a safe way to go, or whether it will impede the workload or introduce any risks to data integrity.

Secondly, to reduce the load on a single database (and thus the frequency of OPTIMIZE calls), I was planning to split my data, moving less important data to a different database (but on the same server). Will this be an effective solution in this scenario?
Sorry for the possibly naive questions, but I'm really not an expert in these quantitative aspects. Thanks and regards, Marco.
Hi Marco,
My first question is whether running the OPTIMIZE command regularly (maybe every hour or so), without stopping the flow of incoming documents, is a safe way to go, or whether it will impede the workload or introduce any risks to data integrity.
If the OPTIMIZE command is executed by the same server instance, it's absolutely safe to use. It will delay incoming requests, but none of them will be lost (see [1] for more details).
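As a sketch, sending the command through the running server could look like this (the database name "mydb" and the admin credentials are placeholders, not taken from the thread):

```shell
# Sketch: send OPTIMIZE to the running BaseX server via the client,
# so it is executed by the same instance that handles the incoming
# documents and is serialized with the other write operations.
# "mydb" and the admin/admin credentials are placeholders.
basexclient -U admin -P admin -c "OPEN mydb; OPTIMIZE"
```

Because the server queues the command with the other updates, incoming documents are delayed during the optimize but never dropped.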
Secondly, to reduce the load on a single database (and thus the frequency of OPTIMIZE calls), I was planning to split my data, moving less important data to a different database (but on the same server). Will this be an effective solution in this scenario?
Yes, that's a reasonable approach when working with large data sets. If you have data that's static (i.e., no longer changed), you can store it in a database that's perfectly optimized and indexed, and you can organize the daily updates in a second, smaller database.
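A minimal sketch of that split, assuming hypothetical database names ("archive", "live") and paths:

```shell
# One-off: build a fully optimized, indexed database for the static
# documents ("archive" and /data/static are placeholders).
basexclient -U admin -P admin -c "CREATE DB archive /data/static; OPTIMIZE ALL"

# Ongoing: continuous inserts go into a small "live" database,
# which stays cheap to optimize regularly.
basexclient -U admin -P admin -c "CREATE DB live"
basexclient -U admin -P admin -c "OPEN live; ADD /data/incoming/doc.xml"
```

Queries can then address both databases at once, e.g. via XQuery's `collection('archive')` and `collection('live')`.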
Hope this helps, Christian
Thanks Christian,
Just one more question about your statement:
If the OPTIMIZE command is executed by the same server instance, it's absolutely safe to use. It will delay incoming requests, but none of them will be lost (see [1] for more details).
This means that I have to use the "basexclient" (client-server) script rather than "basex" (standalone) when running the OPTIMIZE commands through a cron job, right? M.
On 07/25/2014 09:52 AM, Christian Grün wrote:
[...]
This means that I have to use the "basexclient" (client-server) script rather than "basex" (standalone) when running the OPTIMIZE commands through a cron job, right?
...exactly (or REST calls, etc., if you are working in the HTTP context).
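Put together, the cron job could look like the following crontab entry (a sketch; the database name and credentials are placeholders):

```shell
# Hypothetical crontab entry: run OPTIMIZE at the top of every hour
# through the client-server interface, so it is executed by the
# running server instance rather than a separate standalone process.
# "mydb" and the admin/admin credentials are placeholders.
0 * * * * basexclient -U admin -P admin -c "OPEN mydb; OPTIMIZE"
```

Using a standalone `basex` process here would instead open the database files directly and conflict with the running server.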
On 07/25/2014 11:06 AM, Christian Grün wrote:
[...]
Perfectly clear! Thank you again! M.
basex-talk@mailman.uni-konstanz.de