Hi Freesoft,
I have uninstalled 7.6 and installed 7.7 beta. Then I created the empty db, added the 3 files, ran the "set addcache true" command, and added the 17828 files... and got no "out of memory" error, just the processing info:
Good to hear. Please note that it’s always faster to specify initial documents along with the CREATE DB command instead of adding them in a second step (but I’m aware that you’re mainly interested in the time required to incrementally add new documents).
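As a small sketch of the two variants (the database name docs and the path /data/xml are placeholders, not taken from your setup):

```
# Variant 1: create the database and import the documents in one step
CREATE DB docs /data/xml

# Variant 2: create an empty database first, then add the documents
CREATE DB docs
ADD /data/xml
```

Variant 2 is the one that corresponds to your incremental scenario.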
- Is 7.7 beta sufficiently stable to be used in our production server?
Should I wait for the final 7.7 release?
The current snapshot should be a safe bet, as there will be no critical updates until the official release.
- Is the "addcache" property value permanently saved to the db? Should I
run the "set addcache true" command every time I add files?
The value of ADDCACHE is bound to the current BaseX instance and won't be stored in the database. This means that you’ll have to set it to true whenever you run a new BaseX instance.
But... as you stumbled upon an issue that has been discussed before, I had another look at the ADD command and added some heuristics for directory inputs. If the documents to be added are expected to exceed main memory, they will be cached even if ADDCACHE is set to false. You are invited to check out the latest version [1] and give us some more feedback.
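If you prefer to set the option explicitly, a session could look roughly like this (docs and /data/xml/new are placeholder names of mine):

```
# ADDCACHE is not stored in the database, so set it once per BaseX instance
SET ADDCACHE true
# open the existing database and add the new documents to it
OPEN docs
ADD /data/xml/new
```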
- Should I keep the Text & Attribute indexes disabled? Is the "addcache=on"
option sufficient to allow the addition of XML files, so I can enable those indexes? Will my queries be slow with those indexes disabled?
If text and attribute indexes are enabled, they will be invalidated with an update and restored with the next OPTIMIZE call, so it’s a good choice to keep the defaults. Not all queries will get slower without indexes. You can have a look at the query info (shown e.g. in the GUI’s InfoView) to see if the query plans with and without index structures differ.
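Outside the GUI, something like the following should give you the same information (only a sketch: the database name and the query are made up, and I am assuming that the QUERYINFO option is used to print the compilation and optimization details):

```
# print details on query rewritings and optimizations with each result
SET QUERYINFO true
OPEN docs
# hypothetical query with a text comparison that could benefit from the text index
XQUERY //title[text() = 'BaseX']
```

Running the query once after an update and once after OPTIMIZE should show whether the rewritten query accesses an index.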
- Should I run Optimize after every massive insertion (even with
"addcache=on")?
It’s generally advisable to run OPTIMIZE whenever you want to perform queries on your new data.
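A minimal sequence after a bulk insertion might look as follows (docs is again just a placeholder name):

```
OPEN docs
# rebuild the index structures and statistics of the opened database
OPTIMIZE
# the database information should tell you if the index structures are up to date
INFO DB
```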
mean an average of exactly 1 KB/file. Since my files are bigger than 1 KB on average, the size limit (512 GiB) will be reached first.
My assumption is that you will first hit the node id limit (#Nodes), but simply try and see what happens.
Please show me an easy example of how to use several databases in the same query. Perhaps something like:
for $doc in (collection("db1"), collection("db2")) for $node in $doc/$a_node_path
Looks fine. This is one more alternative:
for $i in 1 to 100 let $db := "db" || $i return db:open($db)/your/path
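If the databases are not numbered consecutively, you could also iterate over the names of the existing databases. This is only a sketch, assuming that all relevant databases share the prefix "db"; the path is the same placeholder as above:

```
# db:list() returns the names of all databases; keep those starting with "db"
XQUERY for $db in db:list()[starts-with(., 'db')] return db:open($db)/your/path
```

This way, no error is raised for a database name that does not exist.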
Well, thank you very much for your help. And excuse me for the huge amount of questions from a newbie like me :-)
Your questions are welcome. If you have some free time, you are invited to read our documentation [2]; much of its content has been inspired by earlier discussions on this list.
Christian
[1] http://files.basex.org/releases/latest/
[2] http://docs.basex.org/wiki/Main_Page