Hi,
I'm currently evaluating BaseX for a project. I've read all the online documentation, but the underlying storage and indexing mechanisms are still a bit of a mystery to me, so I'm having trouble making good design decisions for a large collection of documents.
I have 10 million documents of moderate size. These are intended to be regularly replaced/updated.
I can either store each document individually in a collection, or insert them all into a single document and update them there. Which approach will generally perform better?
In an experiment, I found that after adding a few million documents, adding new documents became really slow. The JVM is pegged at 100% CPU, so it is doing a lot of work. What's going on here? Indexing? Would increasing the JVM memory help? Can indexing be disabled for bulk loads?
Rather than trying random things to see what works, I was hoping to get some insight into how the system stores and indexes data and how it uses resources.
Many thanks,
Michael