Also, we already have a pretty slick system for identifying small issues. What we are looking at now is preventing larger issues that arise from domino effects, or at least being able to study them when they occur. Here is an example of a domino-effect issue we had:
Ex.:
1. Add an attribute to the root of an element.
2. Run a few read operations with success.
3. Run a process that updates one of the files that has a new root attribute, and get a corrupted DB.
Anything that leaves a trace identifying that one file out of the few hundred I am processing is a win. Since our files go through different processes depending on their state, I can eliminate a lot of files just by knowing which functions were called on which files. This helps me figure out the link between operation #1 and operation #3. In our last case, operation #3 ran a few weeks after operation #1, so a trace would not have helped much, but in most cases it could.
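
To make the "which functions were called for which files" idea concrete, here is a rough sketch in Python (chosen purely for illustration). Everything in it is an assumption: the `traced` decorator, the `TRACE` store, and the three placeholder functions are hypothetical stand-ins, not the real processes. It also keeps the trace in memory only, so it would need to be written somewhere persistent to cover operations that run weeks apart.

```python
import functools
from collections import defaultdict
from datetime import datetime, timezone

# Hypothetical in-memory store: file path -> list of (timestamp, function name).
TRACE = defaultdict(list)


def traced(func):
    """Record each call of func against the file path it was given."""
    @functools.wraps(func)
    def wrapper(path, *args, **kwargs):
        TRACE[path].append((datetime.now(timezone.utc), func.__name__))
        return func(path, *args, **kwargs)
    return wrapper


# Placeholder operations standing in for steps 1-3 of the example.
@traced
def add_root_attribute(path):
    pass


@traced
def read_file(path):
    pass


@traced
def update_file(path):
    pass


def files_touched_by(*function_names):
    """Return the files on which every one of the named functions was called."""
    wanted = set(function_names)
    return sorted(
        path
        for path, calls in TRACE.items()
        if wanted <= {name for _, name in calls}
    )
```

With something like this in place, `files_touched_by("add_root_attribute", "update_file")` narrows a few hundred files down to the ones that went through both operation #1 and operation #3, which is the elimination step described above.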