I have a database made up of several subfolders, each containing XML files that represent a person's full record from an outside source. Treating the aggregated folders as a single system has allowed us to run full-text searches across all of our data at once, which has been a great improvement over the first generation of our system.

As I look to move beyond testing, I now need to support updating existing records and adding information beyond what the outside source provides. With XML-based systems, which is the better industry practice?

a) Append all of the new information to the original XML file, giving me one file with the full record for that person.
b) Create an additional subfolder or a separate database for reviews and extra data that references the ID of the person's file from the outside source. This isolates things like reviews as separate entities in the system and keeps the original record intact.
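For comparison, here is a minimal sketch of what option a might look like using Python's standard xml.etree.ElementTree. The <internal> and <review> element names and both function names are hypothetical. The second function shows the catch with option a: when a fresh copy arrives from the outside source, your own block has to be carried over by hand.

```python
import xml.etree.ElementTree as ET

# Hypothetical name for the block holding our own (non-source) data.
INTERNAL_TAG = "internal"


def add_review(record_xml: str, text: str) -> str:
    """Option a: append a review directly inside the source record."""
    root = ET.fromstring(record_xml)
    internal = root.find(INTERNAL_TAG)
    if internal is None:
        # First piece of internal data for this person: create the block.
        internal = ET.SubElement(root, INTERNAL_TAG)
    ET.SubElement(internal, "review").text = text
    return ET.tostring(root, encoding="unicode")


def refresh_from_source(stored_xml: str, fresh_source_xml: str) -> str:
    """Replace the source part of a record but re-graft our internal block."""
    old_internal = ET.fromstring(stored_xml).find(INTERNAL_TAG)
    fresh = ET.fromstring(fresh_source_xml)
    if old_internal is not None:
        fresh.append(old_internal)
    return ET.tostring(fresh, encoding="unicode")
```

Every update path has to remember to call something like refresh_from_source; forgetting it once silently discards the internal data.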

I am leaning toward option a, since it would require fewer queries and less relational-style effort to assemble complete records. However, I wonder whether option b would be better: when I pull updates from our outside sources, I could simply drop and replace the original record if its timestamp has changed, without losing our internally generated data on that person.
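A minimal sketch of option b, assuming each source record carries a timestamp element (called <updated> here, which is a hypothetical name). Two in-memory dicts stand in for the source subfolder and the internal subfolder, both keyed by the outside source's ID. The source copy is replaced only when its timestamp changes, and a combined view with the option a shape is assembled at read time, so the internal data is never touched by an update.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-ins for the two folders/collections, keyed by source ID.
source_db: dict[str, str] = {}    # outside source's records, stored verbatim
internal_db: dict[str, str] = {}  # our reviews/extra data, never overwritten by updates


def upsert_source_record(record_id: str, xml_text: str, timestamp_tag: str = "updated") -> bool:
    """Insert or replace a source record; return True if anything was written."""
    incoming_ts = ET.fromstring(xml_text).findtext(timestamp_tag)
    existing = source_db.get(record_id)
    if existing is not None and incoming_ts is not None:
        if ET.fromstring(existing).findtext(timestamp_tag) == incoming_ts:
            return False  # same timestamp: skip the write
    # New record, changed timestamp, or no timestamp to compare: replace it.
    source_db[record_id] = xml_text
    return True


def assemble_full_record(record_id: str) -> ET.Element:
    """Combine the source record and any internal data into one <person> view."""
    person = ET.Element("person", id=record_id)
    ET.SubElement(person, "source").append(ET.fromstring(source_db[record_id]))
    if record_id in internal_db:
        ET.SubElement(person, "internal").append(ET.fromstring(internal_db[record_id]))
    return person
```

The cost is one extra lookup per record at read time. In exchange, a source refresh can never clobber the internal data.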

Ultimately, I have been trying to find a way to compare and update records easily. Each outside source provides a different format, and part of the appeal of moving to XML was avoiding heavy customization every time a new source is added or an existing service changes its output. I don't want to write a separate update algorithm for each service, tailored to particular tags, if there is a way for the system to match records on its own.

Thank you for your thoughts,
Mike