best practices questions - BaseX-Talk - mailman.uni-konstanz.de

5 Nov 2012


I have a db composed of several sub folders, each containing xml files that
represent a person's full record from an outside source. Treating the
aggregated folders as a single system has allowed us to run full-text
searches across all of our data at once, which has been a great improvement
over the first generation of our system.
As I look to move beyond testing, I now need to implement the ability to
update the existing records and include information that extends beyond what
the outside source provides. Is it better industry practice with XML-based
systems to
a) append all the new information to the original xml, giving me one file
with a full record for that person or b) create an additional sub folder or
separate db with reviews and extra data, referencing the ID of their file
from the outside source, isolating things like reviews as separate entities
in the system, and keeping the original record intact?
I am leaning towards method a, as it will result in fewer queries and less
relational-style effort to assemble complete records. However, I am
wondering if option b would be better, as when I pull updates from our
outside sources I could simply drop/replace the original record if the time
stamp has changed, without losing our internally generated data on that
person.
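
To make option b concrete, here is a minimal sketch in Python rather than
XQuery. It is only an illustration under stated assumptions: the two dicts
stand in for two separate databases or sub folders, and the names
upsert_source_record, add_internal, full_record and the person, source and
internal elements are all hypothetical, not part of BaseX or of any outside
source's format.

```python
import xml.etree.ElementTree as ET

# Hypothetical stores: in a real system these would be two databases or
# sub folders; plain dicts keyed by person ID stand in for them here.
source_records = {}    # person ID -> (timestamp, record XML from outside source)
internal_records = {}  # person ID -> list of internally generated XML fragments


def upsert_source_record(person_id, timestamp, xml_text):
    """Drop/replace the outside-source record only if its timestamp changed."""
    current = source_records.get(person_id)
    if current is not None and current[0] == timestamp:
        return False  # unchanged, keep the stored copy
    source_records[person_id] = (timestamp, xml_text)
    return True


def add_internal(person_id, xml_text):
    """Attach internally generated data (e.g. a review) to a person ID."""
    internal_records.setdefault(person_id, []).append(xml_text)


def full_record(person_id):
    """Assemble one combined element from both stores at read time."""
    timestamp, xml_text = source_records[person_id]
    person = ET.Element("person", {"id": person_id})
    source = ET.SubElement(person, "source", {"timestamp": timestamp})
    source.append(ET.fromstring(xml_text))
    internal = ET.SubElement(person, "internal")
    for fragment in internal_records.get(person_id, []):
        internal.append(ET.fromstring(fragment))
    return person
```

The point of the sketch is that replacing the source record never touches the
internal data, at the cost of one extra lookup by ID when a complete record is
assembled.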
Ultimately, I have been trying to figure out a way to compare and update
easily, as each outside source provides a different format, and part of the
appeal of transitioning to xml was to avoid heavy customization each time a
new source was added or an existing service changed its output. I don't
want to write update algorithms for each service tailored for particular
tags if there is a way to have it match on its own.
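
One format-agnostic way to compare two versions of a record, sketched here in
Python as an assumption rather than a known BaseX feature, is to flatten each
document into a map from element path to text and diff the maps. The helper
names flatten and diff are hypothetical, and the sketch ignores mixed content
and treats a reordering of same-named siblings as a change.

```python
import xml.etree.ElementTree as ET


def flatten(xml_text):
    """Map each element path (with sibling index) to its stripped text."""
    result = {}

    def walk(elem, path):
        text = (elem.text or "").strip()
        if text:
            result[path] = text
        for name, value in elem.attrib.items():
            result[f"{path}/@{name}"] = value
        counts = {}  # per-tag counter so repeated siblings get distinct paths
        for child in elem:
            counts[child.tag] = counts.get(child.tag, 0) + 1
            walk(child, f"{path}/{child.tag}[{counts[child.tag]}]")

    root = ET.fromstring(xml_text)
    walk(root, f"/{root.tag}")
    return result


def diff(old_xml, new_xml):
    """Report added, removed and changed paths without knowing any tag names."""
    old, new = flatten(old_xml), flatten(new_xml)
    return {
        "added": {p: new[p] for p in new.keys() - old.keys()},
        "removed": {p: old[p] for p in old.keys() - new.keys()},
        "changed": {p: (old[p], new[p])
                    for p in old.keys() & new.keys() if old[p] != new[p]},
    }
```

Because nothing in it refers to a specific tag, the same comparison would apply
to every outside source regardless of its format.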
Thank you for your thoughts,
Mike