I have a db composed of several subfolders, each containing XML files that represent a person's full record from an outside source. Treating the aggregated folders as a single system has allowed us to run full-text searches across all of our data at once, which has been a great improvement over the first generation of our system.
As I look to move beyond testing, I now need to implement the ability to update the existing records and include information that goes beyond what the outside source provides. Is it better industry practice with XML-based systems to a) append all the new information to the original XML, giving me one file with a full record for that person, or b) create an additional subfolder or separate db with reviews and extra data, referencing the ID of their file from the outside source, isolating things like reviews as separate entities in the system, and keeping the original record intact?
I am leaning towards method a, as it will result in fewer queries and less relational-style effort to assemble complete records. However, I am wondering if option b would be better, as when I pull updates from our outside sources I could simply drop/replace the original record if the timestamp has changed, without losing our internally generated data on that person.
Ultimately, I have been trying to figure out a way to compare and update easily, as each outside source provides a different format, and part of the appeal of transitioning to XML was to avoid heavy customization each time a new source was added or an existing service changed its output. I don't want to write update algorithms for each service, tailored to particular tags, if there is a way to have it match on its own.
Thank you for your thoughts, Mike
Dear Mike,
It's always difficult to give general advice on how to distribute data in XML databases, or in BaseX in particular, but it often helps to look at how often your data, and which parts of it, will be updated. If some of your data is rather static but needs to be queried often, and other data changes frequently, it's very sensible to use at least two databases: the first can be indexed once, or whenever its data actually changes, and the other can be kept small, so that it can either be reindexed very quickly or accessed in reasonable time without index structures.
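To make the split concrete, here is a minimal sketch in Python rather than XQuery, with two dictionaries standing in for the two databases. The names `source_db`, `internal_db`, `import_record` and `add_review` are hypothetical, and the sketch assumes that every source record carries `id` and `timestamp` attributes on its root element, whatever the rest of its format looks like. It is only meant to illustrate the drop/replace idea, not how BaseX stores or indexes anything.

```python
import xml.etree.ElementTree as ET

# Hypothetical in-memory stand-ins for the two databases, keyed by person ID.
source_db = {}    # records from outside sources, replaced wholesale on update
internal_db = {}  # reviews and extra data, never touched by an import


def import_record(xml_text):
    """Store a source record unless an identical timestamp is already stored.

    Assumes the root element has 'id' and 'timestamp' attributes; nothing
    else about the source format is inspected. Returns True if stored.
    """
    record = ET.fromstring(xml_text)
    person_id = record.get("id")
    current = source_db.get(person_id)
    if current is not None and current.get("timestamp") == record.get("timestamp"):
        return False  # unchanged, keep the existing record
    source_db[person_id] = record  # drop/replace the whole record
    return True


def add_review(person_id, text):
    """Attach an internally generated review, referencing the source ID."""
    extra = internal_db.setdefault(
        person_id, ET.Element("internal", {"ref": person_id})
    )
    ET.SubElement(extra, "review").text = text
```

Because the import only looks at the two root attributes, a source that changes its tags can still be replaced without any per-source update logic, and the reviews stay where they are.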
In our production use cases, we often work with tens or hundreds of database instances, which are then queried by single XQuery calls. It may take some initial time to become proficient in XQuery, but it's absolutely worth it if you want to build somewhat more complex applications with BaseX.
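The idea of one call spanning several databases can be sketched as a simple join on the shared ID. The following Python function is only an illustration under assumptions: `full_record` is a hypothetical name, source records are assumed to carry an `id` attribute, and internal entries are assumed to point back at them with a `ref` attribute. In BaseX the same join would be written as a single XQuery expression over both databases.

```python
import copy
import xml.etree.ElementTree as ET


def full_record(source_records, internal_records, person_id):
    """Assemble one <person> element from two separate collections.

    source_records: elements with an 'id' attribute (assumed convention).
    internal_records: elements with a 'ref' attribute pointing at that ID.
    The stored elements are copied, so neither collection is modified.
    """
    person = ET.Element("person", {"id": person_id})
    for record in source_records:
        if record.get("id") == person_id:
            person.append(copy.deepcopy(record))
    for extra in internal_records:
        if extra.get("ref") == person_id:
            person.append(copy.deepcopy(extra))
    return person
```

The combined record only exists at query time, so the original source file stays intact and can still be replaced independently.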
In a nutshell: I would go for option b), even if this looks like more work to start with.
Hope this helps, Christian
BaseX-Talk mailing list BaseX-Talk@mailman.uni-konstanz.de https://mailman.uni-konstanz.de/mailman/listinfo/basex-talk