I need to perform a transformation that would be simple in XSLT, but the input file is about 250 MB in size. I'm wondering whether XQuery, and BaseX in particular, would be the most efficient way of doing it. I'm new to XQuery, and I've come up with a couple of ways to do this, but they turn out to be very time-consuming, so I'm sure I'm Doing It Wrong. I'm hoping to find out the proper way of doing this.
The input consists of two sections. The first has about 3600 page elements with this structure:
<page>
  [misc page content...]
  <list>
    <unit>
      [misc unit content 1...]
      <partinfo>
        <number>54321</number>
        <manuf>A321</manuf>
      </partinfo>
      <partinfo>
        <number>12345</number>
        <manuf>B123</manuf>
      </partinfo>
      [misc unit content 2...]
    </unit>
    [multiple units...]
  </list>
</page>
Each unit can have 1 or 2 partinfo elements. The other section has about 82000 part elements like this:
<part>
  <partinfo>
    <number>54321</number>
    <manuf>A321</manuf>
  </partinfo>
  [misc part content 1]
</part>
[...]
<part>
  <partinfo>
    <number>12345</number>
    <manuf>B123</manuf>
  </partinfo>
  [misc part content 2]
</part>
I want to replace each unit/partinfo with the corresponding part, that is, the part whose partinfo has the same number and manuf, like this:
<page>
  [misc page content...]
  <list>
    <unit>
      [misc unit content 1...]
      <part>
        <partinfo>
          <number>54321</number>
          <manuf>A321</manuf>
        </partinfo>
        [misc part content 1]
      </part>
      <part>
        <partinfo>
          <number>12345</number>
          <manuf>B123</manuf>
        </partinfo>
        [misc part content 2]
      </part>
      [misc unit content 2...]
    </unit>
    [multiple units...]
  </list>
</page>
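To pin down the intended result independently of any particular tool, here is a minimal sketch of the lookup in Python rather than XQuery. It is only an illustration, not a proposed solution: it assumes the (number, manuf) pair identifies a part uniquely and that the whole document fits in memory. The names part_key and replace_partinfo, the <root> wrapper, and the <name> and <desc> elements in the sample are made up for the example.

```python
import xml.etree.ElementTree as ET
from copy import deepcopy


def part_key(partinfo):
    """Return the (number, manuf) pair of a partinfo element."""
    return (partinfo.findtext("number"), partinfo.findtext("manuf"))


def replace_partinfo(root):
    """Swap every unit/partinfo for a copy of the matching part, in place."""
    # Build the index once, so each lookup is a dictionary access
    # instead of a scan over all part elements.
    parts = {}
    for part in root.iter("part"):
        info = part.find("partinfo")
        if info is not None:
            # Assumption: keys are unique; if not, the first part wins.
            parts.setdefault(part_key(info), part)

    # Materialise the unit list first, because the loop modifies the tree.
    for unit in list(root.iter("unit")):
        for index, child in enumerate(list(unit)):
            if child.tag != "partinfo":
                continue
            match = parts.get(part_key(child))
            if match is not None:  # unmatched partinfo elements stay as they are
                replacement = deepcopy(match)
                replacement.tail = child.tail  # keep the original whitespace
                unit[index] = replacement
    return root


# Hypothetical miniature input with one page and one part.
sample = """<root>
<page><list><unit><name>u1</name>
<partinfo><number>54321</number><manuf>A321</manuf></partinfo>
</unit></list></page>
<part><partinfo><number>54321</number><manuf>A321</manuf></partinfo><desc>bolt</desc></part>
</root>"""

root = replace_partinfo(ET.fromstring(sample))
unit = root.find("page/list/unit")
print([child.tag for child in unit])  # ['name', 'part']
```

The point of the sketch is the one-time index over the 82000 part elements: each unit/partinfo is then resolved with a single lookup rather than a search through all parts.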
Is BaseX a good tool for this task? If so, how does one go about it?
Finally, it would help to be able to output each page element in a separate file. Would it be better to have BaseX do this, or to output the whole database and chunk it with another tool?
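For comparison, the "chunk it with another tool" option could look roughly like the Python sketch below. It assumes the transformed document has already been exported and fits in memory; the function name write_pages and the page-NNNN.xml naming scheme are hypothetical choices for the example.

```python
import xml.etree.ElementTree as ET
from pathlib import Path


def write_pages(root, out_dir):
    """Write each page element to its own file and return the file paths."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    paths = []
    # Number the files in document order: page-0001.xml, page-0002.xml, ...
    for number, page in enumerate(root.iter("page"), start=1):
        path = out / f"page-{number:04d}.xml"
        ET.ElementTree(page).write(str(path), encoding="utf-8", xml_declaration=True)
        paths.append(path)
    return paths
```

Calling write_pages(ET.parse("export.xml").getroot(), "pages") would then produce one file per page, where export.xml and pages are placeholder names.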
Thanks,
Michael