Hi Michael,
Yes, this can easily be done with XQuery. There are many ways to do
this; here is just one:
1. First, create a database from your input file (e.g. with the BaseX GUI)
2. Second, run the following query to wrap your partinfo
elements with part elements:
//unit/partinfo/(replace node . with <part>{ . }</part>)
3. Third, write all page elements to disk:
for $page at $c in //page
return file:write($c || '.xml', $page)
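
If the goal is to pull in the full part element from the second section
rather than only wrap each partinfo, step 2 could be replaced by something
like the following. This is only a minimal sketch under a few assumptions
that are not confirmed by your description: that number and manuf together
identify a part, that each part has exactly one partinfo child, and that
the original part elements never occur inside a unit. The '|' separator in
the key is arbitrary.

```xquery
(: Assumption: number + manuf together identify a part. :)
(: Build a lookup map once, so the ~82000 parts are not rescanned per unit. :)
let $parts := map:merge(
  for $part in //part[not(ancestor::unit)]
  let $info := $part/partinfo[1]
  return map:entry($info/number || '|' || $info/manuf, $part)
)
for $info in //unit/partinfo
let $part := $parts($info/number || '|' || $info/manuf)
(: Leave a partinfo untouched if no matching part was found. :)
where exists($part)
return replace node $info with $part
```

The updates are applied when the query finishes, so the query from step 3
should be run afterwards as a separate query.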
Hope this helps,
Christian
On Tue, May 24, 2016 at 8:54 PM, Michael Sanborn <galethog@gmail.com> wrote:
> I need to perform a transformation that would be simple in XSLT, but the
> input is a file about 250 MB in size. I'm wondering whether XQuery and
> BaseX in particular would be the most efficient way of doing it. I'm new to
> XQuery, and I've come up with a couple of ways to do this, but they turn out
> to be very time-consuming, so I'm sure I'm Doing It Wrong. Hoping to find
> out the proper way of doing this.
>
> The input consists of 2 sections. There are about 3600 page elements with
> this structure:
>
> <page>
> [misc page content...]
> <list>
> <unit>
> [misc unit content 1...]
> <partinfo>
> <number>54321</number>
> <manuf>A321</manuf>
> </partinfo>
> <partinfo>
> <number>12345</number>
> <manuf>B123</manuf>
> </partinfo>
> [misc unit content 2...]
> </unit>
> [multiple units...]
> </list>
> </page>
>
> Each unit can have 1 or 2 partinfo elements. The other section has about
> 82000 part elements like this:
>
> <part>
> <partinfo>
> <number>54321</number>
> <manuf>A321</manuf>
> </partinfo>
> [misc part content 1]
> </part>
> [...]
> <part>
> <partinfo>
> <number>12345</number>
> <manuf>B123</manuf>
> </partinfo>
> [misc part content 2]
> </part>
>
> I want to replace each unit/partinfo with the corresponding part, like this:
>
> <page>
> [misc page content...]
> <list>
> <unit>
> [misc unit content 1...]
> <part>
> <partinfo>
> <number>54321</number>
> <manuf>A321</manuf>
> </partinfo>
> [misc part content 1]
> </part>
> <part>
> <partinfo>
> <number>12345</number>
> <manuf>B123</manuf>
> </partinfo>
> [misc part content 2]
> </part>
> [misc unit content 2...]
> </unit>
> [multiple units...]
> </list>
> </page>
>
> Is BaseX a good tool for this task? If so, how does one go about it?
>
> Finally, it would help to be able to output each page element in a separate
> file. Would it be better to have BaseX do this, or to output the whole
> database and chunk it with another tool?
>
> Thanks,
>
> Michael