Maybe you need something like this:

  for $partinfo in //unit/partinfo
  for $part in //part[deep-equal(partinfo, $partinfo)]
  return replace node $partinfo with $part/node()

The deep-equal comparison will be pretty slow, as every unit/partinfo
is compared against all part elements. If the value of the number element
is unique, you could do something like this:

  for $partinfo in //unit/partinfo
  let $number := $partinfo/number
  let $part := //part[partinfo/number = $number]
  return replace node $partinfo with $part/node()

Using a map will be even faster:

  let $map := map:merge(//part/map:entry(partinfo/number/text(), .))
  for $partinfo in //unit/partinfo
  let $part := $map($partinfo/number)
  return replace node $partinfo with $part/node()

If you need to consider both number and manuf, you could e.g. combine
these two in the map:

  let $map := map:merge(
    for $part in //part
    return map:entry(string-join($part/partinfo/*, '/'), $part)
  )
  for $partinfo in //unit/partinfo
  let $part := $map(string-join($partinfo/*, '/'))
  return replace node $partinfo with $part/node()

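
If you would like to try the combined-key lookup on a small sample
before running it on the full database, here is a minimal sketch. It
assumes an in-memory document shaped like your description; the root
and note element names and the unmatched 99999 entry are made up for
illustration. It also adds a where clause, as a partinfo without a
matching part would otherwise be replaced with an empty sequence
(which, as far as I know, amounts to deleting it):

```xquery
(: Hypothetical sample data; <root> and <note> are made-up names. :)
let $doc := document {
  <root>
    <page>
      <list>
        <unit>
          <partinfo><number>54321</number><manuf>A321</manuf></partinfo>
          <partinfo><number>99999</number><manuf>Z999</manuf></partinfo>
        </unit>
      </list>
    </page>
    <part>
      <partinfo><number>54321</number><manuf>A321</manuf></partinfo>
      <note>misc part content 1</note>
    </part>
  </root>
}
return
  (: Work on a copy, so the sample can be run without a database. :)
  copy $c := $doc
  modify (
    (: Index all parts by "number/manuf". :)
    let $map := map:merge(
      for $part in $c//part
      return map:entry(string-join($part/partinfo/*, '/'), $part)
    )
    for $partinfo in $c//unit/partinfo
    let $part := $map(string-join($partinfo/*, '/'))
    (: Leave partinfo elements without a matching part untouched. :)
    where exists($part)
    return replace node $partinfo with $part/node()
  )
  return $c
```

Replacing with $part instead of $part/node() would insert the enclosing
part element itself rather than only its children.
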
Does this help?
Christian
On Tue, May 24, 2016 at 10:54 PM, Michael Sanborn <galethog@gmail.com> wrote:
> Thanks for that. The trouble in step 2 is, just wrapping partinfo with the
> part element doesn't get me what I've labelled "misc part content 1" and
> "misc part content 2". It's not sufficient to have just the tags - I need
> all the content of the corresponding part elements in the later part of the
> file. Is that something that can be done without too much difficulty?
>
> Thanks,
>
> Michael
>
> On Tue, May 24, 2016 at 12:16 PM, Christian Grün <christian.gruen@gmail.com>
> wrote:
>>
>> Hi Michael,
>>
>> Yes, this can easily be done with XQuery. There are many ways to do
>> this; here is just one:
>>
>> 1. First, create a database from your input file (e.g. with the BaseX GUI)
>>
>> 2. Second, run the following query to wrap your partinfo
>> elements with part elements:
>>
>> //unit/partinfo/(replace node . with <part>{ . }</part>)
>>
>> 3. Third, write all page elements to disk:
>>
>> for $page at $c in //page
>> return file:write($c || '.xml', $page)
>>
>> Hope this helps,
>> Christian
>>
>>
>>
>> On Tue, May 24, 2016 at 8:54 PM, Michael Sanborn <galethog@gmail.com>
>> wrote:
>> > I need to perform a transformation that would be simple in XSLT, but the
>> > input is a file about 250 MB in size. I'm wondering whether XQuery and
>> > BaseX in particular would be the most efficient way of doing it. I'm new
>> > to XQuery, and I've come up with a couple of ways to do this, but they
>> > turn out to be very time-consuming, so I'm sure I'm Doing It Wrong.
>> > Hoping to find out the proper way of doing this.
>> >
>> > The input consists of 2 sections. There are about 3600 page elements
>> > with this structure:
>> >
>> > <page>
>> >   [misc page content...]
>> >   <list>
>> >     <unit>
>> >       [misc unit content 1...]
>> >       <partinfo>
>> >         <number>54321</number>
>> >         <manuf>A321</manuf>
>> >       </partinfo>
>> >       <partinfo>
>> >         <number>12345</number>
>> >         <manuf>B123</manuf>
>> >       </partinfo>
>> >       [misc unit content 2...]
>> >     </unit>
>> >     [multiple units...]
>> >   </list>
>> > </page>
>> >
>> > Each unit can have 1 or 2 partinfo elements. The other section has about
>> > 82000 part elements like this:
>> >
>> > <part>
>> >   <partinfo>
>> >     <number>54321</number>
>> >     <manuf>A321</manuf>
>> >   </partinfo>
>> >   [misc part content 1]
>> > </part>
>> > [...]
>> > <part>
>> >   <partinfo>
>> >     <number>12345</number>
>> >     <manuf>B123</manuf>
>> >   </partinfo>
>> >   [misc part content 2]
>> > </part>
>> >
>> > I want to replace each unit/partinfo with the corresponding part, like
>> > this:
>> >
>> > <page>
>> >   [misc page content...]
>> >   <list>
>> >     <unit>
>> >       [misc unit content 1...]
>> >       <part>
>> >         <partinfo>
>> >           <number>54321</number>
>> >           <manuf>A321</manuf>
>> >         </partinfo>
>> >         [misc part content 1]
>> >       </part>
>> >       <part>
>> >         <partinfo>
>> >           <number>12345</number>
>> >           <manuf>B123</manuf>
>> >         </partinfo>
>> >         [misc part content 2]
>> >       </part>
>> >       [misc unit content 2...]
>> >     </unit>
>> >     [multiple units...]
>> >   </list>
>> > </page>
>> >
>> > Is BaseX a good tool for this task? If so, how does one go about it?
>> >
>> > Finally, it would help to be able to output each page element in a
>> > separate file. Would it be better to have BaseX do this, or to output
>> > the whole database and chunk it with another tool?
>> >
>> > Thanks,
>> >
>> > Michael
>
>