Seems like this would be perfect. I do need both number and manuf. Using your combination map, I'm now getting an "Out of Main Memory" error. Tried on a second computer - same issue. Would it be more likely to work if I tried it from the command line rather than the GUI? If so, I'll need to look up how to create a database that way, but I'm sure it's close to hand. Or is there a better workaround (besides buying a computer with more than 8GB of RAM)?
Thanks again,
Michael
On Tue, May 24, 2016 at 2:10 PM, Christian Grün christian.gruen@gmail.com wrote:
Maybe you need something like this:
  for $partinfo in //unit/partinfo
  for $part in //part[deep-equal(partinfo, $partinfo)]
  return replace node $partinfo with $part/node()
The deep-equal will be pretty slow. If the value of the number element is unique, you could do something like this:
  for $partinfo in //unit/partinfo
  let $number := $partinfo/number
  let $part := //part[partinfo/number = $number]
  return replace node $partinfo with $part/node()
Using a map will be even faster:
  let $map := map:merge(//part/map:entry(partinfo/number/text(), .))
  for $partinfo in //unit/partinfo
  let $part := $map($partinfo/number)
  return replace node $partinfo with $part/node()
If you need to consider both number and manuf, you could e.g. combine these two in the map:
  let $map := map:merge(
    for $part in //part
    return map:entry(string-join($part/partinfo/*, '/'), $part)
  )
  for $partinfo in //unit/partinfo
  let $part := $map(string-join($partinfo/*, '/'))
  return replace node $partinfo with $part/node()
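To try the combined-key lookup without touching a real database, here is a minimal sketch that runs the same idea on a small inline document via copy/modify. The <root> wrapper and the <desc> element are hypothetical stand-ins for the real file layout and the "misc part content". Unlike the query above, this sketch inserts the whole part element rather than $part/node(), since that is the shape shown in the desired output of the original question; swap it back if only the children are wanted.

```xquery
(: Hypothetical inline sample; <root> and <desc> are assumptions, not part of the real data. :)
let $doc := document {
  <root>
    <page>
      <list>
        <unit>
          <partinfo><number>54321</number><manuf>A321</manuf></partinfo>
        </unit>
      </list>
    </page>
    <part>
      <partinfo><number>54321</number><manuf>A321</manuf></partinfo>
      <desc>misc part content 1</desc>
    </part>
  </root>
}
return
  (: Work on a copy so the sample itself is never modified. :)
  copy $c := $doc
  modify (
    (: Key each part by "number/manuf". :)
    let $map := map:merge(
      for $part in $c//part
      return map:entry(string-join($part/partinfo/*, '/'), $part)
    )
    for $partinfo in $c//unit/partinfo
    let $part := $map(string-join($partinfo/*, '/'))
    (: Skip partinfo elements that have no matching part. :)
    where exists($part)
    return replace node $partinfo with $part
  )
  return $c//page
```

The where clause is an addition: it leaves a partinfo untouched when no part with the same number and manuf exists, instead of deleting it.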
Does this help?
Christian
On Tue, May 24, 2016 at 10:54 PM, Michael Sanborn galethog@gmail.com wrote:
Thanks for that. The trouble in step 2 is, just wrapping partinfo with the part element doesn't get me what I've labelled "misc part content 1" and "misc part content 2". It's not sufficient to have just the tags - I need all the content of the corresponding part elements in the later part of the file. Is that something that can be done without too much difficulty?
Thanks,
Michael
On Tue, May 24, 2016 at 12:16 PM, Christian Grün <christian.gruen@gmail.com> wrote:
Hi Michael,
Yes, this can easily be done with XQuery. There are many ways to do this; here is just one:
- First, create a database from your input file (e.g. with the BaseX GUI)
- Second, run the following query to wrap your partinfo elements with part elements:
//unit/partinfo/(replace node . with <part>{ . }</part>)
- Third, write all page elements to disk:
for $page at $c in //page return file:write($c || '.xml', $page)
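A variation on the third step, as a sketch only: it assumes the database is opened as the query context, uses a hypothetical target directory 'pages/' relative to the working directory, and zero-pads the counter so the roughly 3600 files sort in page order.

```xquery
(: 'pages/' is a hypothetical output directory; adjust as needed. :)
let $dir := 'pages/'
return (
  (: Create the target directory if it does not exist yet. :)
  file:create-dir($dir),
  for $page at $c in //page
  (: format-integer pads the counter, e.g. 0001.xml, 0002.xml, ... :)
  return file:write($dir || format-integer($c, '0000') || '.xml', $page)
)
```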
Hope this helps,
Christian
On Tue, May 24, 2016 at 8:54 PM, Michael Sanborn galethog@gmail.com wrote:
I need to perform a transformation that would be simple in XSLT, but the input is a file about 250 MB in size. I'm wondering whether XQuery, and BaseX in particular, would be the most efficient way of doing it. I'm new to XQuery, and I've come up with a couple of ways to do this, but they turn out to be very time-consuming, so I'm sure I'm Doing It Wrong. Hoping to find out the proper way of doing this.
The input consists of 2 sections. There are about 3600 page elements with this structure:
<page>
  [misc page content...]
  <list>
    <unit>
      [misc unit content 1...]
      <partinfo>
        <number>54321</number>
        <manuf>A321</manuf>
      </partinfo>
      <partinfo>
        <number>12345</number>
        <manuf>B123</manuf>
      </partinfo>
      [misc unit content 2...]
    </unit>
    [multiple units...]
  </list>
</page>
Each unit can have 1 or 2 partinfo elements. The other section has about 82000 part elements like this:
<part>
  <partinfo>
    <number>54321</number>
    <manuf>A321</manuf>
  </partinfo>
  [misc part content 1]
</part>
[...]
<part>
  <partinfo>
    <number>12345</number>
    <manuf>B123</manuf>
  </partinfo>
  [misc part content 2]
</part>
I want to replace each unit/partinfo with the corresponding part, like this:
<page>
  [misc page content...]
  <list>
    <unit>
      [misc unit content 1...]
      <part>
        <partinfo>
          <number>54321</number>
          <manuf>A321</manuf>
        </partinfo>
        [misc part content 1]
      </part>
      <part>
        <partinfo>
          <number>12345</number>
          <manuf>B123</manuf>
        </partinfo>
        [misc part content 2]
      </part>
      [misc unit content 2...]
    </unit>
    [multiple units...]
  </list>
</page>
Is BaseX a good tool for this task? If so, how does one go about it?
Finally, it would help to be able to output each page element in a separate file. Would it be better to have BaseX do this, or to output the whole database and chunk it with another tool?
Thanks,
Michael