On Wed, May 25, 2016 at 11:09 PM, Christian Grün <christian.gruen@gmail.com> wrote:

Hi Michael,

> Now for basex.bat, in order to create a context, I started the script with
> 'declare context item := doc("input.xml");' which may not be the most
> efficient way to do this, I don't know.

If you have created a database, you can use the command-line flag -i:

basex.bat -i input query.xq
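
If the database does not exist yet, it can also be created without the GUI. A minimal sketch, assuming the file is called input.xml and lies in the current working directory (the script name create.xq is only an example):

```xquery
(: create.xq: hypothetical helper script, run once before the actual query. :)
(: Assumption: input.xml is in the working directory; adjust the path as needed. :)
db:create("input", "input.xml")
```

Running "basex.bat create.xq" should then build a database named "input". The CREATE DB command, passed on via the -c flag, should have the same effect.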

The db:open function can be used as well:

declare context item := db:open("input");
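
Put together with the combined-key map from my earlier mail (quoted below), query.xq could look roughly like this. It is only a sketch; the "where" clause is an addition of mine that skips partinfo elements without a matching part instead of deleting them:

```xquery
(: query.xq: sketch assembled from the snippets quoted further down. :)
declare context item := db:open("input");

(: Build the lookup map once: key is "number/manuf", value is the part element. :)
let $map := map:merge(
  for $part in //part
  return map:entry(string-join($part/partinfo/*, '/'), $part)
)
for $partinfo in //unit/partinfo
let $part := $map(string-join($partinfo/*, '/'))
(: Added guard (assumption): leave partinfo untouched if no part matches. :)
where exists($part)
return replace node $partinfo with $part/node()
```

As the context is declared in the query itself, the -i flag is not needed here: basex.bat query.xq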

Is it only the query with the map constructor that requires more memory?
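
To rule out a problem with the query logic itself, the lookup can first be tried on a tiny inline sample that needs no database. The data below is made up, and the sketch uses copy/modify so that nothing is written anywhere. Unlike the quoted queries, it inserts $part itself rather than $part/node(), which keeps the <part> wrapper shown in your desired output:

```xquery
(: Made-up sample with one page section and one part section. :)
let $doc := document {
  <root>
    <page><list><unit>
      <partinfo><number>54321</number><manuf>A321</manuf></partinfo>
    </unit></list></page>
    <part>
      <partinfo><number>54321</number><manuf>A321</manuf></partinfo>
      <desc>misc part content 1</desc>
    </part>
  </root>
}
(: Same combined key as in the map query: "number/manuf". :)
let $map := map:merge(
  for $part in $doc//part
  return map:entry(string-join($part/partinfo/*, '/'), $part)
)
return copy $c := $doc
modify (
  for $partinfo in $c//unit/partinfo
  let $part := $map(string-join($partinfo/*, '/'))
  where exists($part)
  return replace node $partinfo with $part
)
return $c//page
```

If this returns the page with the partinfo element replaced by the complete part element, the logic is fine and the memory consumption is what remains to be looked at.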

Best,
Christian

> But on the command line or in the GUI, I haven't had any luck.
>
> Any other suggestions?
>
> Thanks,
>
> Michael
>
>
>
> On Tue, May 24, 2016 at 10:49 PM, Christian Grün <christian.gruen@gmail.com>
> wrote:
>>
>> Usually, 8GB should be more than sufficient for such a query. You
>> could try to increase the memory assigned to Java in the start
>> scripts [1].
>>
>> Does this help?
>> Christian
>>
>> [1] http://docs.basex.org/wiki/Start_Scripts
>>
>>
>>
>> On Tue, May 24, 2016 at 11:52 PM, Michael Sanborn <galethog@gmail.com>
>> wrote:
>> > Seems like this would be perfect. I do need both number and manuf. Using
>> > your combination map, I'm now getting an "Out of Main Memory" error. Tried
>> > on a second computer - same issue. Would it be more likely to work if I
>> > tried it from the command line rather than the GUI? If so, I'll need to
>> > look up how to create a database that way, but I'm sure it's close to
>> > hand. Or is there a better workaround (besides buying a computer with more
>> > than 8GB of RAM)?
>> >
>> > Thanks again,
>> >
>> > Michael
>> >
>> > On Tue, May 24, 2016 at 2:10 PM, Christian Grün
>> > <christian.gruen@gmail.com>
>> > wrote:
>> >>
>> >> Maybe you need something like this:
>> >>
>> >> for $partinfo in //unit/partinfo
>> >> for $part in //part[deep-equal(partinfo, $partinfo)]
>> >> return replace node $partinfo with $part/node()
>> >>
>> >> The deep-equal will be pretty slow. If the value of the number element
>> >> is unique, you could do something like this:
>> >>
>> >> for $partinfo in //unit/partinfo
>> >> let $number := $partinfo/number
>> >> let $part := //part[partinfo/number = $number]
>> >> return replace node $partinfo with $part/node()
>> >>
>> >> Using a map will even be faster:
>> >>
>> >> let $map := map:merge(//part/map:entry(partinfo/number/text(), .))
>> >> for $partinfo in //unit/partinfo
>> >> let $part := $map($partinfo/number)
>> >> return replace node $partinfo with $part/node()
>> >>
>> >> If you need to consider both number and manuf, you could e.g. combine
>> >> these two in the map:
>> >>
>> >> let $map := map:merge(
>> >> for $part in //part
>> >> return map:entry(string-join($part/partinfo/*, '/'), $part)
>> >> )
>> >> for $partinfo in //unit/partinfo
>> >> let $part := $map(string-join($partinfo/*, '/'))
>> >> return replace node $partinfo with $part/node()
>> >>
>> >> Does this help?
>> >> Christian
>> >>
>> >>
>> >>
>> >>
>> >> On Tue, May 24, 2016 at 10:54 PM, Michael Sanborn <galethog@gmail.com>
>> >> wrote:
>> >> > Thanks for that. The trouble in step 2 is, just wrapping partinfo with
>> >> > the part element doesn't get me what I've labelled "misc part content 1"
>> >> > and "misc part content 2". It's not sufficient to have just the tags - I
>> >> > need all the content of the corresponding part elements in the later
>> >> > part of the file. Is that something that can be done without too much
>> >> > difficulty?
>> >> >
>> >> > Thanks,
>> >> >
>> >> > Michael
>> >> >
>> >> > On Tue, May 24, 2016 at 12:16 PM, Christian Grün
>> >> > <christian.gruen@gmail.com>
>> >> > wrote:
>> >> >>
>> >> >> Hi Michael,
>> >> >>
>> >> >> Yes, this can easily be done with XQuery. There are many ways to do
>> >> >> this; here is just one:
>> >> >>
>> >> >> 1. First, create a database from your input file (e.g. with the BaseX
>> >> >> GUI)
>> >> >>
>> >> >> 2. Second, run the following query to wrap your partinfo elements with
>> >> >> part elements:
>> >> >>
>> >> >> //unit/partinfo/(replace node . with <part>{ . }</part>)
>> >> >>
>> >> >> 3. Third, write all page elements to disk:
>> >> >>
>> >> >> for $page at $c in //page
>> >> >> return file:write($c || '.xml', $page)
>> >> >>
>> >> >> Hope this helps,
>> >> >> Christian
>> >> >>
>> >> >>
>> >> >>
>> >> >> On Tue, May 24, 2016 at 8:54 PM, Michael Sanborn
>> >> >> <galethog@gmail.com>
>> >> >> wrote:
>> >> >> > I need to perform a transformation that would be simple in XSLT, but
>> >> >> > the input is a file about 250 MB in size. I'm wondering whether
>> >> >> > XQuery, and BaseX in particular, would be the most efficient way of
>> >> >> > doing it. I'm new to XQuery, and I've come up with a couple of ways
>> >> >> > to do this, but they turn out to be very time-consuming, so I'm sure
>> >> >> > I'm Doing It Wrong. Hoping to find out the proper way of doing this.
>> >> >> >
>> >> >> > The input consists of 2 sections. There are about 3600 page elements
>> >> >> > with this structure:
>> >> >> >
>> >> >> > <page>
>> >> >> > [misc page content...]
>> >> >> > <list>
>> >> >> > <unit>
>> >> >> > [misc unit content 1...]
>> >> >> > <partinfo>
>> >> >> > <number>54321</number>
>> >> >> > <manuf>A321</manuf>
>> >> >> > </partinfo>
>> >> >> > <partinfo>
>> >> >> > <number>12345</number>
>> >> >> > <manuf>B123</manuf>
>> >> >> > </partinfo>
>> >> >> > [misc unit content 2...]
>> >> >> > </unit>
>> >> >> > [multiple units...]
>> >> >> > </list>
>> >> >> > </page>
>> >> >> >
>> >> >> > Each unit can have 1 or 2 partinfo elements. The other section has
>> >> >> > about 82000 part elements like this:
>> >> >> >
>> >> >> > <part>
>> >> >> > <partinfo>
>> >> >> > <number>54321</number>
>> >> >> > <manuf>A321</manuf>
>> >> >> > </partinfo>
>> >> >> > [misc part content 1]
>> >> >> > </part>
>> >> >> > [...]
>> >> >> > <part>
>> >> >> > <partinfo>
>> >> >> > <number>12345</number>
>> >> >> > <manuf>B123</manuf>
>> >> >> > </partinfo>
>> >> >> > [misc part content 2]
>> >> >> > </part>
>> >> >> >
>> >> >> > I want to replace each unit/partinfo with the corresponding part,
>> >> >> > like this:
>> >> >> >
>> >> >> > <page>
>> >> >> > [misc page content...]
>> >> >> > <list>
>> >> >> > <unit>
>> >> >> > [misc unit content 1...]
>> >> >> > <part>
>> >> >> > <partinfo>
>> >> >> > <number>54321</number>
>> >> >> > <manuf>A321</manuf>
>> >> >> > </partinfo>
>> >> >> > [misc part content 1]
>> >> >> > </part>
>> >> >> > <part>
>> >> >> > <partinfo>
>> >> >> > <number>12345</number>
>> >> >> > <manuf>B123</manuf>
>> >> >> > </partinfo>
>> >> >> > [misc part content 2]
>> >> >> > </part>
>> >> >> > [misc unit content 2...]
>> >> >> > </unit>
>> >> >> > [multiple units...]
>> >> >> > </list>
>> >> >> > </page>
>> >> >> >
>> >> >> > Is BaseX a good tool for this task? If so, how does one go about it?
>> >> >> >
>> >> >> > Finally, it would help to be able to output each page element in a
>> >> >> > separate file. Would it be better to have BaseX do this, or to output
>> >> >> > the whole database and chunk it with another tool?
>> >> >> >
>> >> >> > Thanks,
>> >> >> >
>> >> >> > Michael
>> >> >
>> >> >
>> >
>> >
>
>