Hi Michael,
> Now for basex.bat: in order to create a context, I started the script with 'declare context item := doc("input.xml");', which may not be the most efficient way to do this, I don't know.
If you have created a database, you can use the command-line flag -i:
basex.bat -i input query.xq
The db:open function can be used as well:
declare context item := db:open("input");
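If the database does not exist yet, it can be created on the command line as well; one way (assuming the source file is input.xml in the current directory and the database should be named input) is the CREATE DB command:
basex.bat -c "CREATE DB input input.xml"
After that, the -i flag shown above (or db:open) will find the database under the name input.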
Is it only the query with the map constructor that requires more memory?
Best, Christian
> But on the command line or in the GUI, I haven't had any luck.
> Any other suggestions?
> Thanks,
> Michael
On Tue, May 24, 2016 at 10:49 PM, Christian Grün christian.gruen@gmail.com wrote:
Usually, 8GB should be much more than sufficient for such a query. You could try to increase the memory that is assigned to Java in the start scripts [1].
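For example, basex.bat and basexgui.bat pass JVM options such as -Xmx to java; raising that value gives the query more heap. A sketch of the edited line (the exact variable name may differ between BaseX versions):
set BASEX_JVM=-Xmx6g %BASEX_JVM%
Alternatively, the BASEX_JVM environment variable can usually be set before starting the script instead of editing it.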
Does this help? Christian
[1] http://docs.basex.org/wiki/Start_Scripts
On Tue, May 24, 2016 at 11:52 PM, Michael Sanborn galethog@gmail.com wrote:
Seems like this would be perfect. I do need both number and manuf. Using your combination map, I'm now getting an "Out of Main Memory" error. Tried on a second computer - same issue. Would it be more likely to work if I tried it from the command line rather than the GUI? If so, I'll need to look up how to create a database that way, but I'm sure it's close to hand. Or is there a better workaround (besides buying a computer with more than 8GB of RAM)?
Thanks again,
Michael
On Tue, May 24, 2016 at 2:10 PM, Christian Grün christian.gruen@gmail.com wrote:
Maybe you need something like this:
for $partinfo in //unit/partinfo
for $part in //part[deep-equal(partinfo, $partinfo)]
return replace node $partinfo with $part/node()
The deep-equal will be pretty slow. If the value of the number element is unique, you could do something like this:
for $partinfo in //unit/partinfo
let $number := $partinfo/number
let $part := //part[partinfo/number = $number]
return replace node $partinfo with $part/node()
Using a map will be even faster:
let $map := map:merge(//part/map:entry(partinfo/number/text(), .))
for $partinfo in //unit/partinfo
let $part := $map($partinfo/number)
return replace node $partinfo with $part/node()
If you need to consider both number and manuf, you could e.g. combine these two in the map:
let $map := map:merge(
  for $part in //part
  return map:entry(string-join($part/partinfo/*, '/'), $part)
)
for $partinfo in //unit/partinfo
let $part := $map(string-join($partinfo/*, '/'))
return replace node $partinfo with $part/node()
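To illustrate the key that is built here (this assumes number and manuf are the only children of partinfo and appear in the same order in both sections), the first sample partinfo ends up under the string "54321/A321":
string-join(<partinfo><number>54321</number><manuf>A321</manuf></partinfo>/*, '/')
(: returns "54321/A321" :)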
Does this help? Christian
On Tue, May 24, 2016 at 10:54 PM, Michael Sanborn galethog@gmail.com wrote:
Thanks for that. The trouble in step 2 is that just wrapping partinfo in a part element doesn't get me what I've labelled "misc part content 1" and "misc part content 2". It's not sufficient to have just the tags - I need all the content of the corresponding part elements from the later part of the file. Is that something that can be done without too much difficulty?
Thanks,
Michael
On Tue, May 24, 2016 at 12:16 PM, Christian Grün christian.gruen@gmail.com wrote:
Hi Michael,
Yes, this can easily be done with XQuery. There are many ways to do this; here is just one:
- First, create a database from your input file (e.g. with the BaseX GUI)
- Second, run the following query to wrap your partinfo elements in part elements:
//unit/partinfo/(replace node . with <part>{ . }</part>)
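Applied to the first sample partinfo, this update produces the following (only the surrounding part tags are added):
<part>
  <partinfo>
    <number>54321</number>
    <manuf>A321</manuf>
  </partinfo>
</part>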
- Third, write all page elements to disk:
for $page at $c in //page return file:write($c || '.xml', $page)
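If the roughly 3600 files should not end up in the current working directory, a path prefix works as well; a sketch, assuming a pages subdirectory that is created first with file:create-dir:
file:create-dir('pages'),
for $page at $c in //page
return file:write('pages/' || $c || '.xml', $page)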
Hope this helps, Christian
On Tue, May 24, 2016 at 8:54 PM, Michael Sanborn galethog@gmail.com wrote:
I need to perform a transformation that would be simple in XSLT, but the input is a file about 250 MB in size. I'm wondering whether XQuery and BaseX in particular would be the most efficient way of doing it. I'm new to XQuery, and I've come up with a couple of ways to do this, but they turn out to be very time-consuming, so I'm sure I'm Doing It Wrong. Hoping to find out the proper way of doing this.

The input consists of 2 sections. There are about 3600 page elements with this structure:

<page>
  [misc page content...]
  <list>
    <unit>
      [misc unit content 1...]
      <partinfo>
        <number>54321</number>
        <manuf>A321</manuf>
      </partinfo>
      <partinfo>
        <number>12345</number>
        <manuf>B123</manuf>
      </partinfo>
      [misc unit content 2...]
    </unit>
    [multiple units...]
  </list>
</page>

Each unit can have 1 or 2 partinfo elements. The other section has about 82000 part elements like this:

<part>
  <partinfo>
    <number>54321</number>
    <manuf>A321</manuf>
  </partinfo>
  [misc part content 1]
</part>
[...]
<part>
  <partinfo>
    <number>12345</number>
    <manuf>B123</manuf>
  </partinfo>
  [misc part content 2]
</part>

I want to replace each unit/partinfo with the corresponding part, like this:

<page>
  [misc page content...]
  <list>
    <unit>
      [misc unit content 1...]
      <part>
        <partinfo>
          <number>54321</number>
          <manuf>A321</manuf>
        </partinfo>
        [misc part content 1]
      </part>
      <part>
        <partinfo>
          <number>12345</number>
          <manuf>B123</manuf>
        </partinfo>
        [misc part content 2]
      </part>
      [misc unit content 2...]
    </unit>
    [multiple units...]
  </list>
</page>

Is BaseX a good tool for this task? If so, how does one go about it?

Finally, it would help to be able to output each page element in a separate file. Would it be better to have BaseX do this, or to output the whole database and chunk it with another tool?

Thanks,

Michael