I need to perform a transformation that would be simple in XSLT, but the input is a file about 250 MB in size. I'm wondering whether XQuery, and BaseX in particular, would be the most efficient way of doing it. I'm new to XQuery, and I've come up with a couple of ways to do this, but they turn out to be very time-consuming, so I'm sure I'm Doing It Wrong. Hoping to find out the proper way of doing this.
The input consists of 2 sections. There are about 3600 page elements with this structure:

    <page>
      [misc page content...]
      <list>
        <unit>
          [misc unit content 1...]
          <partinfo>
            <number>54321</number>
            <manuf>A321</manuf>
          </partinfo>
          <partinfo>
            <number>12345</number>
            <manuf>B123</manuf>
          </partinfo>
          [misc unit content 2...]
        </unit>
        [multiple units...]
      </list>
    </page>

Each unit can have 1 or 2 partinfo elements. The other section has about 82000 part elements like this:

    <part>
      <partinfo>
        <number>54321</number>
        <manuf>A321</manuf>
      </partinfo>
      [misc part content 1]
    </part>
    [...]
    <part>
      <partinfo>
        <number>12345</number>
        <manuf>B123</manuf>
      </partinfo>
      [misc part content 2]
    </part>

I want to replace each unit/partinfo with the corresponding part, like this:

    <page>
      [misc page content...]
      <list>
        <unit>
          [misc unit content 1...]
          <part>
            <partinfo>
              <number>54321</number>
              <manuf>A321</manuf>
            </partinfo>
            [misc part content 1]
          </part>
          <part>
            <partinfo>
              <number>12345</number>
              <manuf>B123</manuf>
            </partinfo>
            [misc part content 2]
          </part>
          [misc unit content 2...]
        </unit>
        [multiple units...]
      </list>
    </page>

Is BaseX a good tool for this task? If so, how does one go about it?
Finally, it would help to be able to output each page element in a separate file. Would it be better to have BaseX do this, or to output the whole database and chunk it with another tool?
Thanks,
Michael
Hi Michael,
Yes, this can easily be done with XQuery. There are many ways to do this; here is just one:
1. First, create a database from your input file (e.g. with the BaseX GUI).
2. Second, run the following query to wrap your partinfo elements with part elements:

    //unit/partinfo/(replace node . with <part>{ . }</part>)

3. Third, write all page elements to disk:

    for $page at $c in //page
    return file:write($c || '.xml', $page)

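As a rough sketch of the third step, the query below writes each page into its own zero-padded file inside a dedicated directory. The database name `input`, the output directory `pages/` and the file name pattern are assumptions made for illustration; `file:create-dir` and `file:write` come from the BaseX File Module, and `format-integer` is standard XQuery 3.0.

```xquery
(: Sketch only: 'input' and 'pages/' are assumed names, not taken from the thread. :)
let $dir := 'pages/'
return (
  (: create the target directory if it does not exist yet :)
  file:create-dir($dir),
  for $page at $c in db:open('input')//page
  (: 0001.xml, 0002.xml, ... so the files sort in document order :)
  return file:write($dir || format-integer($c, '0000') || '.xml', $page)
)
```

With about 3600 pages, four digits are enough for the counter; a larger input would need a wider picture string.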
Hope this helps, Christian
Thanks for that. The trouble in step 2 is, just wrapping partinfo with the part element doesn't get me what I've labelled "misc part content 1" and "misc part content 2". It's not sufficient to have just the tags - I need all the content of the corresponding part elements in the later part of the file. Is that something that can be done without too much difficulty?
Thanks,
Michael
Maybe you need something like this:

    for $partinfo in //unit/partinfo
    for $part in //part[deep-equal(partinfo, $partinfo)]
    return replace node $partinfo with $part

The deep-equal will be pretty slow. If the value of the number element is unique, you could do something like this:

    for $partinfo in //unit/partinfo
    let $number := $partinfo/number
    let $part := //part[partinfo/number = $number]
    return replace node $partinfo with $part

Using a map will be even faster:

    let $map := map:merge(//part/map:entry(partinfo/number/text(), .))
    for $partinfo in //unit/partinfo
    let $part := $map($partinfo/number)
    return replace node $partinfo with $part

If you need to consider both number and manuf, you could e.g. combine these two in the map:

    let $map := map:merge(
      for $part in //part
      return map:entry(string-join($part/partinfo/*, '/'), $part)
    )
    for $partinfo in //unit/partinfo
    let $part := $map(string-join($partinfo/*, '/'))
    return replace node $partinfo with $part

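To try the combined-key lookup without touching the real database, the following is a minimal sketch that runs on a tiny inline document. The `root` wrapper and the `desc` element are made up for illustration and only stand in for the real file layout. The sketch uses `copy`/`modify` so that nothing is written back, and it inserts the matched part element itself, since the requested output keeps the part wrapper.

```xquery
(: Sketch on made-up sample data; <root> and <desc> are hypothetical. :)
let $doc :=
  <root>
    <page>
      <list>
        <unit>
          <partinfo><number>54321</number><manuf>A321</manuf></partinfo>
        </unit>
      </list>
    </page>
    <part>
      <partinfo><number>54321</number><manuf>A321</manuf></partinfo>
      <desc>misc part content 1</desc>
    </part>
  </root>
(: key = "number/manuf", value = the whole part element :)
let $map := map:merge(
  for $part in $doc/part
  return map:entry(string-join($part/partinfo/*, '/'), $part)
)
return copy $c := $doc
modify (
  for $partinfo in $c//unit/partinfo
  let $part := $map(string-join($partinfo/*, '/'))
  (: skip partinfo elements without a match; replacing a node
     with an empty sequence would delete it :)
  where exists($part)
  return replace node $partinfo with $part
)
return $c/page
```

The `where exists($part)` guard is a design choice rather than part of the queries in this thread: it leaves unmatched partinfo elements in place instead of silently removing them.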
Does this help? Christian
Seems like this would be perfect. I do need both number and manuf. Using your combination map, I'm now getting an "Out of Main Memory" error. Tried on a second computer - same issue. Would it be more likely to work if I tried it from the command line rather than the GUI? If so, I'll need to look up how to create a database that way, but I'm sure it's close to hand. Or is there a better workaround (besides buying a computer with more than 8GB of RAM)?
Thanks again,
Michael
Usually, 8 GB should be much more than sufficient for such a query. You could try increasing the amount of memory assigned to Java in the start scripts [1].
Does this help? Christian
[1] http://docs.basex.org/wiki/Start_Scripts
Sorry to say I still haven't been able to get it to work. Whether I edit basex.bat or basexgui.bat, changing Xmx512m to Xmx1024m, and launching them on two different computers, I get "Out of Main Memory" within a minute. I also tried Xmx2048m, but that gives me a "Could not reserve enough space" error.
Now for basex.bat, in order to create a context, I started the script with 'declare context item := doc("input.xml");' which may not be the most efficient way to do this, I don't know. But on the command line or in the GUI, I haven't had any luck.
Any other suggestions?
Thanks,
Michael
Hi Michael,
Which Java are you using? If you are using 32-bit Java and have set a high memory value in Xmx, Java might fail to start. Check that you are using a 64-bit version of Java.
If you need to have more than one version of Java on your system, you can edit basex.bat or basexgui.bat to include the full path to the Java executable that you want BaseX to use.
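One way to check which Java the running BaseX instance actually picked up is to ask it from a query. This is only a sketch: it assumes that `proc:property` from the BaseX Process Module is available in the installed version, and `sun.arch.data.model` is a vendor-specific property that may be empty on some JVMs.

```xquery
(: Sketch: lists a few JVM system properties of the running BaseX instance.
   proc:property is assumed to exist in the installed BaseX version;
   'sun.arch.data.model' is vendor-specific and may return nothing. :)
for $name in ('java.version', 'java.vm.name', 'os.arch', 'sun.arch.data.model')
return $name || ': ' || proc:property($name)
```

If the architecture values point to a 32-bit JVM, that would be consistent with the "Could not reserve enough space" error when a large Xmx value is set.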
Hope this helps.
Vincent
Hi Michael,
> Now for basex.bat, in order to create a context, I started the script with 'declare context item := doc("input.xml");' which may not be the most efficient way to do this, I don't know.

If you have created a database, you can use the command-line flag -i:

    basex.bat -i input query.xq

The db:open function can be used as well:

    declare context item := db:open("input");

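If the database does not exist yet, it can also be created from a query instead of the GUI. The following is a minimal sketch that assumes the source file is called input.xml and lies in the working directory; the database name `input` simply matches the examples above.

```xquery
(: One-off setup query (sketch). Assumes input.xml lies in the working
   directory; 'input' is the database name used in the examples above. :)
db:create('input', 'input.xml')
```

Once the database exists, later queries can address it with db:open or the -i flag, so the 250 MB file does not have to be parsed again on every run.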
Is it only the query with the map constructor that requires more memory?
Best, Christian
But on the command line or in the GUI, I haven't had any luck.
Any other suggestions?
Thanks,
Michael
On Tue, May 24, 2016 at 10:49 PM, Christian Grün christian.gruen@gmail.com wrote:
Usually, 8GB should be much more than sufficient for such a query. You could try to increase the memory, which is assigned to Java, in the start scripts [1].
Does this help? Christian
[1] http://docs.basex.org/wiki/Start_Scripts
On Tue, May 24, 2016 at 11:52 PM, Michael Sanborn galethog@gmail.com wrote:
Seems like this would be perfect. I do need both number and manuf. Using your combination map, I'm now getting an "Out of Main Memory" error. Tried on a second computer - same issue. Would it be more likely to work if I tried it from the command line rather than the GUI? If so, I'll need to look up how to create a database that way, but I'm sure it's close to hand. Or is there a better workaround (besides buying a computer with more than 8GB of RAM)?
Thanks again,
Michael
On Tue, May 24, 2016 at 2:10 PM, Christian Grün christian.gruen@gmail.com wrote:
Maybe you need something like this:
for $partinfo in //unit/partinfo
for $part in //part[deep-equal(partinfo, $partinfo)]
return replace node $partinfo with $part/node()
The deep-equal will be pretty slow. If the value of the number element is unique, you could do something like this:
for $partinfo in //unit/partinfo
let $number := $partinfo/number
let $part := //part[partinfo/number = $number]
return replace node $partinfo with $part/node()
Using a map will be even faster:
let $map := map:merge(//part/map:entry(partinfo/number/text(), .))
for $partinfo in //unit/partinfo
let $part := $map($partinfo/number)
return replace node $partinfo with $part/node()
If you need to consider both number and manuf, you could e.g. combine these two in the map:
let $map := map:merge(
  for $part in //part
  return map:entry(string-join($part/partinfo/*, '/'), $part)
)
for $partinfo in //unit/partinfo
let $part := $map(string-join($partinfo/*, '/'))
return replace node $partinfo with $part/node()
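To try the combined-key lookup in isolation, here is a minimal sketch that runs without a database. The inline $input element and its contents are made-up sample data (an assumption, not the real file layout), and the query only returns the matched part instead of updating anything:

```xquery
(: Sketch only: $input is invented sample data, not the real 250 MB file. :)
let $input :=
  <input>
    <unit>
      <partinfo><number>54321</number><manuf>A321</manuf></partinfo>
    </unit>
    <part>
      <partinfo><number>54321</number><manuf>A321</manuf></partinfo>
      <color>Orange</color>
    </part>
  </input>
(: Key every part by "number/manuf", so each lookup is one map access
   instead of a scan over all part elements. :)
let $map := map:merge(
  for $part in $input/part
  return map:entry(string-join($part/partinfo/*, '/'), $part)
)
for $partinfo in $input/unit/partinfo
return $map(string-join($partinfo/*, '/'))
```

The point of the map is that it is built once; after that, the cost per partinfo no longer depends on the number of part elements.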
Does this help? Christian
On Tue, May 24, 2016 at 10:54 PM, Michael Sanborn galethog@gmail.com wrote:
Thanks for that. The trouble in step 2 is, just wrapping partinfo with the part element doesn't get me what I've labelled "misc part content 1" and "misc part content 2". It's not sufficient to have just the tags - I need all the content of the corresponding part elements in the later part of the file. Is that something that can be done without too much difficulty?
Thanks,
Michael
On Tue, May 24, 2016 at 12:16 PM, Christian Grün christian.gruen@gmail.com wrote:
Hi Michael,
Yes, this can easily be done with XQuery. There are many ways to do this; here is just one:
- First, create a database from your input file (e.g. with the BaseX GUI)
- Second, run the following query to wrap your partinfo elements with part elements:
//unit/partinfo/(replace node . with <part>{ . }</part>)
- Third, write all page elements to disk:
for $page at $c in //page return file:write($c || '.xml', $page)
Hope this helps, Christian
(Re: 32-bit vs 64-bit) Of the two machines I've been trying this on, one is my laptop, and it turned out that it was using 32-bit Java. So I upgraded to the 64-bit version, and I find that I can actually get it to work, more or less, with Xmx4096m. So thanks for that. The other machine has had 64-bit Java all along, and I still get an out of memory error even with Xmx4096m, but it's a VM so there might be resource-sharing issues - I'll come back to that later.
So, about running this on the command line, here's the current state of the code:
let $map := map:merge(
  for $part in //part
  return map:entry(string-join($part/partinfo/*, '/'), $part)
)
for $partinfo in //unit/partinfo
let $part := $map(string-join($partinfo/*, '/'))
return replace node $partinfo with <part>{$part/node()}</part>

for $page in //page
return file:write('data/'|| $page/@key || '.xml', $page, map{"omit-xml-declaration":"no"})
1) There'll be a new version of the input file arriving periodically, so I'd prefer to be able to do everything at the command line without having to create the database inside the GUI. Not sure of the best way to go about that.
2) In the GUI for now, I can perform the replace with the map, or I can run the for loop that writes out all the files. But I get an "Unexpected end of query" error when I try to do both, so what's the problem with my syntax?
Thanks so much,
Michael
Hi Michael,
- There'll be a new version of the input file arriving periodically, so I'd prefer to be able to do everything at the command line without having to create the database inside the GUI. Not sure of the best way to go about that.
Using the GUI was just an example. You can create new databases via commands (CREATE), XQuery (db:create) or our APIs. Please check out our documentation for more hints.
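As a rough illustration of the XQuery route, a database can be created with a single db:create call. The database name and the file path below are placeholders, not values from this thread:

```xquery
(: Sketch only: 'input' and '/path/to/input.xml' are placeholder names.
   db:create is an updating function, so the database exists once the
   query has finished. :)
db:create('input', '/path/to/input.xml')
```

The corresponding command would be along the lines of CREATE DB input /path/to/input.xml, which can also be placed in a command script.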
- In the GUI for now, I can perform the replace with the map, or I can run the for loop that writes out all the files. But I get an "Unexpected end of query" error when I try to do both, so what's the problem with my syntax?
In XQuery, multiple expressions can be separated with commas.
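A tiny sketch of what that means, using made-up values rather than your data: two FLWOR expressions written one after the other are a syntax error, but joined with a comma they form one sequence expression.

```xquery
(: Two independent FLWOR expressions combined into one query.
   Without the comma, the parser stops after the first "return" clause. :)
for $i in 1 to 2
return $i * 10,
for $s in ('a', 'b')
return upper-case($s)
```

This returns the four items 10, 20, "A" and "B" as a single sequence.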
Note, however, that XQuery is a functional language; as such, it is not possible to first update items and then access them in the same query that easily. There are various alternatives to get around this limitation:
1. Use a BaseX command script to run all operations [1]:
<commands>
  <create-db name='input'>...path/to/input.xml</create-db>
  <xquery><![CDATA[
    let $map := ...
  ]]></xquery>
  <xquery>
    for $page in ...
  </xquery>
</commands>
2. Use copy/transform/return or update to do all updates in main-memory [2] and pass them on to your file:write function. This could e.g. look as follows:
let $input := doc('input.xml')
let $map := map:merge(
  for $part in $input//part
  return map:entry(string-join($part/partinfo/*, '/'), $part)
)
for $page in $input//page
let $new-part := $page update {
  for $partinfo in .//unit/partinfo
  let $part := $map(string-join($partinfo/*, '/'))
  return replace node $partinfo with element part { $part/node() }
}
return file:write('data/'|| $page/@key || '.xml', $page,
  map{ "omit-xml-declaration": "no" })
The good thing here is that your replace operations won’t need to be cached until the very end; they will be run directly on each page element (and not persisted on disk, but this is something you don’t seem to need anyway).
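For comparison, the same kind of main-memory update can be written with the standard copy/modify/return expression of the XQuery Update Facility, for which the BaseX update keyword is a shorthand. This is only a sketch on made-up inline data; $page and $part are invented samples:

```xquery
(: Sketch only: $page and $part are invented sample nodes. :)
let $page :=
  <page>
    <unit>
      <partinfo><number>54321</number><manuf>A321</manuf></partinfo>
    </unit>
  </page>
let $part :=
  <part>
    <partinfo><number>54321</number><manuf>A321</manuf></partinfo>
    <color>Orange</color>
  </part>
return
  (: The update is applied to the copy only; $page itself stays unchanged. :)
  copy $copy := $page
  modify (
    for $partinfo in $copy//unit/partinfo
    return replace node $partinfo with $part
  )
  return $copy
```

Because the modified copy is an ordinary value, it can be passed on to file:write or any other function in the same query.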
Christian
[1] http://docs.basex.org/wiki/Commands#Command_Scripts
[2] http://docs.basex.org/wiki/XQuery_Update#Non-Updating_Expressions
Your example 2 does the job, thank you! Although I had to change the second-to-last line from
return file:write('data/'|| $page/@key || '.xml', $page,
to
return file:write('data/'|| $page/@key || '.xml', $new-part,
One last thing: if I want to specify the path to the input file on the command line, how do I then refer to it in the script? From the Command-Line Options page, it seems I can say
-i/my/path/to/input.xml
But what do I use in place of $input in that case? If I just remove "$input" from the script I get a "no context value bound" error if I say
-i/my/path/to/input.xml
or
-i/my/path/to/input.xml "/"
Thanks,
Michael
If I just remove "$input" from the script I get a "no context value bound" error if I say
-i/my/path/to/input.xml
It should work as described (I use it frequently myself). Could you try again, or provide me with a little self-contained example?
Cheers, Christian
I've tried a number of variations. As for a self-contained example, I'm not exactly sure what you're looking for, but suppose I have in C:\data\input.xml:
<?xml version="1.0" encoding="UTF-8"?>
<input>
  <pages>
    <page>
      <lbl>Sample Page</lbl>
      <list>
        <unit>
          <lbl>Summer Unit</lbl>
          <partinfo>
            <number>54321</number>
            <manuf>A321</manuf>
          </partinfo>
          <partinfo>
            <number>12345</number>
            <manuf>B123</manuf>
          </partinfo>
          <offer>25% off!</offer>
        </unit>
      </list>
    </page>
  </pages>
  <parts>
    <part>
      <partinfo>
        <number>54321</number>
        <manuf>A321</manuf>
      </partinfo>
      <color>Orange</color>
    </part>
    <part>
      <partinfo>
        <number>12345</number>
        <manuf>B123</manuf>
      </partinfo>
      <color>Pink</color>
    </part>
  </parts>
</input>
And suppose I have in C:\scripts\query.xq:
let $map := map:merge(
  for $part in //part
  return map:entry(string-join($part/partinfo/*, '/'), $part)
)
for $page in //page
let $new-part := $page update {
  for $partinfo in .//unit/partinfo
  let $part := $map(string-join($partinfo/*, '/'))
  return replace node $partinfo with element part {$part/node()}
}
return file:write('/data/'|| $page/@key || '.xml', $new-part,
  map{"omit-xml-declaration":"no"})
And suppose my current directory is C:\scripts. If I launch the command:
"\Program Files (x86)\BaseX\bin\basex" query.xq -iC:/data/input.xml
I get in response:
Stopped at C:/scripts/query.xq, 2/19: [XPDY0002] root(): no context value bound.
It's the same whether I say
-iC:/data/input.xml
-i/data/input.xml
-iC:\data\input.xml
-i\data\input.xml
Thanks,
Michael
"\Program Files (x86)\BaseX\bin\basex" query.xq -iC:/data/input.xml
This should work:
"\Program Files (x86)\BaseX\bin\basex" -iC:/data/input.xml query.xq
In BaseX, the order of arguments is important.
Indeed it does!
Christian, thank you so much for all your help. I've got everything I need, not only for this particular task, but I can probably extend what you've shown me to other related issues where BaseX can prove invaluable for saving time in my work processes. This has been a thoroughly satisfying experience.
Best regards,
Michael
basex-talk@mailman.uni-konstanz.de