Dear BaseX,
As per the GitHub instructions, I’m sending a bug report via email:
Using BaseX 9.5.1, the command line “basex.bat -i datafile.xml queryfile.xq” returns no result at all. I’m convinced the result should be equivalent to that of the online XQuery tester (https://www.videlibri.de/cgi-bin/xidelcgi).
For a modified query using string comparison based on fn:path(), BaseX gives the expected result.
%%% query begin %%%
let $rootElement := /child::*
for $x in //Section[parent::* != $rootElement]
return fn:path($x)
%%% query end %%%
%%% data begin %%%
<root>
<Section id="should not be reported 1" />
<Section id="should not be reported 2">
<Section id="should be reported 1"/>
<a>
<Section id="should be reported 2"/>
</a>
</Section>
<a>
<Section id="should be reported 3"/>
</a>
</root>
%%% data end %%%
%%% modified query begin %%%
let $rootElementPath := fn:path(/child::*)
for $x in //Section[fn:path(parent::*) != $rootElementPath]
return fn:path($x)
%%% modified query end %%%
Kind regards
Jan Červák
On 26.05.2021 at 11:03, ydyxeb@post.cz wrote:
%%% query begin %%%
let $rootElement := /child::*
for $x in //Section[parent::* != $rootElement]
Are you sure you want a string comparison of the two elements? Or would you rather compare node identity, as in
for $x in //Section[not(parent::* is $rootElement)]
?
return fn:path($x)
%%% query end %%%
Hi Jan,
Thanks for your feedback to the mailing list.
Your query will return the expected paths if you call BaseX as follows:
basex.bat -w -i datafile.xml queryfile.xq
A short explanation: By default, BaseX ignores whitespace when parsing XML documents. By specifying -w, whitespace chopping can be suppressed [1,2]. If you want to disable whitespace chopping permanently, you can do so by adding the option "CHOP=false" to your .basex configuration file [2].
A more comprehensive explanation (just ignore it if you are aware of all the details): In your query, you are comparing the string value of the parent element with the string value of $rootElement. The following comparison is equivalent to yours:
//Section[parent::*/data() != $rootElement/data()]
This is why it matters whether whitespace-only text nodes are preserved during parsing. If your actual objective is to compare the XML structure, you could use the following:
//Section[not(deep-equal(parent::*, $rootElement))]
The following solution compares node identities:
//Section[not(parent::* is $rootElement)]
Hope this helps
Christian
[1] https://docs.basex.org/wiki/Command-Line_Options
[2] https://docs.basex.org/wiki/Configuration
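The effect described in this reply can be reproduced outside BaseX. The following is only a rough sketch in Python, not BaseX code: it assumes that whitespace chopping can be approximated by dropping whitespace-only text nodes, and the helper names (chop, string_value, sections_by) are made up for illustration. It runs the string-value comparison on Jan's data with and without chopping, and then the identity comparison.

```python
import xml.etree.ElementTree as ET

# Jan's sample document, with the line breaks between elements kept.
DATA = """<root>
<Section id="should not be reported 1" />
<Section id="should not be reported 2">
<Section id="should be reported 1"/>
<a>
<Section id="should be reported 2"/>
</a>
</Section>
<a>
<Section id="should be reported 3"/>
</a>
</root>"""


def chop(root):
    """Drop whitespace-only text nodes (an approximation of whitespace chopping)."""
    for node in root.iter():
        if node.text is not None and not node.text.strip():
            node.text = None
        if node.tail is not None and not node.tail.strip():
            node.tail = None


def string_value(elem):
    """Concatenate all descendant text, like the XPath string value of an element."""
    return "".join(elem.itertext())


def sections_by(root, differs):
    """Ids of all Section elements whose parent 'differs' from the root element."""
    parent_of = {child: parent for parent in root.iter() for child in parent}
    return [s.get("id") for s in root.iter("Section") if differs(parent_of[s], root)]


def by_string_value(parent, root):
    # Rough counterpart of: parent::* != $rootElement
    return string_value(parent) != string_value(root)


def by_identity(parent, root):
    # Rough counterpart of: not(parent::* is $rootElement)
    return parent is not root


preserved = ET.fromstring(DATA)   # whitespace kept, as with -w
chopped = ET.fromstring(DATA)     # whitespace dropped, as without -w
chop(chopped)

preserved_ids = sections_by(preserved, by_string_value)
chopped_ids = sections_by(chopped, by_string_value)
identity_ids = sections_by(chopped, by_identity)

print(preserved_ids)  # the three "should be reported" sections
print(chopped_ids)    # [] because every string value is the empty string
print(identity_ids)   # the three "should be reported" sections again
```

In this sketch the identity comparison gives the same answer whether or not whitespace is kept, which is why it is the more robust choice when the intent is "parent is not the root element".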
Hi Christian,
Thank you for the detailed explanation. I have only recently started using XQuery; my knowledge is primarily based on XPath 1.0, which I learned 15 years ago. I understand the underlying string-value comparison of nodes and node sets. Even though performance was not my goal, thank you for suggesting the 'is' operator.
However, I still don't understand why the -w parameter should do the trick. I would expect that parsing is done once, when the document is loaded, and that the $rootElement variable just holds a node of the document. The string values of $rootElement and parent::* should then be identical for identical nodes. Am I missing something?
Thank you,
Jan
-----Original Message-----
From: Christian Grün christian.gruen@gmail.com
Sent: Wednesday, May 26, 2021 11:41
To: ydyxeb@post.cz
Cc: BaseX basex-talk@mailman.uni-konstanz.de
Subject: Re: [basex-talk] BUG node comparison
[...]
Hi Jan,
> However, I still don't understand why the -w parameter should do the trick. I would expect that parsing is done once, when the document is loaded, and that the $rootElement variable just holds a node of the document. The string values of $rootElement and parent::* should then be identical for identical nodes. Am I missing something?
Yes, the parsing is done only once, and $rootElement will contain the root element of this document. Let’s assume we have this input document:
<a>
  <b>
    <c/>
  </b>
</a>
If -w is omitted, the whitespace (4 text nodes in the example) will be dropped, and we’ll be left with the plain element structure:
<a><b><c/></b></a>
Let’s now assume we have this query:
//c[.. != /a]
The path returns the element named "c" if the string value of its parent differs from the string value of "a":
• If the whitespaces of the document are preserved, the comparison will be successful, because "a" has the string value "\n \n \n \n", whereas "b" has the string value "\n \n ".
• If whitespaces are dropped, the comparison will return false, because the string values of both elements will be the empty string.
Does this make sense to you?
Christian
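The two cases above can be checked with a small Python sketch. It is only an illustration, not BaseX itself: the two-space indentation of the sample document is an assumption (the exact whitespace of the original message is not recoverable here), and text_nodes is a hypothetical helper that lists the text nodes below an element in document order.

```python
import xml.etree.ElementTree as ET

# Assumed layout of the example document: line breaks plus two-space indentation.
DOC = "<a>\n  <b>\n    <c/>\n  </b>\n</a>"


def text_nodes(elem):
    """Text nodes below elem in document order (ElementTree stores them as .text/.tail)."""
    nodes = []

    def walk(node):
        if node.text is not None:
            nodes.append(node.text)
        for child in node:
            walk(child)
            if child.tail is not None:
                nodes.append(child.tail)

    walk(elem)
    return nodes


a = ET.fromstring(DOC)
b = a.find("b")

# Whitespace preserved (as with -w): the string value is the concatenated text.
whitespace_nodes = text_nodes(a)
a_value = "".join(text_nodes(a))
b_value = "".join(text_nodes(b))

# Whitespace dropped (as without -w): whitespace-only text nodes are discarded.
chopped_a = "".join(t for t in text_nodes(a) if t.strip())
chopped_b = "".join(t for t in text_nodes(b) if t.strip())

print(len(whitespace_nodes))         # 4
print(repr(a_value), repr(b_value))  # two different whitespace strings
print(a_value != b_value)            # True: "c" would be returned
print(chopped_a != chopped_b)        # False: both are "", "c" is not returned
```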
Thank you very much, Christian! (My mistake: I confused equality and inequality when reading your previous response.)
Kind regards
Jan
-----Original Message-----
From: Christian Grün christian.gruen@gmail.com
Sent: Wednesday, May 26, 2021 13:28
To: ydyxeb@post.cz
Cc: BaseX basex-talk@mailman.uni-konstanz.de
Subject: Re: [basex-talk] BUG node comparison
[...]