Dear BaseX team,
is this a bug?
serialize(<e> </e>)
returns
<e/>
I need my blanks!
Hans-Jürgen
On 08.09.2016, at 23:17, Hans-Juergen Rennau hrennau@yahoo.de wrote:
Dear BaseX team,
is this a bug?
serialize(<e> </e>)
returns
<e/>
I need my blanks!
Hans-Jürgen
Maybe your data is stored/processed with whitespaces stripped off?
Hmm, at first I thought this was not a bug and that the fix would be to do
serialize(<e xml:space="preserve"> </e>)
but to my surprise this results in
<e xml:space="preserve"/>
(with CHOP = false)
That doesn't seem right.
Whereas
serialize(<e><![CDATA[ ]]></e>)
does what you expect.
--Marc
It seems to be a problem with elements containing only whitespace characters:
serialize(<xml> <title> Demonstrating the CHOP flag </title> <text xml:space="preserve">To <b>be </b><i > </i >, or not to <b>be d asdf </b>, that is the question.</text> </xml>)
The <b> tags work as expected, but the <i> tags are reduced to the empty element <i/>. Maybe this has to do with the handling of empty elements?
Sebastian
On 09.09.2016, at 11:00, Alexander Holupirek alex@holupirek.de wrote:
On 08.09.2016, at 23:17, Hans-Juergen Rennau hrennau@yahoo.de wrote:
Dear BaseX team,
is this a bug?
serialize(<e> </e>)
returns
<e/>
I need my blanks!
Hans-Jürgen
Maybe your data is stored/processed with whitespaces stripped off?
Thanks, Alexander and everybody for your remarks and observations.
As it happens, in the meantime I've found the reason: it is the boundary-space policy, which can be either "preserve" or "strip" and which a boundary-space declaration in the prolog can explicitly choose (overriding the implementation-defined default) [1]. Therefore:
declare boundary-space preserve; serialize(<e> </e>)
yields
<e> </e>
as it should, hurray! One of so many opportunities to note the high quality of BaseX, which honours even such a rather obscure declaration.
The XQuery spec also states explicitly that xml:space has no effect in this context (from [2]): "Element constructors treat attributes named xml:space as ordinary attributes. An xml:space attribute does not affect the handling of whitespace by an element constructor."
So BaseX is doing just the right thing.
Everything is fine!
Cheers, Hans-Jürgen
PS: For those interested, here is the definition of boundary whitespace ([2]): [Definition: Boundary whitespace is a sequence of consecutive whitespace characters within the content of a direct element constructor, that is delimited at each end either by the start or end of the content, or by a DirectConstructor, or by an EnclosedExpr. For this purpose, characters generated by character references such as &#x20; or by CDataSections are not considered to be whitespace characters.]
[1] https://www.w3.org/TR/xquery-31/#id-boundary-space-decls
[2] https://www.w3.org/TR/xquery-31/#id-whitespace
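A minimal sketch of the declaration described above, assuming a processor such as BaseX that strips boundary whitespace by default. The variable name $e is only for illustration.

```xquery
xquery version "3.1";

(: Prolog declaration: keep boundary whitespace in direct element constructors. :)
declare boundary-space preserve;

(: $e is an illustrative name; the single blank between the tags is boundary whitespace. :)
let $e := <e> </e>
return (
  serialize($e),       (: the thread reports <e> </e> for this setting :)
  string-length($e)    (: length of the element's string value :)
)
```

Replacing "preserve" with "strip" in the declaration should reproduce the <e/> result reported at the start of the thread.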
That clears up some things. However, I wonder what the right interpretation is regarding xml:space. Doesn't this belong to XML parsing, and isn't it then dealt with by the parser before XQuery gets the data? The XQuery spec talks specifically about an element constructor. I wouldn't have thought that this counts as using an element constructor; the XML nodes could just as well have come directly from a document. My hunch is that xml:space should be honoured, but I'm not a spec nerd ;-)
--Marc
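One way to probe the question above is to compare a constructed element with a parsed one. This is only a sketch, assuming fn:parse-xml is available (XQuery 3.0 or later). The variable names are illustrative, and what the parser keeps may depend on processor options such as the CHOP setting mentioned earlier in the thread.

```xquery
xquery version "3.1";

declare boundary-space strip;

(: Direct element constructor: the blank is subject to the boundary-space policy,
   and xml:space is treated as an ordinary attribute. :)
let $constructed := <e xml:space="preserve"> </e>

(: The same markup as a string, handed to the XML parser instead of a constructor. :)
let $parsed := parse-xml('<e xml:space="preserve"> </e>')/e

return (
  serialize($constructed),
  serialize($parsed)
)
```

If the two serializations differ, the difference comes from how the node was created and not from serialize itself.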
On Fri, Sep 9, 2016 at 1:23 PM, Hans-Juergen Rennau hrennau@yahoo.de wrote:
Thanks, Alexander and everybody for your remarks and observations.
As it happens, in the meantime I've found the reason: it is the boundary-space policy, which can either be "preserve" or "strip" and which a boundary-space declaration of the prolog can explicitly choose (overriding the implementation-defined default) [1]. Therefore:
declare boundary-space preserve; serialize(<e> </e>)
yields
<e> </e>
as it should, hurray! One of so many opportunities to note the high quality of BaseX which honours such a rather obscure declaration.
The XQuery spec also states explicitly that xml:space has no effect in this context (from [2]): "Element constructors treat attributes named xml:space as ordinary attributes. An xml:space attribute does not affect the handling of whitespace by an element constructor."
So BaseX is doing just the right thing.
Everything is fine!
Cheers, Hans-Jürgen
PS: For those interested, here is the definition of boundary whitespace ([2]): [Definition: Boundary whitespace is a sequence of consecutive whitespace characters within the content of a direct element constructor, that is delimited at each end either by the start or end of the content, or by a DirectConstructor, or by an EnclosedExpr. For this purpose, characters generated by character references such as &#x20; or by CDataSections are not considered to be whitespace characters.]
[1] https://www.w3.org/TR/xquery-31/#id-boundary-space-decls
[2] https://www.w3.org/TR/xquery-31/#id-whitespace
... just checked. When the node comes from a document instance, then serialize does keep the whitespace (with or without xml:space).
And this clears up a confusion I had about what an element constructor is and what a direct element constructor is.
For fun I tried serialize(element e { ' ' }) which happily returns <e> </e>
Does somebody want to write that book on XML/XSLT/XQuery and whitespace? Title suggestion: "Whitespace matters"
--Marc
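The variants tried in this thread can be put side by side. The sketch below runs under the "strip" policy. According to the definition of boundary whitespace quoted earlier in the thread, characters from character references and CDATA sections do not count as whitespace there, and a computed constructor is not a direct element constructor. The character-reference and enclosed-expression forms were not tried in the thread; they are included as assumptions based on that definition.

```xquery
xquery version "3.1";

declare boundary-space strip;

(
  (: Plain blank in a direct constructor: boundary whitespace, reported as <e/>. :)
  serialize(<e> </e>),

  (: CDATA section: reported in the thread to keep the blank. :)
  serialize(<e><![CDATA[ ]]></e>),

  (: Computed element constructor: reported in the thread to return <e> </e>. :)
  serialize(element e { ' ' }),

  (: Character reference: assumed to be kept, per the quoted definition. :)
  serialize(<e>&#x20;</e>),

  (: Enclosed expression with a string literal: assumed to be kept as well. :)
  serialize(<e>{ ' ' }</e>)
)
```

Any of the last four forms can be used when a single constructor needs its blank without changing the boundary-space policy for the whole query.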
basex-talk@mailman.uni-konstanz.de