On Mon, 2019-12-09 at 20:27 +0100, Arjan Loeffen wrote:
> In general: where the wiki states "Many XML documents include whitespaces that have been added to improve readability.", this should not apply to mixed content fragments as described. It should apply only to the start and end of the "text content of elements", not to text nodes. I therefore also think that the second approach is not exactly in line with the *intention* of the XML standard.
It isn't, but some of the earliest XML parsers (MSXML, I think) had an option to drop white-space-only text nodes, because XML was being used in data contexts. The intent was that a DTD could be used to determine which spaces to ignore, but then DTDs became optional.
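To make the conflict with mixed content concrete, here is a minimal Python sketch that emulates such a "drop white-space-only text nodes" option by walking a minidom tree. The helper names (drop_whitespace_only_text, text_content) and the <p><b>Hello</b> <i>world</i></p> fragment are hypothetical illustrations, not any particular parser's API:

```python
from xml.dom import minidom

def drop_whitespace_only_text(node):
    """Recursively remove child text nodes that contain only whitespace."""
    for child in list(node.childNodes):
        if child.nodeType == child.TEXT_NODE and not child.data.strip():
            node.removeChild(child)
        else:
            drop_whitespace_only_text(child)
    return node

def text_content(node):
    """Concatenate all descendant text, like the XPath string value."""
    if node.nodeType == node.TEXT_NODE:
        return node.data
    return "".join(text_content(c) for c in node.childNodes)

# Hypothetical mixed-content fragment: the single space between </b> and <i>
# is real content, yet it is a white-space-only text node.
doc = minidom.parseString("<p><b>Hello</b> <i>world</i></p>")
print(repr(text_content(doc.documentElement)))  # 'Hello world'
drop_whitespace_only_text(doc.documentElement)
print(repr(text_content(doc.documentElement)))  # 'Helloworld'
```

Without a DTD saying that <p> allows mixed content, the option has no way to tell this space apart from indentation.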
A parser without a DTD does not know which elements _could_ contain text, and hence doesn't know what to drop. In addition, markup like
<person> <name> Nigel </name> <obedience> 0.4 </obedience> </person>
is, unfortunately, common. In SGML this worked, but the whitespace rules were complex enough that they were a constant source of trouble.
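As a rough illustration, assuming Python's xml.etree.ElementTree as the DTD-less parser, this sketch shows that every space in the <person> example is reported to the application, and that trimming the values is a decision only the application can make. The values dictionary is a hypothetical data-oriented view, not something the parser provides:

```python
import xml.etree.ElementTree as ET

# The markup from above, with no DTD attached.
xml_text = "<person> <name> Nigel </name> <obedience> 0.4 </obedience> </person>"
person = ET.fromstring(xml_text)

# ElementTree keeps every character: whitespace before the first child is
# person.text, and whitespace after each child is that child's .tail.
print(repr(person.text))  # ' '
for child in person:
    print(child.tag, repr(child.text), repr(child.tail))

# Hypothetical data-oriented reading: the application, knowing these are
# field values, strips the padding itself.
values = {child.tag: child.text.strip() for child in person}
print(values)  # {'name': 'Nigel', 'obedience': '0.4'}
```

Note that the padding inside <name> is not white-space-only, so even a "drop white-space-only text nodes" option would leave " Nigel " untouched.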
Liam