Re: [basex-talk] Weird: mixed content trimmed unexpectedly

9 Dec 2019


      On Mon, 2019-12-09 at 20:27 +0100, Arjan Loeffen wrote:
...
> In general: when the wiki states here: "Many XML documents include
> whitespaces that have been added to improve readability.", this should not
> apply to mixed content fragments as described. Only to start and end of
> "text content of elements", not on text nodes.
> I therefore also think that the second approach is not exactly in line with
> the *intention* of the XML standard.
It isn't, but some of the earliest XML parsers (e.g. MSXML, I think) had
the option to drop white-space-only text nodes, because XML was being
used in data contexts. The intent was that a DTD could be used to determine
which spaces to ignore, but then DTDs became optional.
A parser without a DTD does not know which elements _could_ contain
text, and hence doesn't know what to drop. In addition, markup like,
<person>
    <name>
       Nigel
    </name>
    <obedience>
       0.4
    </obedience>
</person>
is common, unfortunately. In SGML this worked, but the whitespace rules
were complex enough that they were a constant source of trouble.
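
To make the trade-off concrete, here is a minimal sketch in Python using
the standard library's xml.dom.minidom (not BaseX; the helper names
direct_text, all_text and drop_whitespace_only are made up for this
illustration). With no DTD the parser keeps every whitespace-only text
node. Dropping them blindly is harmless for the data-style <person>
markup, but it fuses two words in a mixed-content fragment, and it still
leaves the padding around "Nigel" in place.

```python
from xml.dom import minidom

# Data-style markup, as in the example above.
PERSON = """<person>
    <name>
       Nigel
    </name>
    <obedience>
       0.4
    </obedience>
</person>"""

# Mixed content: the single space between </b> and <i> is significant.
MIXED = "<p>A <b>bold</b> <i>word</i></p>"


def direct_text(element):
    """Return the data of each text node that is a direct child of element."""
    return [n.data for n in element.childNodes if n.nodeType == n.TEXT_NODE]


def all_text(node):
    """Concatenate all descendant text of node, in document order."""
    if node.nodeType == node.TEXT_NODE:
        return node.data
    return "".join(all_text(child) for child in node.childNodes)


def drop_whitespace_only(element):
    """Recursively remove whitespace-only text nodes (an application choice,
    made here without any DTD to say where text is allowed)."""
    for child in list(element.childNodes):
        if child.nodeType == child.TEXT_NODE and not child.data.strip():
            element.removeChild(child)
        elif child.nodeType == child.ELEMENT_NODE:
            drop_whitespace_only(child)


person = minidom.parseString(PERSON).documentElement
mixed = minidom.parseString(MIXED).documentElement

# Before stripping: <person> has whitespace-only text children.
person_ws_before = direct_text(person)
mixed_before = all_text(mixed)       # "A bold word"

drop_whitespace_only(person)
drop_whitespace_only(mixed)

person_ws_after = direct_text(person)
mixed_after = all_text(mixed)        # "A boldword": the separating space is gone
name_after = all_text(person.getElementsByTagName("name")[0])  # still padded
```

Trimming the padding inside <name> would need a second rule (strip
leading and trailing whitespace of text nodes), which is exactly the
kind of rule that damages mixed content when no DTD is there to say
which elements may contain text.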
Liam
-- 
Liam Quin, https://www.delightfulcomputing.com/
Available for XML/Document/Information Architecture/XSLT/
XSL/XQuery/Web/Text Processing/A11Y training, work & consulting.
Barefoot Web-slave, antique illustrations:  http://www.fromoldbooks.org
