On Fri, 2022-11-18 at 18:39 +0000, Lizzi, Vincent wrote:
Hi Liam,
XML's way of handling space characters is understandably an improvement over SGML, but it still causes problems sometimes and seems more complex than it perhaps could be. Although the ship has long since sailed, out of curiosity do you recall if there were any suggestions for a rule to ensure that spaces (and absence of spaces) would be consistently preserved without relying on a DTD or Schema?
There were. There was a lot of discussion around this. The main proposals were:

(1) disallow mixed content entirely, and require an element to contain text.

    <p><T>Karen </T><emph>actually</emph><T> smiled at this idea.</T></p>

It's easy to see why this didn't get much traction from document people.
(2) require mixed or text elements to use different syntax, e.g.

    <@p>Karen <@emph>actually/@emph smiled at this idea./@p

This would have ruled out XHTML, however, or any other pre-existing SGML vocabulary, and at that time that was 100% of all content: there was no XML content outside of the examples in the specification itself.
At one point i remember (foolishly) suggesting upper-case element names for ones that could not contain text directly (or the other way round, i forget), but of course this wouldn't work in a multilingual world where not all languages have upper and lower case.
XML was developed before XML Schema. When we started, a DTD was required; by the end, DTDs were optional (i had Charles Goldfarb calling me at home over this, trying to find ways to keep DTDs as mandatory!) but i think we didn't revisit all of the decisions in this light.
A relatively safe way to "pretty print" (indent) XML is to insert or remove spaces only between an element's name and the closing > of its tag, and where spaces already exist in text nodes.
Yes, there are tools that can do this, too.
However I haven't seen any XML editor or processor implement this approach.
I think maybe xmllint can, i'm not sure. And possibly xml tidy, and maybe James' xp has something like this. Overall i think it tends to confuse people more than it helps, though. I'm not sure.
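As a rough illustration of the tag-internal indentation idea discussed above, here is a minimal Python sketch. It is not how xmllint, xml tidy, or xp work; the function name indent_inside_tags and the regular expression are made up for this example. It covers only the first half of the idea (line breaks and indentation placed just before the closing > of a tag) and does not adjust existing spaces in text nodes. It assumes well-formed input with no DTD internal subset.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical tokenizer for this sketch. Comments, CDATA sections and
# processing instructions are matched first so they pass through unchanged;
# then end tags, then start tags and empty-element tags.
TOKEN = re.compile(
    r"""
    (?P<skip><!--.*?-->|<!\[CDATA\[.*?\]\]>|<\?.*?\?>)
  | (?P<end></[^\s>]+)\s*>
  | (?P<start><[^\s/>!?]+(?:\s+[^\s=]+\s*=\s*(?:"[^"]*"|'[^']*'))*)\s*(?P<empty>/?)>
    """,
    re.VERBOSE | re.DOTALL,
)


def indent_inside_tags(xml_text, indent=" "):
    """Put a newline plus indentation before each tag's closing '>'.

    Whitespace is only added inside tags, where XML allows it without
    changing the parsed content, so text nodes are left exactly as they were.
    """
    depth = 0

    def rewrite(match):
        nonlocal depth
        if match.group("skip") is not None:
            return match.group(0)  # comment, CDATA or PI: leave as-is
        if match.group("end") is not None:
            depth -= 1
            return match.group("end") + "\n" + indent * depth + ">"
        pad = "\n" + indent * depth
        if match.group("empty"):
            return match.group("start") + pad + "/>"  # depth unchanged
        depth += 1
        return match.group("start") + pad + ">"

    return TOKEN.sub(rewrite, xml_text)


source = "<p>Karen <emph>actually</emph> smiled at this idea.</p>"
result = indent_inside_tags(source)
print(result)

# Both forms parse to the same tree, because no text node was touched.
assert ET.tostring(ET.fromstring(source)) == ET.tostring(ET.fromstring(result))
```

The output looks odd, with each > sitting at the start of an indented line, which may be part of why this style tends to confuse people. The trade-off is that the result parses to the same content whether or not a DTD or schema is available.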
liam