Note that the XML mixed content and whitespace design was inherited from SGML, where DTDs were required, and so a parser always knew for sure whether a given context was or was not mixed content.

It’s been a couple of decades, but my memory is that anything we could have done in XML to address this, given that XML does not require any kind of grammar, would have been even more disruptive, such as not allowing mixed content at all and having some special syntax just for identifying text nodes.

So it wasn’t really a decision so much as the absence of a better solution, given SGML as our starting point.

Cheers,

_____________________________________________

Eliot Kimber

Sr Staff Content Engineer

From: BaseX-Talk <basex-talk-bounces@mailman.uni-konstanz.de> on behalf of Christian Grün <christian.gruen@gmail.com>
Date: Thursday, November 17, 2022 at 11:01 AM
To: Jonathan Robie <jonathan.robie@gmail.com>
Cc: basex-talk@mailman.uni-konstanz.de <basex-talk@mailman.uni-konstanz.de>
Subject: Re: [basex-talk] Pretty print

But the indentation is quite different from what I see in Saxon or oXygen output when I indent. You see this with more complex examples.

That’s true: every query processor uses its own indentation algorithm, and the specification leaves a lot of freedom here [1]. If indentation matters, it is advisable either to preserve the original formatting or to use xml:space='preserve' for mixed-content sections.
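
To illustrate why indenting mixed content is risky, here is a minimal sketch in Python. It uses the standard library's xml.etree.ElementTree.indent (Python 3.9+) as a stand-in for an indenting serializer; that choice is an assumption for illustration only and is not the algorithm used by BaseX, Saxon, or oXygen. The sample document and the helper name text_value are hypothetical.

```python
import xml.etree.ElementTree as ET


def text_value(elem):
    """Return the concatenated text of an element and its descendants."""
    return "".join(elem.itertext())


# Hypothetical sample: two inline elements with no whitespace between them.
source = "<p><b>X</b><i>ML</i></p>"

original = ET.fromstring(source)
indented = ET.fromstring(source)

# ET.indent inserts newlines and spaces wherever text or tail is empty
# or whitespace-only. It cannot know that <p> is meant to be mixed content.
ET.indent(indented, space="  ")

before = text_value(original)
after = text_value(indented)

print(repr(before))
print(repr(after))
print(ET.tostring(indented, encoding="unicode"))
```

Without a grammar, the indenter cannot tell structural whitespace from content, so the string value of p changes from a single word to one with whitespace between and around its parts. This is the situation in which preserving the original formatting or marking the section with xml:space='preserve' is the safer choice.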

I’ll never be happy with the decision in XML to lump together indentation of structure and content.

[1] https://www.w3.org/TR/xslt-xquery-serialization-31/#xml-indent