On 2012-05-11 23:34, Michael Piotrowski wrote:
Frankly, I find it quite dangerous that CHOP is ON by default. Discarding whitespace in mixed content means losing information. I'd find it preferable if it were off by default; if you know your data and if you are aware of the effects of CHOP, *then* you could turn it on.
+1
Whitespace should always be preserved, unless the parser strips it because it knows that it’s ignorable whitespace (because it has been made aware of a schema?).
Looking at Saxon’s strip option, http://www.saxonica.com/documentation/javadoc/net/sf/saxon/s9api/WhitespaceS...
Saxon’s -strip:ignorable has a certain appeal, but when you consider how “ignorable” is specified:
The value IGNORABLE indicates that whitespace text nodes in element-only content are discarded.
fringe cases instantly come to mind where it will strip one whitespace character too many: <p><hi rend="bold">Hello</hi> <hi rend="bolditalic">World</hi></p>
“Element-only content”: I’m not sure whether Saxon decides that an element has element-only content based on information from the parser (“no text node allowed here”) or whether it draws its own conclusions (“no text node present here”).
Hmm.
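To make the second possibility concrete, here is a minimal Python sketch of a "draw your own conclusions" stripper, applied to the <p> example above. It is not Saxon's or BaseX's actual implementation; the helper names (text_of, looks_element_only, strip_guessed_ignorable) are made up for illustration, and the heuristic "an element whose text children are all whitespace-only must have element-only content" is purely my assumption about what such a guess could look like.

```python
from xml.dom import minidom

SAMPLE = '<p><hi rend="bold">Hello</hi> <hi rend="bolditalic">World</hi></p>'

XML_WHITESPACE = " \t\r\n"


def text_of(node):
    """Concatenate all text-node descendants in document order."""
    if node.nodeType == node.TEXT_NODE:
        return node.data
    return "".join(text_of(child) for child in node.childNodes)


def looks_element_only(element):
    """Hypothetical heuristic: guess 'element-only content' without a schema.

    True if the element has at least one child element and every child
    text node consists of XML whitespace only.
    """
    children = element.childNodes
    has_element_child = any(c.nodeType == c.ELEMENT_NODE for c in children)
    all_text_blank = all(
        c.data.strip(XML_WHITESPACE) == ""
        for c in children
        if c.nodeType == c.TEXT_NODE
    )
    return has_element_child and all_text_blank


def strip_guessed_ignorable(element):
    """Remove whitespace-only text nodes wherever the heuristic fires."""
    if looks_element_only(element):
        for child in list(element.childNodes):
            if child.nodeType == child.TEXT_NODE:
                element.removeChild(child)
    for child in element.childNodes:
        if child.nodeType == child.ELEMENT_NODE:
            strip_guessed_ignorable(child)


doc = minidom.parseString(SAMPLE)
before = text_of(doc.documentElement)
strip_guessed_ignorable(doc.documentElement)
after = text_of(doc.documentElement)

print(repr(before))  # 'Hello World'
print(repr(after))   # 'HelloWorld'
```

The space between the two <hi> elements is the only text child of <p>, so the guess classifies <p> as element-only and the word boundary is lost. A stripper that relies on a DTD or schema declaring <p> as mixed content would not have this problem, which is why the distinction matters.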