Hello Michael,
You are certainly right that with mixed content, as in the example you have given here, chopping does make a semantic difference. However, you can disable this behaviour so that BaseX does what you want. So the only reason I can see for changing the default would be that it does not conform to some XML standard. However, I cannot find anything in the spec about which behaviour is expected, so in my opinion BaseX is doing nothing wrong here. I see that this behaviour might be surprising for some users, but the same might well be true if it were the other way round. Additionally, changing the default now would break application code, and unless there is a good reason (i.e. BaseX is actually doing something wrong or non-compliant) I don't see why one should change it. So if you could point out some details as to why this is not conforming behaviour, that would be interesting.
Cheers, Dirk
On Fri, Apr 5, 2013 at 11:15 AM, Michael Piotrowski mxp@cl.uzh.ch wrote:
On 2013-04-05, Michael Seiferle ms@basex.org wrote:
As chopping does not change any semantics (at least with regard to what XML considers semantically important) but only aesthetics, this is enabled by default.
I'm sorry to disagree, but chopping certainly *does* change the semantics--that's precisely why I've argued before that it shouldn't be on by default.
The problem becomes obvious with mixed content, e.g., with chopping enabled
<doc> <p>Lorem ipsum <em>dolor</em> <x>sit</x> amet ...</p> </doc>
becomes
<doc> <p>Lorem ipsum<em>dolor</em><x>sit</x>amet ...</p> </doc>
which is *not* the same, and AFAICT this is not conforming behavior (and BaseX doesn't honor xml:space either).
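To make the difference concrete, here is a minimal sketch in Python that imitates the effect shown above on the same sample document. It is an illustration only, not BaseX's implementation: the `chop` helper is a hypothetical stand-in that assumes chopping amounts to trimming leading and trailing whitespace from every text node and dropping nodes that become empty.

```python
import xml.etree.ElementTree as ET

SOURCE = "<doc> <p>Lorem ipsum <em>dolor</em> <x>sit</x> amet ...</p> </doc>"


def chop(element):
    """Hypothetical stand-in for whitespace chopping (an assumption, not BaseX code).

    Trims leading/trailing whitespace from every text node in place and
    drops text nodes that are empty afterwards.
    """
    for node in element.iter():
        if node.text is not None:
            node.text = node.text.strip() or None
        if node.tail is not None:
            node.tail = node.tail.strip() or None
    return element


original = ET.fromstring(SOURCE)
chopped = chop(ET.fromstring(SOURCE))

# String value of the <p> element before and after chopping.
original_text = "".join(original.find("p").itertext())
chopped_text = "".join(chopped.find("p").itertext())

print(repr(original_text))  # 'Lorem ipsum dolor sit amet ...'
print(repr(chopped_text))   # 'Lorem ipsumdolorsitamet ...'
```

The string value of the paragraph changes because the spaces between the inline elements are lost, which is the semantic difference under discussion for mixed content.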
I do understand that whitespace chopping as currently implemented is useful for some data-oriented applications, even if it is not conforming, but by default, the behavior should conform to the XML standard.
Best regards
-- Dr.-Ing. Michael Piotrowski, M.A. mxp@cl.uzh.ch Institute of Computational Linguistics, University of Zurich Phone +41 44 63-54313 | OpenPGP public key ID 0x1614A044