On Jun 18, 2016, at 6:35 AM, Christian Grün wrote:
Dear Michael,
As you correctly guessed, if you want to preserve whitespaces, you will need to set the CHOP option to false. I remember there has been discussion around this option on this list more than once.
Yes. What puzzles me is that calling db:replace with a fourth argument of map { "chop" : false() } appears not to have any effect in the database in question. (I still have not put together a minimal reproducible example; for the moment I solved the problem by adding xml:space="preserve" to each mixed-content element. I hope to come back to the issue of the chop option when my current rush is past.)
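For readers who want to try the same workaround, here is a minimal sketch in Python (standard library only) of how xml:space="preserve" could be added before loading the documents. It is not the procedure actually used for the database in question. The function names are made up for illustration, and the test for "mixed content" is an assumption: it looks only at the instance (an element with child elements plus some non-whitespace text of its own), whereas a schema-based list of element names would be more reliable.

```python
import xml.etree.ElementTree as ET

# Clark-notation name of the xml:space attribute; ElementTree serializes
# this namespace with the reserved "xml" prefix.
XML_SPACE = "{http://www.w3.org/XML/1998/namespace}space"


def has_mixed_content(elem):
    """True if elem has child elements and non-whitespace text of its own.

    Assumption: mixed content is detected from the instance alone.
    """
    if len(elem) == 0:
        return False
    if (elem.text or "").strip():
        return True
    return any((child.tail or "").strip() for child in elem)


def mark_mixed_content(root):
    """Set xml:space="preserve" on every mixed-content element under root."""
    for elem in root.iter():
        if has_mixed_content(elem):
            elem.set(XML_SPACE, "preserve")
    return root


doc = ET.fromstring(
    "<doc>\n  <p>This <em>is</em> <strong>IMPORTANT</strong></p>\n</doc>"
)
mark_mixed_content(doc)
print(ET.tostring(doc, encoding="unicode"))
```

In this sketch only the p element is marked; doc contains nothing but whitespace between its children and is left alone. An element such as <p><em>a</em> <em>b</em></p> would not be detected by this instance-based test, which is one reason to prefer a list of element names where a schema is available.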
By the way, I never stopped wondering why only 'preserve' and 'default' are allowed as values for the xml:space attribute. As one of the renowned editors of the spec, can you tell why a 'strip' value was omitted back then?
The short answer is no, I cannot. (My prayers have been answered! There are some details of the design discussions of 1996 that I cannot remember!)
I have just spent much more time than I intended trying to find the relevant parts of the discussion in the email archive at
http://lists.w3.org/Archives/Public/w3c-sgml-wg/
Before being named 'xml:space', the attribute in question appears to have gone by the name 'xml-space' or '-xml-space' (as the group's attempts to reserve a portion of the namespace for itself changed over time). As far as I can tell, the discussions on whitespace handling began in September 1996 and may have been mostly concluded by December of that year.
A document containing the group's summaries of design decisions mentions (what became) the xml:space attribute in decisions of 29 October and 18 December 1996, and again on 4 June 1997.
http://www.w3.org/XML/9712-reports.html
The two values were labeled 'KEEP' and 'COLLAPSE' in the draft of 14 November 1996 (which appears to be the oldest one in the W3C technical-reports area); 'COLLAPSE' was later renamed to 'DEFAULT'. Later proposals to introduce a third value with a name like REMOVE or DISCARD did come up, but appear never to have gotten any traction.
http://www.w3.org/TR/WD-xml-961114#sec2.7
Speaking for myself, I think a better heuristic than dropping all whitespace-only text nodes and removing leading and trailing whitespace would be dropping whitespace-only text nodes only if every text node seen so far as a child of this parent has been whitespace-only, and stripping leading whitespace only after a start-tag and trailing whitespace only before an end-tag.
The first would prevent the loss of the inter-word whitespace in
<p>This <em>is</em> <strong>IMPORTANT</strong></p>
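To make the comparison concrete, here is a rough sketch in Python of the two behaviors. It is not BaseX code and makes no claim about how CHOP is implemented. The function names are invented for illustration, and I am assuming that "after a start-tag" means the first text node of a parent and that the trailing-whitespace rule applies to the last text node before the parent's end-tag.

```python
import xml.etree.ElementTree as ET


def naive_chop(elem):
    """Strip every text node and drop the ones that end up empty."""
    elem.text = (elem.text or "").strip() or None
    for child in elem:
        naive_chop(child)
        child.tail = (child.tail or "").strip() or None


def heuristic_chop(elem):
    """Apply the two proposed rules to the text-node children of elem."""
    children = list(elem)
    seen_real_text = False  # has a non-whitespace text node been seen yet?

    # First text node: it directly follows the parent's start-tag.
    text = elem.text or ""
    if text.strip():
        seen_real_text = True
        text = text.lstrip()      # leading whitespace after a start-tag
        if not children:
            text = text.rstrip()  # it is also the last node before the end-tag
    else:
        text = ""                 # only whitespace seen so far: drop it
    elem.text = text or None

    for index, child in enumerate(children):
        heuristic_chop(child)
        tail = child.tail or ""
        if tail.strip():
            seen_real_text = True
        elif not seen_real_text:
            tail = ""             # only whitespace seen so far: drop it
        if index == len(children) - 1:
            tail = tail.rstrip()  # trailing whitespace before the end-tag
        child.tail = tail or None


SAMPLE = "<p>This <em>is</em> <strong>IMPORTANT</strong></p>"

chopped = ET.fromstring(SAMPLE)
naive_chop(chopped)
print(ET.tostring(chopped, encoding="unicode"))
# <p>This<em>is</em><strong>IMPORTANT</strong></p>

kept = ET.fromstring(SAMPLE)
heuristic_chop(kept)
print(ET.tostring(kept, encoding="unicode"))
# <p>This <em>is</em> <strong>IMPORTANT</strong></p>
```

With the heuristic, indentation between element-only children (as in <doc> followed by a newline and a series of <p> elements) is still discarded, because no non-whitespace text has been seen in that parent.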
But it may be as impossible for BaseX to change the details of the CHOP option as it is to change the default value for the CHOP option from true to false.
Please note that 'chop' in combination with db:create will only take effect if you specify actual input with this command [2].
Thank you; I had not realized that (I imagined it was somehow setting a default for the collection being created).
If you want to globally deactivate whitespace chopping, you can specify this option in the .basex configuration file or (if you are working with RESTXQ, REST, etc.) add it in the web.xml file.
Aha. That may be the thing to do.
Hope this helps,
As always, it does. Thank you very much.
Michael