On Thu, 2022-11-17 at 19:05 +0100, Christian Grün wrote:
But is there no way to declare that when I import a file to the database?
There's currently no way to supply this for specific elements
Both XML Schema and DTDs do have a way to say whether text is allowed in a particular context, and the XML loader could use this information to discard whitespace-only text nodes in contexts where text isn't allowed.
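As a rough illustration of what such a loader step might do (a sketch only, not how BaseX or any particular parser works): assume we already know, from a DTD or schema, which elements have element-only content. The set ELEMENT_ONLY and the function strip_ignorable_whitespace below are made-up names for this example, and the sketch uses Python's standard xml.etree.ElementTree rather than a validating parser.

```python
import xml.etree.ElementTree as ET

# Hypothetical input: names of elements whose content model allows only
# child elements (no #PCDATA). A real loader would derive this from the
# DTD or schema; here it is hard-coded for illustration.
ELEMENT_ONLY = {"list"}

# XML defines whitespace as space, tab, carriage return and line feed only.
XML_WS = " \t\r\n"


def strip_ignorable_whitespace(elem, element_only=ELEMENT_ONLY):
    """Drop whitespace-only text directly inside element-only elements."""
    if elem.tag in element_only:
        # Text before the first child.
        if elem.text is not None and not elem.text.strip(XML_WS):
            elem.text = None
        # Text after each child (ElementTree stores it as the child's tail).
        for child in elem:
            if child.tail is not None and not child.tail.strip(XML_WS):
                child.tail = None
    for child in elem:
        strip_ignorable_whitespace(child, element_only)
    return elem


doc = ET.fromstring("<list>\n  <item>a b</item>\n  <item> c </item>\n</list>")
strip_ignorable_whitespace(doc)
result = ET.tostring(doc, encoding="unicode")
print(result)
```

The whitespace inside the item elements is left alone, since item is not listed as element-only and its text may be significant; only the indentation between the children of list goes away.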
On how it came to be -
SGML had some really bad whitespace rules, including what was called "pernicious whitespace" - whitespace where the parser needed backtracking to know if it was text or not, but the parsers didn't actually do backtracking so they flagged it as an error. This was a very common source of problems for users.
We eliminated this for XML by requiring #PCDATA (i.e. text) always to be in a repeatable or-group (and to come first in it), so <!ELEMENT boy (#PCDATA|noise|dirt)*> and not <!ELEMENT boy (noise*, dirt*, #PCDATA)> (to paraphrase Ambrose Bierce's Devil's Dictionary, which defined a boy as a noise with dirt on it).
liam