But is there no way to declare that when I import a file to the database?
There's currently no way to supply this for specific elements, but it's a good thought. We should think about it, now that all whitespace is preserved by default.
What would you have done differently?
It's always easy to complain and much harder to improve things, even more so if you can't start from scratch (I have never looked into how SGML handled this issue, or whether it was an issue at all).
JSON doesn’t come with a standardized solution for mixed content, but at least you can't corrupt its contents by using the wrong indentation.
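To make the indentation point concrete, here is a small sketch using Python's standard json and xml.etree.ElementTree modules. The documents are made-up examples: pretty-printing JSON leaves the parsed value unchanged, while indenting the same XML element by hand adds whitespace to its text content.

```python
import json
import xml.etree.ElementTree as ET

# The same JSON data, compact and pretty-printed
compact = '{"title": "Survey", "tags": ["a", "b"]}'
indented = json.dumps(json.loads(compact), indent=2)

# Re-indenting JSON never changes the parsed value
same_json = json.loads(compact) == json.loads(indented)

# The same XML element, compact and indented by hand (made-up example)
xml_compact = "<p>Hello <b>world</b></p>"
xml_indented = "<p>\n  Hello <b>world</b>\n</p>"

# Re-indenting XML changes the string value of the element
text_compact = "".join(ET.fromstring(xml_compact).itertext())
text_indented = "".join(ET.fromstring(xml_indented).itertext())

print(same_json)             # True
print(repr(text_compact))    # 'Hello world'
print(repr(text_indented))   # '\n  Hello world\n'
```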
Looking at the existing XML representation, I would certainly have preferred leading and trailing whitespace in elements to be ignored unless an element is marked as mixed content. For consistency, it would also have been nicer to have an xml:space='strip' attribute value instead of 'default'.
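A rough sketch of what I mean, again with Python's ElementTree. The set of mixed-content element names is a hypothetical stand-in for whatever declaration mechanism would mark them, and 'strip' is not a real xml:space value (XML 1.0 only defines 'default' and 'preserve'). Everything else is trimmed, and xml:space is inherited by descendants as in the spec.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:space under the predefined XML namespace
XML_SPACE = "{http://www.w3.org/XML/1998/namespace}space"
# Hypothetical: element names declared as mixed content
MIXED = {"p", "b", "i"}

def strip_whitespace(elem, inherited="strip"):
    """Trim leading/trailing whitespace unless the element is mixed content
    or xml:space='preserve' applies. 'strip' is a hypothetical mode."""
    mode = elem.get(XML_SPACE, inherited)  # inherited unless overridden
    keep = mode == "preserve" or elem.tag in MIXED
    if not keep:
        # An element's own content is its text plus the tails of its children
        elem.text = (elem.text or "").strip() or None
        for child in elem:
            child.tail = (child.tail or "").strip() or None
    for child in elem:
        strip_whitespace(child, mode)
    return elem

doc = strip_whitespace(ET.fromstring(
    "<book>\n"
    "  <title>  Survey  </title>\n"
    "  <p>Some <b>bold</b> text </p>\n"
    '  <code xml:space="preserve">  x = 1  </code>\n'
    "</book>"))

print(repr(doc.find("title").text))  # 'Survey'
print(repr(doc.find("p/b").tail))    # ' text ' (mixed content is kept)
print(repr(doc.find("code").text))   # '  x = 1  '
```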
More fundamentally, a custom node type for mixed content could have been added, and structural and content-based data could have been represented differently. If JSON and XML were merged, for example, conventional JSON could be used to store non-hierarchical metadata, and the JSON value range could be extended by a simplified XML type for mixed content. A language could then support both JavaScript dot notation and XML path syntax:
books.book[text//h1 = 'Survey'].title
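Just to illustrate how such a query could be evaluated, here is a hypothetical sketch in Python: the dot steps navigate an ordinary dict/list structure, and the text//h1 step searches an XML fragment stored in the "text" field. The data, the field names, and the titles_with_heading helper are all made up for this example; the comparison mimics an XPath general comparison (true if any h1 matches).

```python
import xml.etree.ElementTree as ET

# Hypothetical hybrid data: JSON-like metadata plus an XML fragment
# (mixed content) stored in the "text" field of each book
books = {
    "book": [
        {"title": "Data Models",
         "text": ET.fromstring("<doc><h1>Survey</h1><p>An <i>overview</i>.</p></doc>")},
        {"title": "Query Languages",
         "text": ET.fromstring("<doc><h1>Intro</h1></doc>")},
    ]
}

def titles_with_heading(books, heading):
    """Sketch of books.book[text//h1 = 'Survey'].title."""
    return [
        book["title"]
        for book in books["book"]
        # .//h1 finds all h1 descendants; the predicate holds if any matches
        if any("".join(h1.itertext()) == heading
               for h1 in book["text"].findall(".//h1"))
    ]

print(titles_with_heading(books, "Survey"))  # ['Data Models']
```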
Well, no more ideas here. Instead of inventing something new by myself, I should definitely have a look at other projects first, e.g. Ghislain’s exciting RumbleDB project, which brings many interesting concepts together.