These vulnerabilities are only an issue if you allow untrusted users to supply XML documents with DTDs.
If your system must allow users to submit XML documents with DTDs, then you probably want to pre-parse them before passing them to BaseX, e.g., with a Java parser or Python with lxml, where the entity-related vulnerabilities can be prevented or isolated. That is, your site can provide an upload target that preprocesses and sanitizes XML documents before submitting them to BaseX.
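As a rough illustration of such a preprocessing step, here is a minimal sketch that uses only Python's standard-library expat bindings rather than lxml. The function name reject_risky_dtd and the policy it applies (refuse any entity declaration and any external DTD reference, pass everything else through unchanged) are assumptions for the example, not something BaseX requires; a real upload target may want a different policy.

```python
from xml.parsers import expat


def reject_risky_dtd(raw: bytes) -> bytes:
    """Return raw unchanged if it is well-formed and declares no entities
    and references no external DTD; raise ValueError otherwise.

    Hypothetical helper: the name and the policy are illustrative only.
    """
    parser = expat.ParserCreate()

    def on_doctype(name, system_id, public_id, has_internal_subset):
        # An external subset could be fetched by a downstream DTD-aware parser.
        if system_id is not None or public_id is not None:
            raise ValueError("external DTD reference not allowed")

    def on_entity_decl(name, is_parameter_entity, value, base,
                       system_id, public_id, notation_name):
        # Entity declarations are the basis of expansion and XXE attacks.
        raise ValueError(f"entity declaration not allowed: {name}")

    parser.StartDoctypeDeclHandler = on_doctype
    parser.EntityDeclHandler = on_entity_decl

    try:
        parser.Parse(raw, True)
    except expat.ExpatError as exc:
        raise ValueError(f"not well-formed: {exc}") from exc
    return raw
```

A document that passes this check can then be handed to BaseX; one that raises ValueError would be rejected at the upload target.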
One limitation I’ve run into with BaseX’s built-in parser is that it does not use Apache’s grammar cache feature, which makes DTD-aware parsing very inefficient for documents with large DTDs, such as DITA documents.
My solution is simply not to use DTD-aware parsing. That works for DITA because we know what all the default attribute values are for a given tag name and do not depend on any other DTD-specific feature (for example, DITA doesn’t use external general entities for any defined purpose, such as references to images).
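To make the default-attribute point concrete, here is a small sketch in Python (standard library only) that parses a DITA topic without loading its DTD and then fills in the @class values the DTD would otherwise have supplied. The DEFAULT_CLASS table is a tiny illustrative subset, and the function name is hypothetical; a real setup would need the full mapping for every element type in use, including specializations.

```python
import xml.etree.ElementTree as ET

# Illustrative subset of DITA @class defaults, keyed by tag name.
# A production table would cover every element type in the document set.
DEFAULT_CLASS = {
    "topic": "- topic/topic ",
    "title": "- topic/title ",
    "body": "- topic/body ",
    "p": "- topic/p ",
}


def apply_default_attributes(root: ET.Element) -> ET.Element:
    """Add the known default @class value to elements that lack one."""
    for elem in root.iter():
        default = DEFAULT_CLASS.get(elem.tag)
        if default is not None and "class" not in elem.attrib:
            elem.set("class", default)
    return root


doc = (
    '<!DOCTYPE topic PUBLIC "-//OASIS//DTD DITA Topic//EN" "topic.dtd">'
    '<topic id="t1"><title>Example</title><body><p>Text</p></body></topic>'
)
# The standard-library parser does not fetch the external DTD,
# so no defaults are applied until we add them ourselves.
topic = apply_default_attributes(ET.fromstring(doc))
```

The same lookup could equally be done at query time instead of at load time; which is better depends on how the documents are used.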
Cheers,
E.
_____________________________________________
Eliot Kimber
Sr. Staff Content Engineer
O: 512 554 9368
servicenow