I am trying to protect my BaseX application from XML vulnerabilities, like the ones described in [https://gist.github.com/mgeeky/4f726d3b374f0a34267d4f19c9004870] and [https://learn.microsoft.com/en-us/archive/msdn-magazine/2009/november/xml-denial-of-service-attacks-and-defenses].
My application runs as `basexhttp` inside a docker container, and I set the options in web.xml:
  <context-param>
    <param-name>org.basex.dtd</param-name>
    <param-value>false</param-value>
  </context-param>
  <context-param>
    <param-name>org.basex.xinclude</param-name>
    <param-value>false</param-value>
  </context-param>
I have not found other options, for example to let the parser limit expansion of internal entities. Is there a way to set parser properties like `jdk.xml.entityExpansionLimit` in BaseX?
These vulnerabilities are only an issue if you allow untrusted users to supply XML documents with DTDs.
If your system must allow users to submit XML documents with DTDs, then you probably want to pre-parse them before supplying them to BaseX, i.e., using a Java parser or Python with lxml or similar, where the entity-related vulnerabilities can be prevented or isolated. That is, your site can provide an upload target that preprocesses XML documents in order to sanitize them before submitting to BaseX.
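A minimal sketch of such a pre-parse step in Java, assuming the JDK's built-in JAXP parser. The class `UploadGate` and the method `isAcceptable` are hypothetical names chosen for illustration and are not part of BaseX. The sketch rejects any document that carries a DOCTYPE declaration rather than trying to repair it:

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

/** Hypothetical upload gate: accepts only well-formed XML without a DOCTYPE. */
final class UploadGate {

    /** Rethrows errors instead of printing them to stderr. */
    private static final ErrorHandler RETHROW = new ErrorHandler() {
        @Override public void warning(SAXParseException e) { /* ignore warnings */ }
        @Override public void error(SAXParseException e) throws SAXException { throw e; }
        @Override public void fatalError(SAXParseException e) throws SAXException { throw e; }
    };

    /** Returns true if the document is well-formed and declares no DOCTYPE. */
    static boolean isAcceptable(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            // Enable the JDK's built-in processing limits.
            factory.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true);
            // Treat any DOCTYPE declaration as a fatal error.
            factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
            factory.setXIncludeAware(false);
            DocumentBuilder builder = factory.newDocumentBuilder();
            builder.setErrorHandler(RETHROW);
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException | IOException | ParserConfigurationException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String plain = "<doc><p>hello</p></doc>";
        String withDoctype = "<!DOCTYPE doc [<!ENTITY a \"aaaa\">]><doc>&a;</doc>";
        System.out.println("plain accepted: " + isAcceptable(plain));
        System.out.println("doctype accepted: " + isAcceptable(withDoctype));
    }
}
```

Only documents that pass the gate would then be forwarded to BaseX. Rejecting is simpler than stripping the DOCTYPE, but it only suits sites that do not need DTD-dependent content.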
One limitation I’ve run into with BaseX’s built-in parser is that it does not use Apache’s grammar cache feature, which makes it very inefficient for documents with large DTDs, like DITA documents.
My solution is to simply not use DTD-aware parsing, which works for DITA because we know what all the default attribute values are for a given tag name and are not dependent on any other DTD-specific feature (i.e., DITA doesn’t use external general entities for any defined purpose, like references to images or something).
Cheers,
E.
_____________________________________________ Eliot Kimber Sr. Staff Content Engineer O: 512 554 9368
From: Nico Verwer (Rakensi) <nverwer@rakensi.com> Date: Thursday, March 13, 2025 at 5:26 PM To: basex-talk@mailman.uni-konstanz.de Subject: [basex-talk] Protecting against XML vulnerabilities
[...]
Thank you, Eliot Kimber for your response:
> These vulnerabilities are only an issue if you allow untrusted users to supply XML documents with DTDs.
My application will be open to the outer world, so there will be untrusted users. We do not use DTDs, but DTDs are just one vulnerability.
> [...] pre-parse them before supplying them to BaseX,
> My solution is to simply not use DTD-aware parsing, [...]
I am using the internal parser with the DTD option set to false, but this is still vulnerable to the one billion laughs attack.
My next action will be to try to install my own parser into BaseX, which will be an interesting exercise...
Hi Nico,
> Is there a way to set parser properties like `jdk.xml.entityExpansionLimit` in BaseX?
By default, more recent versions of the JDK have static entity expansion limits. Maybe those are not strict enough? Do you have an example at hand that causes problems?
> I am using the internal parser with the DTD option set to false, but this is still vulnerable to the one billion laughs attack.
Thanks for the hint. I have improved the entity expansion checks in our internal XML parser [1]. If you find an example that will not be caught by our (very simple) heuristics, feel free to share it with us.
I agree with Eliot that it can be hazardous to process arbitrary external contents (you are probably aware of that, too). Good firewall/proxy settings may be able to tackle some of the issues that will not be handled during XML parsing.
And @Eliot, with regard to caching: Have you played around with the XML Catalog feature?
Hope this helps, Christian
[1] https://files.basex.org/releases/latest/
On Fri, Mar 14, 2025 at 11:12 AM Nico Verwer (Rakensi) nverwer@rakensi.com wrote: [...]
Using the catalog feature does not automatically use the grammar cache; that has to be explicitly enabled as part of the parser configuration. It’s something I’ve done in the past, e.g., for the DITA Open Toolkit. I tried to do it for the BaseX parser but ran into a roadblock and didn’t have the time or motivation to push on it harder, since I already had code that can supply the DITA-defined attribute default values in XQuery code when we need them.
For comparison: with our 60K DITA doc set, it takes about 2 minutes to load without DTD processing, 2 hours with, because of the DTD processing overhead. With grammar cache implemented it would probably be less than 3 minutes to load everything with DTDs.
In my case, it was easier to just not use DTDs than to fix the underlying Java code. I suspect that for the vast majority of BaseX users, DTDs are either not an option at all or their DTDs are not the monster that the DITA DTDs are.
For incoming docs, if you’re turning DTD processing off you still have to strip out the DOCTYPE declarations as the parser is still obligated to resolve entity references, which is part of my motivation for a pre-processor.
The latest versions of libxml2 and the Python lxml library have very strict controls, making it safe to use them to sanitize incoming docs. I’m not sure how Java parsers compare because I haven’t had to worry about it in a Java context. (It’s actually a problem for us that lxml and libxml2 are so strict: the DITA DTDs exceed the default limits and can’t be processed with lxml after v 4.9.4 ☹, which is why I’m familiar with their implementation of entity expansion limits.)
Cheers,
E.
From: Christian Grün <christian.gruen@gmail.com> Date: Friday, March 14, 2025 at 9:39 AM To: Nico Verwer (Rakensi) <nverwer@rakensi.com> Cc: basex-talk@mailman.uni-konstanz.de Subject: [basex-talk] Re: Protecting against XML vulnerabilities
[...]
Thank you very much, Christian!
>> I am using the internal parser with the DTD option set to false, but this is still vulnerable to the one billion laughs attack.
> Thanks for the hint. I have improved the entity expansion checks in our internal XML parser [1].
In BaseX 11.5, the billion laughs [https://gist.github.com/mgeeky/4f726d3b374f0a34267d4f19c9004870] ran for a long time, and gave me "java.lang.ArrayIndexOutOfBoundsException: Maximum array size reached." The latest release says: "Entities: expansion limit exceeded or recursive definitions found." No more billion laughs!
I was working on an extra option to set `XMLConstants.FEATURE_SECURE_PROCESSING` to `true`, because I used that in the project that I am rewriting. This option is used to "set limits on XML constructs to avoid conditions such as denial of service attacks." With your recent changes, I think this is no longer needed.
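For reference, a minimal sketch of what that feature does in plain JAXP, outside BaseX. It assumes the JDK's built-in SAX parser; the class `SecureProcessingDemo` and its methods are hypothetical names. The exact default limits differ between JDK releases, so the sketch only checks whether a nested-entity document is rejected, not at which count:

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical demo of XMLConstants.FEATURE_SECURE_PROCESSING with the JDK SAX parser. */
final class SecureProcessingDemo {

    /** Builds a "billion laughs" style document; each level references the previous one 10 times. */
    static String nestedEntities(int levels) {
        StringBuilder sb = new StringBuilder("<!DOCTYPE lolz [\n<!ENTITY lol0 \"lol\">\n");
        for (int i = 1; i <= levels; i++) {
            sb.append("<!ENTITY lol").append(i).append(" \"");
            for (int j = 0; j < 10; j++) {
                sb.append("&lol").append(i - 1).append(';');
            }
            sb.append("\">\n");
        }
        sb.append("]>\n<lolz>&lol").append(levels).append(";</lolz>");
        return sb.toString();
    }

    /** Returns null if the document parses, otherwise a description of the failure. */
    static String rejectionReason(String xml) {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            // Ask the parser to enforce its processing limits.
            factory.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true);
            SAXParser parser = factory.newSAXParser();
            // DefaultHandler rethrows fatal errors, so a limit violation ends the parse.
            parser.parse(new InputSource(new StringReader(xml)), new DefaultHandler());
            return null;
        } catch (SAXException | ParserConfigurationException | IOException e) {
            return e.toString();
        }
    }

    public static void main(String[] args) {
        System.out.println("1 level:  " + rejectionReason(nestedEntities(1)));
        System.out.println("9 levels: " + rejectionReason(nestedEntities(9)));
    }
}
```

This only exercises the JDK parser. It says nothing about BaseX's internal parser, which applies its own checks.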
> If you find an example that will not be caught by our (very simple) heuristics, feel free to share it with us.
I am still testing, and will let you know if I find anything.
> I agree with Eliot that it can be hazardous to process arbitrary external contents (you are probably aware of that, too). Good firewall/proxy settings may be able to tackle some of the issues that will not be handled during XML parsing.
Unfortunately, I have little influence on the firewall/proxy in the production environment, so I try to handle everything in BaseX or my docker image.
Kind regards, Nico
On Fri, 2025-03-14 at 16:41 +0100, Nico Verwer (Rakensi) wrote:
> The latest release says: "Entities: expansion limit exceeded or recursive definitions found." No more billion laughs!
Note that this attack affects every language with the ability to make new objects by joining strings, including JavaScript (which imposes a similar limit).
For example, in XQuery,
  let $s1 := ":-) :-) :-)",
      $s2 := $s1 || $s1 || $s1 || $s1 || $s1 || $s1,
      $s3 := $s2 || $s2 || $s2 || $s2 || $s2 || $s2
  return $s3 || $s3
(probably you have to go a bit further, but you see the idea).
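The same growth pattern, together with a simple size guard, can be sketched in Java. This is an illustration of the general idea only, not how BaseX or any JavaScript engine implements its limit; `GrowthGuard`, `expand` and the `maxChars` parameter are hypothetical names:

```java
/** Hypothetical guard: refuses to build a string beyond a fixed size. */
final class GrowthGuard {

    /**
     * Concatenates the current string with itself {@code factor} times per round,
     * for {@code rounds} rounds, and throws before allocating a result that
     * would be longer than {@code maxChars}.
     */
    static String expand(String seed, int factor, int rounds, int maxChars) {
        String current = seed;
        for (int i = 0; i < rounds; i++) {
            // Compute the next length in 64-bit arithmetic so the check cannot overflow.
            long next = (long) current.length() * factor;
            if (next > maxChars) {
                throw new IllegalStateException(
                    "expansion limit exceeded: " + next + " > " + maxChars);
            }
            StringBuilder sb = new StringBuilder((int) next);
            for (int j = 0; j < factor; j++) {
                sb.append(current);
            }
            current = sb.toString();
        }
        return current;
    }

    public static void main(String[] args) {
        // Two rounds, as in the XQuery snippet: 11 * 6 * 6 characters.
        System.out.println(expand(":-) :-) :-)", 6, 2, 1000).length());
        try {
            expand(":-) :-) :-)", 6, 20, 1_000_000);
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

The check runs before each allocation, so the guard fails fast instead of waiting for the JVM to run out of memory.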
A public-facing page that accepts XPath, XQuery or XSLT should have limits on memory usage, e.g. with setrlimit on Linux or Unix systems (for example via the bash ulimit command).