Hello,
Congrats on the BaseX 11.9 release!
Installed it, and started to run my XQuery tests. Right away I got the following error: "Property 'http://www.oracle.com/xml/jaxp/properties/entityExpansionLimit' is not recognized."
Both doc() and collection() are tripping on it. A full repro is below.
XQuery #1
=============================================
declare variable $base_dir as xs:string := 'e:\Temp';

for $docx in collection($base_dir)
return $docx

XQuery #2
=============================================
declare variable $base_dir as xs:string := 'e:\Temp';

for $file in file:list($base_dir, false(), 'books*.xml')
(: file:list returns relative paths, so a separator is needed :)
let $docx := doc($base_dir || '\' || $file)
return $docx
Regards, Yitzhak Khabinsky
On 01-05-2025 16:35, ykhabins@bellsouth.net wrote:
> […] Right away I got the following error: "Property 'http://www.oracle.com/xml/jaxp/properties/entityExpansionLimit' is not recognized."
This property is important to prevent the "1 billion laughs attack", and I have been pulling my hair out over it today. I am trying to set this property, not in BaseX but in a different XML processing framework.
Whether and how the entityExpansionLimit is set depends heavily on the Java version and the XML parser it uses. Could you let us know which Java version you are using?
Best, Nico
Hi Nico,
I tried BaseX 11.9 on two machines, with the same error on both. Here are their Java versions.
java.vm.compressedOopsMode: 32-bit
java.vm.info: mixed mode, sharing
java.vm.name: OpenJDK 64-Bit Server VM
java.vm.specification.name: Java Virtual Machine Specification
java.vm.specification.vendor: Oracle Corporation
java.vm.specification.version: 22
java.vm.vendor: Eclipse Adoptium
java.vm.version: 22.0.2+9

java.vm.compressedOopsMode: 32-bit
java.vm.info: mixed mode, sharing
java.vm.name: OpenJDK 64-Bit Server VM
java.vm.specification.name: Java Virtual Machine Specification
java.vm.specification.vendor: Oracle Corporation
java.vm.specification.version: 17
java.vm.vendor: Eclipse Adoptium
java.vm.version: 17.0.8.1+1
Regards, Yitzhak Khabinsky
Hello everyone on the list!
There is a change between 11.8 and 11.9, related to security settings. This has to do with the following document:
<?xml version="1.0"?>
<!DOCTYPE foo [
  <!ELEMENT foo ANY >
  <!ENTITY xxe SYSTEM "file:///">
]>
<foo>&xxe;</foo>
When parsing this document, BaseX 11.8 threw an error with code err:FODC0002, which means that the resource cannot be retrieved. BaseX 11.9 returns a listing of the root directory of my computer. This can be used to retrieve all files on my computer, which is a security risk.
The issue in my message from 2 May still exists in BaseX 12.
The issue had to do with the following document:
<!DOCTYPE foo [
  <!ELEMENT foo ANY >
  <!ENTITY xxe SYSTEM "file:///" >
]>
<foo>&xxe;</foo>
This returns a document with a listing of the root of my file system. From there, I can enter sub-directories and extract files. This is a well-known external entity injection [https://portswigger.net/web-security/xxe#exploiting-xxe-to-retrieve-files]. This is present in both the standard and the internal parser. Currently, I can prevent this by running BaseX as a user with few permissions, but it would be better to be able to prevent this kind of entity expansion.
For my application, I need to process files sent by external users. And before that, I need to pass the security checks that my client performs.
Is this considered by other BaseX users to be a vulnerability? Why was it not present in BaseX 11.8? How (if at all) can it be solved?
Hi Nico,
I’ve pasted my reply from May 3 below, in case it was missed. From my perspective, that should address the issue - please let me know if you see it differently.
Best regards, Gunther
Sent: Saturday, 3 May 2025 at 01:19
From: "Gunther Rademacher" grd@gmx.net
To: nverwer@rakensi.com, basex-talk@mailman.uni-konstanz.de
Subject: Re: [basex-talk] Security problem in 11.9?
Hi Nico,
What you describe is the expected behaviour. Please be aware of the recent changes to fn:doc (and also fn:parse-xml) that were made in 11.9.
These functions now support options to control the access of external entities, in particular
- allow-external-entities: whether external entities are permitted (true) or rejected (false), default true
- dtd: whether external entities are processed (true) or ignored (false), default true.
In fact, the dtd option is not completely new, but previously its value was taken from the context option DTD, which defaults to false. So with 11.8, you could produce the same directory listing that you experienced with 11.9 by running this on the document that you provided:
basex -ODTD=yes "doc('doc.xml')"
Now the options can be supplied per function call; they are independent of the context options, and the defaults are different. To restore the result that you were used to with 11.8, you need to run:
basex "doc('doc.xml', { 'dtd': false() })"
You can now also run this in order to reject any external entity references:
basex "doc('doc.xml', { 'allow-external-entities': false() })"
The changes were made to implement the XQuery 4.0 specification of these functions:
https://qt4cg.org/specifications/xpath-functions-40/Overview.html#func-doc
https://qt4cg.org/specifications/xpath-functions-40/Overview.html#func-parse-xml
This is also described here:
https://docs.basex.org/12/Standard_Functions#fn:doc
https://docs.basex.org/12/Standard_Functions#fn:parse-xml
Best regards, Gunther
Sent: Thursday, 3 July 2025 at 13:39
From: nverwer@rakensi.com
To: basex-talk@mailman.uni-konstanz.de
Subject: [basex-talk] Re: Security problem in 11.9?
> The issue in my message from 2 May still exists in BaseX 12. […]
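For code that calls the JDK parser directly instead of going through fn:doc, one common way to refuse documents like the one above is to disallow DOCTYPE declarations altogether. The following is a minimal sketch under that assumption, not something taken from the thread: the class name XxeGuard and its methods are hypothetical, and the feature URI is the Xerces "disallow-doctype-decl" feature, which the JDK's built-in parser also understands.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.SAXException;

// Hypothetical helper: rejects any document that carries a DOCTYPE.
class XxeGuard {
  // The document from the thread, with an external entity pointing at the file system root.
  static final String XXE_DOC =
      "<!DOCTYPE foo [ <!ELEMENT foo ANY > <!ENTITY xxe SYSTEM \"file:///\" > ]><foo>&xxe;</foo>";

  // Builds a parser on the JDK's built-in factory (newDefaultInstance needs Java 9+)
  // and switches on the feature that turns every DOCTYPE into a fatal error.
  static DocumentBuilder hardenedBuilder() throws Exception {
    DocumentBuilderFactory factory = DocumentBuilderFactory.newDefaultInstance();
    factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
    return factory.newDocumentBuilder();
  }

  // Returns true if the hardened parser refuses the document, false if it parses.
  static boolean isRejected(String xml) throws Exception {
    try {
      hardenedBuilder().parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
      return false;
    } catch (SAXException e) {
      return true;
    }
  }

  public static void main(String[] args) throws Exception {
    System.out.println("XXE document rejected: " + isRejected(XXE_DOC));
    System.out.println("Plain document rejected: " + isRejected("<foo/>"));
  }
}
```

This is stricter than the allow-external-entities option discussed above, since it also refuses documents whose DTD is harmless. Whether that trade-off is acceptable depends on the input you expect.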
Thanks, Gunther, for the response.
@Nico: I invite you to join the ongoing discussion on sane/safe defaults for the fn:doc and fn:parse-xml functions in the qtspecs repository [1]. User feedback is always appreciated.
Best, Christian
[1] https://github.com/qt4cg/qtspecs/issues/2034
From: Gunther Rademacher via BaseX-Talk basex-talk@mailman.uni-konstanz.de
Sent: Thursday, July 3, 2025 7:28:14 PM
To: nverwer@rakensi.com; basex-talk@mailman.uni-konstanz.de
Subject: [basex-talk] Re: Security problem in 11.9?
> I’ve pasted my reply from May 3 below, in case it was missed. […]
Thank you, Christian, for your response.
It is great to have Gunther, you, and others on this list responding to issues, even if they are not really issues.
> @Nico: I invite you to join the ongoing discussion on sane/safe defaults for the fn:doc and fn:parse-xml functions in the qtspecs repository [1]. User feedback is always appreciated.
> Best, Christian
I was not aware of the discussion on that page, and it is very useful.
Best regards, Nico
Hello Gunther,
Thank you very much for reminding me of your response.
> I’ve pasted my reply from May 3 below, in case it was missed. From my perspective, that should address the issue - please let me know if you see it differently.
At the time I was busy with many things, and forgot about it. I use the O'Reilly XQuery book when programming XQuery, which does not yet include the $options parameter. This shows me that I should use the BaseX documentation instead.
Using the right options, as you suggest, indeed fixes the 'problem'. Thanks again!
Best regards, Nico
The property to set the entity expansion limit has had several different names over the years (and Java versions): "jdk.xml.entityExpansionLimit", "http://www.oracle.com/xml/jaxp/properties/entityExpansionLimit", and just "entityExpansionLimit".
I have been trying to set this property in an old piece of software, which needs to survive until I have finished porting it to BaseX. However, all variations report that the property is unknown.
I have serious doubts about whether it is possible to set this property at all, in spite of Oracle's documentation in [https://docs.oracle.com/en/java/javase/21/security/java-api-xml-processing-j...]. I will explain why:
In my version of Java, the SAXParserFactory implementation is `org.apache.xerces.jaxp.SAXParserFactoryImpl`. This seems to be different, but not too different, from `com.sun.org.apache.xerces.internal.jaxp.SAXParserFactoryImpl`, which Gunther Rademacher had in his message. The entityExpansionLimit property is set in an instance of [https://github.com/apache/xerces-j/blob/main/src/org/apache/xerces/util/Secu...]. The security manager is set in [https://github.com/apache/xerces-j/blob/main/src/org/apache/xerces/jaxp/SAXP...], if secure processing is turned on, by creating a `new SecurityManager()`. The `SecurityManager` uses the `DEFAULT_ENTITY_EXPANSION_LIMIT = 100000`, and never looks at the entityExpansionLimit property or its variations. I have been trying to get to the security manager via `SAXParserImpl.getProperty`, but that does not seem possible. If I could do that, I could change the entityExpansionLimit directly on the `SecurityManager`.
This is how far I got, and I think I am stuck here.
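One quick way to see which of the three names a given parser accepts is to try each in turn and record the outcome. The following is a minimal sketch, not code from the thread: the class name EntityLimitProbe and the value 100000 are illustrative choices, and the result depends on which SAXParserFactory the JVM finds (the JDK's built-in one, or Xerces on the classpath).

```java
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.SAXNotRecognizedException;
import org.xml.sax.SAXNotSupportedException;

// Hypothetical probe: reports how a parser reacts to each known property name.
class EntityLimitProbe {
  // The three spellings mentioned in the message above.
  static final String[] NAMES = {
    "jdk.xml.entityExpansionLimit",
    "http://www.oracle.com/xml/jaxp/properties/entityExpansionLimit",
    "entityExpansionLimit"
  };

  // Tries every name on a fresh parser and maps it to one of:
  // "accepted", "not recognized", "not supported".
  static Map<String, String> probe(SAXParserFactory factory) throws Exception {
    Map<String, String> results = new LinkedHashMap<>();
    for (String name : NAMES) {
      SAXParser parser = factory.newSAXParser();
      try {
        parser.setProperty(name, "100000"); // illustrative limit, passed as a string
        results.put(name, "accepted");
      } catch (SAXNotRecognizedException e) {
        results.put(name, "not recognized");
      } catch (SAXNotSupportedException e) {
        results.put(name, "not supported");
      }
    }
    return results;
  }

  public static void main(String[] args) throws Exception {
    SAXParserFactory factory = SAXParserFactory.newInstance();
    System.out.println("Factory: " + factory.getClass().getName());
    probe(factory).forEach((name, outcome) -> System.out.println(name + " -> " + outcome));
  }
}
```

Running it once with and once without xercesImpl.jar on the classpath should show whether the "not recognized" outcome follows the parser implementation rather than the Java version.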
Hi Gunther,
"...Presumably there is some other SAX parser factory on your classpath, e.g. Xerces. Can you confirm this?..."
That's correct!!!
I am using BaseX for the following tasks:
- XSD 1.1 validations via Xerces-J 2.12.2
- XSLT 3.0 transformations via Saxon-HE 12.5
To do that, following the official BaseX way, their JAR files are copied to the following directory: c:\Program Files (x86)\BaseX\lib\custom
Here is a list of Xerces jar files:
c:\Program Files (x86)\BaseX\lib\custom\icu4j-69_1.jar
c:\Program Files (x86)\BaseX\lib\custom\cupv10k-runtime.jar
c:\Program Files (x86)\BaseX\lib\custom\org.eclipse.wst.xml.xpath2.processor_1.2.1.jar
c:\Program Files (x86)\BaseX\lib\custom\xercesImpl.jar
c:\Program Files (x86)\BaseX\lib\custom\xml-apis.jar
Here is a list of Saxon jar files:
c:\Program Files (x86)\BaseX\lib\custom\lib\jline-2.14.6.jar
c:\Program Files (x86)\BaseX\lib\custom\lib\xmlresolver-5.2.2.jar
c:\Program Files (x86)\BaseX\lib\custom\lib\xmlresolver-5.2.2-data.jar
c:\Program Files (x86)\BaseX\lib\custom\saxon-he-12.5.jar
c:\Program Files (x86)\BaseX\lib\custom\saxon-he-test-12.5.jar
c:\Program Files (x86)\BaseX\lib\custom\saxon-he-xqj-12.5.jar
In that case, will the suggestion to tweak the SAXParserFactory setting work? And where exactly should I put that line?
javax.xml.parsers.SAXParserFactory=com.sun.org.apache.xerces.internal.jaxp.SAXParserFactoryImpl
Regards, Yitzhak Khabinsky
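The line in question is a JVM system property, so it is normally passed to the Java process as a -D argument rather than written into a file. The sketch below sets the same property programmatically to show its effect on the JAXP lookup. It is an illustration only: the class name ParserFactoryOverride is hypothetical, and nothing here shows how BaseX itself reads the setting.

```java
import javax.xml.parsers.SAXParserFactory;

// Hypothetical demo: shows which factory JAXP picks before and after the override.
class ParserFactoryOverride {
  static final String FACTORY_KEY = "javax.xml.parsers.SAXParserFactory";
  static final String JDK_FACTORY =
      "com.sun.org.apache.xerces.internal.jaxp.SAXParserFactoryImpl";

  // Class name of the factory that the standard JAXP lookup currently resolves to.
  static String factoryInUse() {
    return SAXParserFactory.newInstance().getClass().getName();
  }

  public static void main(String[] args) {
    System.out.println("Before: " + factoryInUse());
    // Same effect as starting the JVM with -Djavax.xml.parsers.SAXParserFactory=...
    System.setProperty(FACTORY_KEY, JDK_FACTORY);
    System.out.println("After:  " + factoryInUse());
  }
}
```

With xercesImpl.jar on the classpath, the "Before" line would be expected to name the Apache Xerces factory and the "After" line the JDK's own one.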
Hi Yitzhak,
We have made a new snapshot available, based on JDK17, that may allow you to parse your XML files correctly [1].
It would still be interesting to know if Gunther’s suggestion helps as well.
Thanks in advance, all the best, Christian
[1] https://files.basex.org/releases/latest/
On Fri, May 2, 2025 at 2:14 PM ykhabins@bellsouth.net wrote:
> […] In such case, will it work the suggestion to tweak the SAXParserFactory setting? And where exactly should I put that line?
Mr. Grun,
First, as suggested by Gunther, I tried the BASEX_JVM environment variable on Windows OS. The dreaded error "Property 'http://www.oracle.com/xml/jaxp/properties/entityExpansionLimit' is not recognized" was gone.
After that I installed the BaseX 12.0 beta 7831f9b build, and deleted the BASEX_JVM environment variable. All my XQuery scripts started to work as usual. The error "Property 'http://www.oracle.com/xml/jaxp/properties/entityExpansionLimit' is not recognized" is gone.
Overall, it was such a breaking change, a big scare! IMHO, it is much better and cleaner to stay away from the environment variable.
Thanks a lot for such a quick turnaround.
Regards, Yitzhak Khabinsky
> […] is gone.
Good to hear.
> Overall, it was such a breaking change, a big scare! IMHO, it is much better and cleaner to stay away from the environment variable.
It would certainly be nice to be able to set this property differently, but the JDK does not offer a cleaner way to do so (suggestions are welcome, though). The temporary solution is to only assign the property if the corresponding fn:doc option is actually supplied in the function call. In the long term, the fn:doc function signature may be subject to further changes until XQuery 4 is finalized.
It will still be interesting to understand which specific XML parser was used in your setup, and what would be the way to set the property for this parser accordingly.
Mr. Grun,
I specified in this thread earlier that I am using BaseX for the following tasks: - XSD 1.1 validations via Xerces-J 2.12.2 - XSLT 3.0 transformations via Saxon-HE 12.5
All of their files in the custom directory are listed there as well.
Regards, Yitzhak Khabinsky
Thanks, yes, I saw that. We will probably need to reproduce your setup in order to find out whether we can trigger the error ourselves.
ykhabins@bellsouth.net wrote on Mon, 5 May 2025, 14:14:
> I specified in this thread earlier that I am using BaseX for the following tasks: […]
Mr. Grun,
To facilitate testing on your side, here are their download links.
https://xerces.apache.org/mirrors.cgi
https://dlcdn.apache.org//xerces/j/binaries/Xerces-J-bin.2.12.2-xml-schema-1...
https://www.saxonica.com/download/java.xml
https://github.com/Saxonica/Saxon-HE/releases/download/SaxonHE12-6/SaxonHE12...
Regards, Yitzhak Khabinsky
Dear Yitzhak,
Thanks for your links. – With the latest version, the default SAX parser will now be chosen to parse XML, no matter which other parsers are found in the classpath [1].
Best, Christian
[1] https://files.basex.org/releases/latest/
On Mon, May 5, 2025 at 2:51 PM ykhabins@bellsouth.net wrote:
> To facilitate testing on your side, here is their downloads. […]
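How the snapshot selects the default parser is not shown in this thread. At the JAXP level, one way to get that behaviour is SAXParserFactory.newDefaultInstance(), available since Java 9, which skips the classpath lookup that newInstance() performs. The sketch below only contrasts the two calls; the class name DefaultParserDemo is hypothetical.

```java
import javax.xml.parsers.SAXParserFactory;

// Hypothetical demo: contrasts the classpath-sensitive lookup with the JDK default.
class DefaultParserDemo {
  // Factory found through the usual JAXP lookup (system property, service loader, classpath).
  static String lookedUp() {
    return SAXParserFactory.newInstance().getClass().getName();
  }

  // Factory built into the JDK, regardless of other parsers on the classpath (Java 9+).
  static String builtIn() {
    return SAXParserFactory.newDefaultInstance().getClass().getName();
  }

  public static void main(String[] args) {
    System.out.println("newInstance():        " + lookedUp());
    System.out.println("newDefaultInstance(): " + builtIn());
  }
}
```

On a setup with Xerces-J in lib/custom, the two lines would be expected to differ, which matches the symptom described earlier in the thread.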
Mr. Grun,
I downloaded and tested the latest BaseX 12.0 beta 2686758; everything is working as expected.
I completely concur with your decision on "the default SAX parser will now be chosen to parse XML, no matter which other parsers are found in the classpath". It is a prudent choice.
Regards, Yitzhak Khabinsky