Thanks for the insight!

I can see the benefit in your example. In my example, however, the function clearly swallows text (“DUMMY”). That may be an edge case, but it is a real problem if you expect the function to raise an error for input that is not well-formed: some text is silently deleted.

 

Daniel

 

From: Christian Grün <christian.gruen@gmail.com>
Sent: Tuesday, 21 November 2023 16:59
To: Zimmel, Daniel <D.Zimmel@ESVmedien.de>
Cc: basex-talk@mailman.uni-konstanz.de
Subject: Re: [basex-talk] Bug in parse-xml-fragment() and ampersand entity?

 

Hi Daniel,

Yes, I assume we’ll need to call it a bug, although we are aware that what BaseX currently does is out-of-spec behavior. The function fn:parse-xml-fragment is based on our internal XML parser, which is much faster than the standard XML parser (in particular for small input), and it tolerates input that’s not perfectly well-formed. In addition, it accepts HTML entities without a linked DTD:

 

   parse-xml-fragment(`&auml;`)

 

We should at least document the behavior or (better) introduce a custom BaseX function for it.

 

Hope this helps (for now),

Christian

 

 

 

On Tue, Nov 21, 2023 at 3:17 PM Zimmel, Daniel <D.Zimmel@esvmedien.de> wrote:

Hi,

is this a bug?

Query:
        parse-xml-fragment('Tom &amp; Jerry')

Result:
        Tom ? Jerry

Same result with:
        parse-xml-fragment('Tom &amp;DUMMY; Jerry')

BaseX 10.7

Saxon complains correctly that the resulting document node is not well-formed.
BaseX should also return an error, shouldn't it?
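
For illustration, a minimal sketch of the behavior the question expects, assuming a processor that follows the spec and reports err:FODC0006 for input that is not a well-formed fragment. The message text in the catch clause is made up for the example. According to this thread, BaseX 10.7 returns text instead of raising the error, so the catch branch would not be reached there.

```xquery
(: Declared explicitly, in case the processor does not predeclare "err". :)
declare namespace err = "http://www.w3.org/2005/xqt-errors";

(: In the XQuery string literal, &amp; is expanded to a bare "&",
   so the function receives the text: Tom &DUMMY; Jerry
   That text references an undefined entity and is not well-formed. :)
try {
  parse-xml-fragment('Tom &amp;DUMMY; Jerry')
} catch err:FODC0006 {
  (: Reached only if the processor rejects the input. :)
  'not well-formed: ' || $err:description
}
```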

Best, Daniel