Re: [basex-talk] Bug in parse-xml-fragment() and ampersand entity?

21 Nov 2023


Yes, I can see the problem: &DUMMY; is interpreted as an unknown entity and
thus replaced with a question mark (from today's perspective, the Unicode
Replacement Character U+FFFD would be a better choice anyway). We'll keep
that in mind and think about alternatives.
If your input is supposed to be interpreted as a single text fragment, one
fallback solution (for now) would be
data(parse-xml('<x>' || $string || '</x>'))
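A minimal sketch of how that fallback could be wrapped so that malformed input is reported instead of silently changed. The sample value of $string and the returned message are assumptions for illustration; err:FODC0006 is the error code the specification defines for fn:parse-xml when the input is not well-formed.

```xquery
declare namespace err = "http://www.w3.org/2005/xqt-errors";

(: Hypothetical sample input; the string value is "Tom &DUMMY; Jerry". :)
declare variable $string := 'Tom &amp;DUMMY; Jerry';

try {
  (: Wrap the text in a dummy element so that it is parsed as a document. :)
  data(parse-xml('<x>' || $string || '</x>'))
} catch err:FODC0006 {
  (: fn:parse-xml reports input that is not well-formed with this code. :)
  'Not well-formed: ' || $err:description
}
```

Whether the catch branch is actually reached depends on how strict the parser behind fn:parse-xml is in the processor at hand.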
Zimmel, Daniel D.Zimmel@esvmedien.de wrote on Tue, 21 Nov 2023, 18:34:
...
Thanks for the insight!
I can see the benefit with your example – if you look at my example, it is
clearly eating the text (“DUMMY”) which might be an edge case, but is
obviously a problem when you think the function will give you an error in
case of non-wellformedness – some text has silently been deleted.
Daniel
*From:* Christian Grün christian.gruen@gmail.com
*Sent:* Tuesday, 21 November 2023 16:59
*To:* Zimmel, Daniel D.Zimmel@ESVmedien.de
*Cc:* basex-talk@mailman.uni-konstanz.de
*Subject:* Re: [basex-talk] Bug in parse-xml-fragment() and ampersand
entity?
Hi Daniel,
Yes, I assume we’ll need to call it a bug… although we know that what
BaseX is currently doing is out-of-spec behavior. The function
fn:parse-xml-fragment is based on our internal XML parser, which is much
faster than the standard XML parser (in particular for small input), and it
tolerates input that’s not perfectly well-formed. In addition, it accepts
HTML entities without a linked DTD:
parse-xml-fragment('&amp;auml;')
We should at least document the behavior or (better) introduce a custom
BaseX function for it.
Hope this helps (for now),
Christian
On Tue, Nov 21, 2023 at 3:17 PM Zimmel, Daniel D.Zimmel@esvmedien.de
wrote:
Hi,
is this a bug?
Query:
        parse-xml-fragment('Tom &amp; Jerry')
Result:
        Tom ? Jerry
Same result with:
        parse-xml-fragment('Tom &amp;DUMMY; Jerry')
BaseX 10.7
Saxon complains correctly that the resulting document node is not
well-formed.
BaseX should also return an error, shouldn't it?
Best, Daniel
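
The expectation above can be probed in any XQuery 3.1 processor with a small sketch like the following. The third input and the returned message are assumptions added for comparison; per the specification, fn:parse-xml-fragment should raise err:FODC0006 for input that is not a well-formed external general parsed entity.

```xquery
declare namespace err = "http://www.w3.org/2005/xqt-errors";

(: The first two inputs are the ones from this report: a bare ampersand and
   an undeclared entity. The third is an added, well-formed control case. :)
for $input in (
  'Tom &amp; Jerry',
  'Tom &amp;DUMMY; Jerry',
  'Tom &amp;amp; Jerry'
)
return try {
  string(parse-xml-fragment($input))
} catch err:FODC0006 {
  'FODC0006 raised for: ' || $input
}
```

A strict processor is expected to take the catch branch for the first two inputs; a lenient parser such as the one described in this thread may return text for all three.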
