Hi,
when serializing a string that contains literal XML with entities, how do I pass those entities through unchanged? Example:
let $input := "<p>Lorem ipsum &apos; dolor sit amet </p>" return serialize($input)
results in:
<p>Lorem ipsum dolor sit amet, ' consectetur adipisicing elit.</p>
but I want:
<p>Lorem ipsum dolor sit amet, &apos; consectetur adipisicing elit.</p>
Hi Andreas -
Have you tried using different serialization options? I.e., serialize.xq:
```
declare option output:method "xml";
declare option output:parameter-document "map.xml";
declare variable $input := "<p>Lorem ipsum, ' dolor sit amet.</p>";
serialize($input)
```
map.xml:
```
<serialization-parameters xmlns="http://www.w3.org/2010/xslt-xquery-serialization">
  <use-character-maps>
    <character-map character="'" map-string="&apos;"/>
  </use-character-maps>
</serialization-parameters>
```
When run in the BaseX GUI, I get: `&lt;p&gt;Lorem ipsum, ' dolor sit amet.&lt;/p&gt;`, which might be closer?
I think you might have been experiencing the default 'basex' serialization option (see [1] for more). Hope that helps. Best, Bridger
[1] http://docs.basex.org/wiki/Serialization
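For reference, a character map can also be passed directly to fn:serialize as an XQuery 3.1 map, without a separate parameter document. This is a minimal sketch, assuming an XQuery 3.1 processor such as BaseX. The sample string is only illustrative, and the replacement is written as `&amp;apos;` in the query so that the literal text `&apos;` is what the serializer emits:

```xquery
(: Sketch only: assumes XQuery 3.1 map-based serialization parameters. :)
(: "&apos;" in a string literal is expanded to a plain apostrophe by the
   XQuery parser, so $input no longer contains any entity reference. :)
let $input := "<p>Lorem ipsum &apos; dolor sit amet</p>"
return serialize(
  $input,
  map {
    'method': 'xml',
    (: key: the single character to replace;
       value: the string written to the output instead :)
    'use-character-maps': map { "'": "&amp;apos;" }
  }
)
```

The angle brackets should still come out as `&lt;` and `&gt;`, because $input is a string rather than an element node; the character map only changes how the apostrophe is written.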
On Mon, 2019-09-09 at 15:04 +0200, Andreas Mixich wrote:
when serializing a string, that contains literal XML with entities, how do I pass through those entities unchanged?
One way is to use a character map, as Bridger Dyson-Smith described.
Sometimes another way is to have a version of the DTD in which the replacement text of the entity marks the presence of the entity, e.g. <!ENTITY eacute "&amp;eacute;"> but this will of course affect full-text searching.
Liam
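The DTD approach can be tried out with an internal subset. This is a rough sketch, assuming the processor's XML parser reads the internal DTD subset when fn:parse-xml is called; the `p` document is made up for illustration. Note the extra level of escaping: `&amp;` in the XQuery string literal becomes `&` before the XML parser sees the text.

```xquery
(: Sketch only. The XML parser receives:
     <!ENTITY eacute "&amp;eacute;">   and   <p>caf&eacute;</p>
   so, per the XML rules for replacement text, the entity should expand
   to the literal characters & e a c u t e ; rather than to "é". :)
let $xml := '<!DOCTYPE p [
  <!ENTITY eacute "&amp;amp;eacute;">
]>
<p>caf&amp;eacute;</p>'
return string(parse-xml($xml)/p)
```

The string value then carries a visible marker where the entity was. If that text is serialized with the xml method, the ampersand would be escaped again, so the text method or a character map is still needed to write the reference back out.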
I wonder why the serialization behaves that way. It does not make sense to me. If a user has the need to escape XML, it should be thorough, shouldn't it?
Hi Andreas - I'm not sure (this is way outside of my wheelhouse :), but I think it's because arbitrary serialization can generate invalid XML, so having a character map makes the possible invalidity explicit. Now that I've typed that, I'm not sure whether that captures the rationale or not. :) In any case, here's what the specification has to say [1].
Best, Bridger
[1] https://www.w3.org/TR/xslt-xquery-serialization-31/#character-maps
On Tue, 2019-09-10 at 02:59 +0200, Andreas Mixich wrote:
I wonder why the serialization behaves that way. It does not make sense to me. If a user has the need to escape XML, it should be thorough, shouldn't it?
XML entities are expanded by the XML parser, so by the time XQuery (or XSLT) sees the document they are gone.
Consider an entity like
<!ENTITY boy "<person><socks>black</socks><eyes>grey</eyes><name>Steven</name></person>">
<students>&boy;</students>
It'd be really complex to have that visible to XPath and to have to write, e.g. ..../students/entity(*)/person
If it's an external parsed entity it's visible in that the base-uri property changes, but that's all.
Character entities like &rcedilla; (ŗ) are just special cases of general entities, and XML does not distinguish them. I wish it did, but we never got back to that work after publishing XML 1.0.
Liam
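The expansion can be observed from XQuery itself. This is a small sketch, assuming the processor's XML parser reads the internal DTD subset when fn:parse-xml is called; the shortened `boy` entity is made up for illustration.

```xquery
(: Sketch only. "&amp;boy;" in the XQuery string literal reaches the
   XML parser as the entity reference &boy; :)
let $xml := '<!DOCTYPE students [
  <!ENTITY boy "<person><name>Steven</name></person>">
]>
<students>&amp;boy;</students>'
let $doc := parse-xml($xml)
return (
  (: number of person elements found directly under students :)
  count($doc/students/person),
  (: text of the name element that came from the entity :)
  $doc/students/person/name/string()
)
```

If the entity is expanded at parse time, the person element is an ordinary child of students, and nothing in the data model records that it came from `&boy;`, so the serializer has no entity reference left to pass through.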
On Tue, Sep 10, 2019 at 3:37 AM Liam R. E. Quin liam@fromoldbooks.org wrote:
XML entities are expanded by the XML parser, so by the time XQuery (or XSLT) sees the document they are gone.
Ah, yes, I totally forgot about that! Thanks for the clarification!
Ha ha, awesome Liam! Thank you for clarifying!
Best, Bridger
basex-talk@mailman.uni-konstanz.de