Hi all, when using parse-xml() or parse-xml-fragment(), the wrapping CDATA sections are removed and only the text content of the original CDATA section is returned.
For example: parse-xml-fragment("<newNode><data><![CDATA[{cdata}]]></data></newNode>")
returns
<newNode> <data>{cdata}</data> </newNode>
But I need this:
<newNode> <data><![CDATA[{cdata}]]></data> </newNode>
Am I missing something?
Thanks!
Hi Erdal,
the serialization parameter "cdata-section-elements" will help you:
http://docs.basex.org/wiki/Serialization
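As a minimal sketch of how this parameter could be applied to the example from the question (assuming the query result is serialized by BaseX itself, and that the element to wrap is named "data" as in the example):

```xquery
(: Sketch only: the explicit declaration of the "output" prefix is an
   assumption for portability; BaseX normally predeclares it. :)
declare namespace output = "http://www.w3.org/2010/xslt-xquery-serialization";

(: Ask the serializer to wrap the text content of every <data> element
   in a CDATA section when the result is written out. :)
declare option output:cdata-section-elements "data";

parse-xml-fragment("<newNode><data><![CDATA[{cdata}]]></data></newNode>")
```

Note that this applies to all "data" elements in the output, whether or not they contained a CDATA section in the input.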
Best, Christian
2013/10/24 Erdal Karaca erdal.karaca.de@gmail.com:
BaseX-Talk mailing list BaseX-Talk@mailman.uni-konstanz.de https://mailman.uni-konstanz.de/mailman/listinfo/basex-talk
Erdal,
It's hard to know what you're missing, exactly. On the other hand the behavior you report is the correct behavior given the XDM (the data model underlying BaseX).
The XDM presents an XML document as a tree of nodes, and does not record whether any text content (the text node leaves of this tree) was represented in the serialization (i.e., in XML as a text-based data format amenable to parsing) using one or more CDATA marked sections.
This means that unless you go to a lot of extra trouble, you won't ordinarily be able to "round trip" a CDATA marked section through a parsing and serialization cycle. (Plus, generally speaking, this requirement is usually an effort to provide a functionality for which there are better solutions in any case. Developers sometimes think they need a CDATA marked section when they actually don't.)
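To illustrate the point about the data model, here is a small sketch (the variable names are made up for the illustration) that parses the same content once with and once without a CDATA marked section and compares the resulting trees:

```xquery
(: Both strings describe the same XDM tree: a <data> element with a
   single text node "{cdata}". Curly braces inside a string literal
   are plain characters, so no escaping is needed here. :)
let $with-cdata := parse-xml-fragment("<data><![CDATA[{cdata}]]></data>")
let $plain      := parse-xml-fragment("<data>{cdata}</data>")
return deep-equal($with-cdata, $plain)
```

The comparison should yield true, since after parsing nothing in the tree indicates which of the two notations was used.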
On the other hand, serializers can often be configured with a specification of where text nodes should be wrapped in CDATA marked sections. See http://docs.basex.org/wiki/Serialization for BaseX's support for this (using the cdata-section-elements parameter). This won't let you "keep" CDATA marked sections in your input, but it does let you get them into the output in places you designate.
Yet -- if this is related to your earlier thread -- I doubt that I understand exactly why escaping your { as {{ is not as good a solution as using a call to parse-xml() to wrap the XML syntax. Certainly I don't see any reason why it should not perform as well: won't the call to parse the string as XML take just as long as the bare XQuery syntax parse, which is happening in any case? (Of course I have no measurements.)
Cheers, Wendell
Wendell Piez | http://www.wendellpiez.com XML | XSLT | electronic publishing Eat Your Vegetables _____oo_________o_o___ooooo____ooooooo_^
On Thu, Oct 24, 2013 at 3:06 AM, Erdal Karaca erdal.karaca.de@gmail.com wrote:
Thanks both for the quick answer!
As for choosing parse-xml over replacing '{' with '{{':
replace(replace('<x><text>{abc}</text><text><![CDATA[{xyz}]]></text></x>', '{', '{{'),'}', '}}')
Just replacing the curly braces would change the semantics (of user-provided content) in the second text element, as it is wrapped inside a CDATA section and won't be evaluated (as an embedded expression) anyway, would it not? You would have to know when to escape and when not to.
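A small sketch of the difference, written as a direct element constructor with the braces already doubled everywhere (the variable name is only for illustration):

```xquery
(: In a direct element constructor, "{{" and "}}" in ordinary element
   content are escapes for single braces, whereas the content of a
   CDATA section is taken literally. :)
let $escaped :=
  <x>
    <text>{{abc}}</text>
    <text><![CDATA[{{xyz}}]]></text>
  </x>
return $escaped/text/string()
(: first item: "{abc}", second item: "{{xyz}}" :)
```

So after a blind replacement the first element still holds the original text, but the second one ends up with doubled braces.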
Thanks!
2013/10/24 Wendell Piez wapiez@wendellpiez.com
Hi Erdal,
similar to Wendell, I’m not quite sure what your actual reason is for using CDATA sections, the parse-xml() function etc. Talking about your example..
replace(replace('<x><text>{abc}</text><text><![CDATA[{xyz}]]></text></x>', '{', '{{'),'}', '}}')
..would you like to interpret "abc" as XQuery?
Christian
Sorry for being unclear: the XML fragment used here may have come from an external file whose contents I do not know in advance. I.e. the user who created that file may have used "{abc}" or "<![CDATA[{xyz}]]>", but it is all XML, not XQuery.
Workflow:
- Read external file which contains user-provided contents (maybe with curly braces in some text contents or embedded in CDATA sections)
- Replace an existing node in DB with the read XML fragment using parse-xml-fragment()
So, the parse-xml-fragment() function works for now, but it strips CDATA sections and may escape some characters when the data is serialized again.
Thanks!
2013/10/25 Christian Grün christian.gruen@gmail.com
- Read external file which contains user provided contents (maybe, with curly braces in some text contents or embedded in CDATA sections)
- Replace an existing node in DB with the read xml fragment using parse-xml-fragment()
Why don’t you do something like..
let $xml := doc("external-file.xml") return replace node ... with $xml
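Spelled out a bit more, this could look like the following sketch. The database name "mydb" and the target path /root/newNode are purely hypothetical placeholders; the actual target depends on your data:

```xquery
(: Hypothetical names: "mydb" and /root/newNode are placeholders, and
   the path is assumed to select exactly one node. The file is parsed
   as XML directly, so curly braces in its text never pass through
   XQuery syntax and need no escaping. :)
let $xml := doc("external-file.xml")
return replace node collection("mydb")/root/newNode with $xml/*
```

As with parse-xml-fragment(), CDATA sections in the file are not kept in the stored nodes; they would only reappear on output via cdata-section-elements.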
Christian