Hi,
following Christian's suggestion, we're now abandoning XQJ and porting our application to the ClientSession API. I perform a textual replacement of the external variables in my XQuery expressions before sending them to the BaseX server, and I'm having difficulty embedding character data in those expressions.
Notably:
xquery <c-code>if (cond) { // a C comment }</c-code>
produces:
Stopped at line 1, column 33: [XPST0003] Expecting "}", found "C".
whereas:
xquery <c-code><![CDATA[ if (cond) { // a C comment } ]]></c-code>
produces:
<c-code> if (cond) { // a C comment } </c-code>
Is this a bug in BaseX? I thought that the use of CDATA is optional and required only if one wishes to avoid escaping <, >, & in PCDATA. If CDATA is required, I would have to convert my XML document in this way. The problem did not occur in the XQJ binding.
I haven't consulted the grammar; I'm following my intuition here that XQuery should allow embedded XML fragments. Is this not so?
- Godmar
Hi Godmar,
without consulting the grammar: I guess this is due to evaluating the contents of curly braces in return statements. The // evaluates to descendant-or-self and the following 'a' to a node-test. The subsequent 'C comment' evaluates to neither another location-step nor a predicate, so the expression produces a syntax error.
Most probably the XQJ binding handled this escaping automatically, perhaps a team member with more expertise in XQJ could address this.
Hope I could clear things up a little. Feel free to ask more. Kind regards Michael
On 19.08.2010 at 06:10, Godmar Back wrote:
Is this a bug in BaseX? I thought that the use of CDATA is optional and required only if one wishes to avoid escaping <, >, & in PCDATA. If CDATA is required, I would have to convert my XML document in this way. The problem did not occur in the XQJ binding.
I haven't consulted the grammar; I'm following my intuition here that XQuery should allow embedded XML fragments. Is this not so?
On Thu, Aug 19, 2010 at 4:14 AM, Michael Seiferle <michael.seiferle@uni-konstanz.de> wrote:
Hi Godmar,
without consulting the grammar:
Well, I had hoped I could push looking it up to you.
But http://www.w3.org/TR/xquery/#id-content says:
The part of a direct element constructor between the start tag and the end tag is called the *content* of the element constructor. This content may consist of text characters (parsed as ElementContentChar, http://www.w3.org/TR/xquery/#doc-xquery-ElementContentChar), nested direct constructors, CdataSections (http://www.w3.org/TR/xquery/#doc-xquery-CDataSection), character and predefined entity references (http://www.w3.org/TR/xquery/#dt-predefined-entity-reference), and expressions enclosed in curly braces.
So, { } are legal inside direct element constructors and are interpreted. This means that, oddly enough, the value returned by BaseX actually can't be used in a direct element constructor.
In my opinion, BaseX shouldn't return XML in a form that can't be used as a direct element constructor. (Note that you don't escape the CDATA, even though it contains { }.)
Most probably the XQJ binding handled this escaping automatically, perhaps a
team member with more expertise in XQJ could address this.
Alright, I suppose I have to consult your XQJ code because in order to avoid it I have to reimplement it.
A quick Google search reveals: http://www.w3.org/TR/xslt#output and the cdata-section-elements Transformer property: http://download-llnw.oracle.com/javase/1.4.2/docs/api/javax/xml/transform/Tr...
which appears rather awkward since it requires that one lists all those elements whose text children should be CDATA'd. There doesn't appear to be an option to say "all," or is there?
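For reference, the Transformer property mentioned above can be set roughly as follows. This is only a minimal sketch using the JDK's identity transformer; the class and method names (CdataSerializer, serialize) are made up for illustration, and it assumes the element names are not namespaced.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical helper, not part of BaseX or XQJ.
final class CdataSerializer {
    private CdataSerializer() {}

    /**
     * Re-serializes the given XML and asks the serializer to emit the text
     * children of the listed elements (whitespace-separated names) as CDATA.
     */
    static String serialize(String xml, String cdataElements) {
        try {
            // An identity transformer: copies input to output unchanged,
            // apart from the serialization properties set below.
            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            t.setOutputProperty(OutputKeys.CDATA_SECTION_ELEMENTS, cdataElements);
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalArgumentException("could not re-serialize XML", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(
            serialize("<c-code>if (cond) { // a C comment }</c-code>", "c-code"));
    }
}
```

With "c-code" listed, the text of that element should come back wrapped in a CDATA section, so the braces are no longer interpreted when the result is pasted into a direct element constructor.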
- Godmar
Hi Godmar,
I hope I got your problem right, though I am still not 100% sure. Perhaps I am missing something as I don't see your issue in full context, but I will try to answer below. On 19.08.2010 at 14:13, Godmar Back wrote:
On Thu, Aug 19, 2010 at 4:14 AM, Michael Seiferle michael.seiferle@uni-konstanz.de wrote: Hi Godmar,
without consulting the grammar:
Well, I had hoped I could push looking it up to you.
I was in a hurry ;-)
But http://www.w3.org/TR/xquery/#id-content says:
The part of a direct element constructor between the start tag and the end tag is called the content of the element constructor. This content may consist of text characters (parsed as ElementContentChar), nested direct constructors, CdataSections, character and predefined entity references, and expressions enclosed in curly braces.
So, { } are legal inside direct element constructors and are interpreted. This means that, oddly enough, the value returned by BaseX actually can't be used in a direct element constructor.
In my opinion this is expected behavior as the direct element constructor parses curly braces as defined by the grammar. The value can be used - if it is valid regarding element construction. (The one in your example is not).
You have to distinguish between Documents/Fragments that do not know about Element Constructors e.g.:
<?xml version="1.0" encoding="UTF-8" ?>
<foo>{/b/a/r}</foo>
and XQuery, e.g.:
for $x in <foo>{/b/a/r}</foo>
which evaluates the Element Constructor.
If you put the same content inside a document and query/return it no constructors will be evaluated.
In my opinion, BaseX shouldn't return XML in a form that can't be used as a direct element constructor. (Note that you don't escape the CDATA, even though it contains { }.)
In my opinion a node, once it has been returned (and constructed to the user's needs) should not be treated like it contained element constructors. The CDATA is not escaped as the Node is valid at the time of construction; but yes it turns erroneous if it is used for element construction in subsequent queries.
Perhaps other team members have an opinion on this too. I guess automatic escaping could be an option; on the other hand, one could argue for leaving this to the application programmer, as valid nodes would otherwise be bloated with CDATA just in case they are later used in element construction. The curly braces can also be escaped via {{ and }} [1].
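The {{ / }} escaping could be applied on the client side before the text is spliced into a query. A minimal sketch follows; the class name BraceDoubler is hypothetical, and it assumes the input is plain element text that is already XML-escaped (no CDATA sections, comments, or processing instructions, where doubling would change the content).

```java
// Hypothetical helper, not part of BaseX or XQJ.
final class BraceDoubler {
    private BraceDoubler() {}

    /**
     * Doubles every curly brace so that a direct element constructor
     * treats it as a literal character instead of an enclosed expression.
     */
    static String doubleBraces(String text) {
        // String.replace works on literal character sequences, not regexes.
        return text.replace("{", "{{").replace("}", "}}");
    }

    public static void main(String[] args) {
        String body = doubleBraces("if (cond) { // a C comment }");
        System.out.println("xquery <c-code>" + body + "</c-code>");
    }
}
```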
Most probably the XQJ binding handled this escaping automatically,
I guess it did.
Alright, I suppose I have to consult your XQJ code because in order to avoid it I have to reimplement it.
which appears rather awkward since it requires that one lists all those elements whose text children should be CDATA'd. There doesn't appear to be an option to say "all," or is there?
yes, seems awkward, perhaps s/o else on the list has an idea. :-/
- Godmar
[1] http://www.w3.org/TR/xquery/#id-element-constructor ~last paragraph.
Hi Godmar,
On 19.08.2010 at 14:13, Godmar Back wrote:
Most probably the XQJ binding handled this escaping automatically, perhaps a team member with more expertise in XQJ could address this.
Alright, I suppose I have to consult your XQJ code because in order to avoid it I have to reimplement it.
the XQJ binding traverses the DOM tree by itself (which will be changed in the near future because of the excessive complexity related to finding and declaring implicitly declared namespaces...). I'm currently working on a more stable approach. I therefore wouldn't recommend taking the old code as a reference.
Wouldn't the simplest solution to your problem be to replace the characters '{' and '}' in the serialized XML string with their character entities ("&#123;" and "&#125;" respectively)? I can't think of situations when it would destroy the markup and the parsing process converts them back either way.
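A minimal sketch of that replacement, using the decimal character references &#123; and &#125; (the decimal form and the class name BraceEscaper are choices made here for illustration). It assumes the serialized string contains no CDATA sections, comments, or processing instructions with braces in them, since character references are not expanded there.

```java
// Hypothetical helper, not part of BaseX or XQJ.
final class BraceEscaper {
    private BraceEscaper() {}

    /** Replaces '{' and '}' with numeric character references. */
    static String escapeBraces(String serializedXml) {
        StringBuilder out = new StringBuilder(serializedXml.length() + 16);
        for (int i = 0; i < serializedXml.length(); i++) {
            char c = serializedXml.charAt(i);
            switch (c) {
                case '{': out.append("&#123;"); break;
                case '}': out.append("&#125;"); break;
                default:  out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String fragment = "<c-code>if (cond) { // a C comment }</c-code>";
        System.out.println("xquery " + escapeBraces(fragment));
    }
}
```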
Hope I could help, cheers Leo
On 20.08.2010 at 11:43, Leonard Wörteler wrote:
Hi Godmar,
On 19.08.2010 at 14:13, Godmar Back wrote:
Most probably the XQJ binding handled this escaping automatically, perhaps a team member with more expertise in XQJ could address this.
Alright, I suppose I have to consult your XQJ code because in order to avoid it I have to reimplement it.
Wouldn't the simplest solution to your problem be to replace the characters '{' and '}' in the serialized XML string with their character entities ("&#123;" and "&#125;" respectively)? I can't think of situations when it would destroy the markup and the parsing process converts them back either way.
Small addendum to my last mail: Christian pointed me to setting serialization parameters on the fly:
declare option output:cdata-section-elements "c-code"; <c-code>abc</c-code>
returns
<c-code><![CDATA[abc]]></c-code>
this might help your case if you know in advance which tags' contents contain {}
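If the element names are known in advance, the option declaration could be prepended to each query on the client side before it is sent. A minimal sketch follows; the class and method names are hypothetical, and it only builds the query string (it does not call any BaseX API).

```java
// Hypothetical helper, not part of BaseX or XQJ.
final class CdataOption {
    private CdataOption() {}

    /**
     * Prefixes a query with the cdata-section-elements option shown above,
     * listing the given element names separated by spaces.
     */
    static String withCdataElements(String query, String... elementNames) {
        return "declare option output:cdata-section-elements \""
            + String.join(" ", elementNames) + "\"; " + query;
    }

    public static void main(String[] args) {
        System.out.println(withCdataElements("<c-code>abc</c-code>", "c-code"));
    }
}
```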
Hope this helps.
Michael
2010/8/20 Leonard Wörteler leonard.woerteler@uni-konstanz.de
Wouldn't the simplest solution to your problem be to replace the characters '{' and '}' in the serialized XML string with their character entities ("&#123;" and "&#125;" respectively)? I can't think of situations when it would destroy the markup and the parsing process converts them back either way.
Alright, I'll try that. The cdata-section-element approach (either on the Java-side via setOutputProperty or in the XQuery via declare option) would be cleaner in my opinion, but the apparent absence of a "all elements" option would require that one lists all elements whose character data is escaped in this way.
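Lacking an "all elements" value, one conceivable workaround is to compute the list from the document itself and pass it as the cdata-section-elements value. This is only a sketch under assumptions: the class name is hypothetical, the element names are assumed not to be namespaced, and marking every element this way may well bloat the output.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.LinkedHashSet;
import java.util.Set;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical helper, not part of BaseX or XQJ.
final class CdataElementNames {
    private CdataElementNames() {}

    /**
     * Returns the distinct element names of the document, in order of first
     * appearance, joined by spaces (the format cdata-section-elements expects).
     */
    static String allElementNames(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            // "*" matches every element in the document, in document order.
            NodeList all = doc.getElementsByTagName("*");
            Set<String> names = new LinkedHashSet<>();
            for (int i = 0; i < all.getLength(); i++) {
                names.add(all.item(i).getNodeName());
            }
            return String.join(" ", names);
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(allElementNames(
            "<doc><c-code>a</c-code><note>b</note><c-code>c</c-code></doc>"));
    }
}
```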
- Godmar
2010/8/20 Leonard Wörteler leonard.woerteler@uni-konstanz.de
the XQJ binding traverses the DOM tree by itself (which will be changed in the near future because of the excessive complexity related to finding and declaring implicitly declared namespaces...). I'm currently working on a more stable approach. I therefore wouldn't recommend taking the old code as a reference.
Yes - aside from the bug I recently alerted you to (http://top.cs.vt.edu/~gback/bx/b622/TestNS.java), I believe there are more bugs in the current code (I've seen xmlns="" being added by BaseX in some interior nodes).
- Godmar
basex-talk@mailman.uni-konstanz.de