Hi everyone,
When parsing a multipart response from a server we need to integrate with, BaseX raises the exception shown in [1].
By forcing the media type to text/plain (with the override-media-type attribute), we were able to see that the parts look something like [2].
Our suspicion was that the multipart parser does not tolerate the quotes around UTF-8. After exploring the jungle of RFCs, however, the quotes seem to be allowed, as per an example in https://tools.ietf.org/html/rfc2045, which states:
Note that the value of a quoted string parameter does not include the quotes. That is, the quotation marks in a quoted-string are not a part of the value of the parameter, but are merely used to delimit that parameter value. In addition, comments are allowed in accordance with RFC 822 rules for structured header fields. Thus the following two forms
Content-type: text/plain; charset=us-ascii (Plain text)
Content-type: text/plain; charset="us-ascii"
are completely equivalent.
Could we kindly have some feedback on this in order to avoid having to parse the whole multipart on our own?
Thank you very much.
Marco.
[1]
Unsupported encoding: java.nio.charset.IllegalCharsetNameException: "UTF-8"
  at org.basex.core.Command.execute(Command.java:94)
  at org.basex.gui.GUI.exec(GUI.java:427)
  at org.basex.gui.GUI.lambda$4(GUI.java:370)
  at org.basex.gui.GUI$$Lambda$80/502071500.run(Unknown Source)
  at java.lang.Thread.run(Thread.java:745)
Caused by: org.basex.query.QueryException: Unsupported encoding: java.nio.charset.IllegalCharsetNameException: "UTF-8"
  at org.basex.query.QueryError.get(QueryError.java:1392)
  [...]
[2]
----boundary44723329.705882352941176--
Content-Type: application/xop+xml; type="application/soap+xml"; charset="UTF-8"
Content-Transfer-Encoding: 8bit
Content-Id: <0.1808ACE4.38B8.11E8.8E46.50569966F200>
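The exception in [1] can probably be reproduced outside BaseX: java.nio.charset.Charset.forName rejects a charset name that still carries its quotation marks. The following is only a minimal sketch of that behaviour; the class name CharsetQuoteDemo and the helpers stripQuotes and isUsable are made up for illustration and are not part of BaseX.

```java
import java.nio.charset.Charset;
import java.nio.charset.IllegalCharsetNameException;
import java.nio.charset.UnsupportedCharsetException;

class CharsetQuoteDemo {
  /** Hypothetical helper: drops one pair of surrounding double quotes, if present. */
  static String stripQuotes(String value) {
    if (value.length() >= 2 && value.startsWith("\"") && value.endsWith("\"")) {
      return value.substring(1, value.length() - 1);
    }
    return value;
  }

  /** Hypothetical helper: true if Charset.forName accepts the given name. */
  static boolean isUsable(String name) {
    try {
      Charset.forName(name);
      return true;
    } catch (IllegalCharsetNameException | UnsupportedCharsetException ex) {
      // The quote character is not legal in a charset name.
      return false;
    }
  }

  public static void main(String[] args) {
    String raw = "\"UTF-8\""; // parameter value as it appears in [2], quotes included
    System.out.println(isUsable(raw));              // prints false
    System.out.println(isUsable(stripQuotes(raw))); // prints true
  }
}
```

Stripping the quotes by hand like this only covers the simplest case; it ignores backslash escapes and comments, which the RFC also allows.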
Hi Marco,
I didn’t forget your mail ;)
Thanks for the elaborate analysis, which clearly helped me to fix the issue more quickly [1]: Quoted strings will now be detected, and backslashed characters within the value will be unescaped. I didn’t add support for comments so far (they seem to be pretty rare these days), but I might tackle this if someone else encounters problems.
A new snapshot is available [2]. BaseX 9.0.1 will be due next week.
Cheers, Christian
[1] https://github.com/BaseXdb/basex/commit/eae493e0ba2e6d5989f2fe8734ede6df380a...
[2] http://files.basex.org/releases/latest/
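For readers who want to see what "quoted strings detected, backslashed characters unescaped" could look like in practice, here is a rough sketch in Java. It is not the code from the commit above; the class ContentTypeParams and its parse method are hypothetical names, and, like the fix described, the sketch does not handle RFC 822 comments in parentheses.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

class ContentTypeParams {
  /**
   * Parses the parameters that follow the media type in a Content-Type value.
   * Quoted strings are detected and backslash-escaped characters are unescaped.
   * Comments such as "(Plain text)" are NOT recognized by this sketch.
   */
  static Map<String, String> parse(String value) {
    Map<String, String> params = new LinkedHashMap<>();
    int semi = value.indexOf(';'); // position of the ';' that starts the next parameter
    while (semi >= 0) {
      int eq = value.indexOf('=', semi);
      if (eq < 0) break; // no further name=value pair
      String name = value.substring(semi + 1, eq).trim().toLowerCase(Locale.ROOT);
      int pos = eq + 1;
      while (pos < value.length() && Character.isWhitespace(value.charAt(pos))) pos++;
      String val;
      if (pos < value.length() && value.charAt(pos) == '"') {
        // quoted-string: read up to the closing quote, which is not part of the value
        StringBuilder sb = new StringBuilder();
        pos++; // skip opening quote
        while (pos < value.length() && value.charAt(pos) != '"') {
          char c = value.charAt(pos++);
          // a backslash escapes the character that follows it
          if (c == '\\' && pos < value.length()) c = value.charAt(pos++);
          sb.append(c);
        }
        val = sb.toString();
        semi = value.indexOf(';', pos);
      } else {
        // plain token: runs up to the next ';' or the end of the value
        int end = value.indexOf(';', pos);
        val = value.substring(pos, end < 0 ? value.length() : end).trim();
        semi = end;
      }
      params.put(name, val);
    }
    return params;
  }

  public static void main(String[] args) {
    String header = "application/xop+xml; type=\"application/soap+xml\"; charset=\"UTF-8\"";
    System.out.println(parse(header).get("charset")); // prints UTF-8
  }
}
```

With the quotes removed during parsing, the charset value can be handed to java.nio.charset.Charset.forName without triggering the IllegalCharsetNameException reported in the original message.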
On Thu, Apr 5, 2018 at 1:20 PM, Marco Lettere m.lettere@gmail.com wrote:
[...]
Great, this allows us to remove a lot of proprietary and risky XQuery code that we wrote to work around the issue. At the moment I also think that supporting more byzantine scenarios is less important. Thanks a lot. Marco.
On 16/04/2018 10:37, Christian Grün wrote:
[...]
basex-talk@mailman.uni-konstanz.de