Hi,
This might be different when sending XML files, but if you send any other kind of file (an image, a Word document), there is usually no conversion at all: the plain bytes are sent, and the headers do not even mention an encoding.
From my understanding, it should be the user's responsibility to decide on the transfer encoding. If you do not specify one, there might be some fallback, but currently you are forced to use base64, no matter what the headers already say.
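As a rough illustration of what "plain bytes" means here, the following Python sketch builds a single-file multipart/form-data body by hand and leaves out any Content-Transfer-Encoding header. The field name and file name are taken from the OxGarage capture quoted further down; the helper name build_multipart and the sample XML are only assumptions for the example, and this is not how BaseX builds its requests.

```python
import uuid


def build_multipart(field: str, filename: str, content: bytes,
                    content_type: str = "text/xml"):
    """Return (Content-Type header value, body) for one uploaded file.

    The file content is copied into the body as raw bytes; no
    Content-Transfer-Encoding header is written for the part.
    """
    boundary = uuid.uuid4().hex  # hypothetical boundary; any unique token works
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        f"Content-Type: {content_type}\r\n"
        "\r\n"
    ).encode("ascii")
    tail = f"\r\n--{boundary}--\r\n".encode("ascii")
    return f"multipart/form-data; boundary={boundary}", head + content + tail


# Sample payload with a non-ASCII character (Greek beta), encoded as UTF-8.
xml = "<mi>\u03b2</mi>".encode("utf-8")
header, body = build_multipart("fileToConvert", "tei.xml", xml)
print(header)
print(body)
```

The beta ends up in the body as the two bytes 0xCE 0xB2, untouched. As far as I know, RFC 7578 also advises senders not to generate Content-Transfer-Encoding headers for multipart/form-data parts sent over HTTP.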
Br, Max
2017-03-11 18:17 GMT+01:00 Tim Thompson timathom@gmail.com:
Hi, Christian,
Thanks very much for looking into this. If I use the OxGarage TEI web service through the front-end client to upload a file (http://www.tei-c.org/oxgarage/), here is how it sends the request payload on the back end. Non-ASCII characters are replaced with octal escape sequences.
Encapsulated multipart part: (text/xml)
    Content-Disposition: form-data; name="fileToConvert"; filename="tei.xml"\r\n
    Content-Type: text/xml\r\n\r\n
    eXtensible Markup Language
        <TEI xmlns="http://www.tei-c.org/ns/1.0" xml:lang="en">
          <teiHeader>
            <fileDesc>
              <titleStmt> <title>Multipart test</title> <author/> </titleStmt>
              <publicationStmt> <p>unknown</p> </publicationStmt>
              <sourceDesc> <p>unknown</p> </sourceDesc>
            </fileDesc>
          </teiHeader>
          <text>
            <body>
              <div type="level1">
                <div type="level2">
                  <p n="4"> <hi rendition="simple:bold"/> </p>
                  <p n="5" rend="Normal"> <hi rend="bold underline"> Regression Equation </hi> </p>
                  <p n="6" rend="Normal">
                    <math xmlns="http://www.w3.org/1998/Math/MathML">
                      <mover accent="true"> <mrow> <mi> Y </mi> </mrow> <mo> ^ </mo> </mover>
                      <mo> = </mo>
                      <msub> <mrow> <mi> \316\262 </mi> </mrow> <mrow> <mn> 1 </mn> </mrow> </msub>
                      <mo> + </mo>
                      <msub> <mrow> <mi> \316\262 </mi> </mrow> <mrow> <mn> 2 </mn> </mrow> </msub>
                      <msub> <mrow> <mi> X </mi> </mrow> <mrow> <mn> 2 </mn> </mrow> </msub>
                      <mo> + </mo>
                      <mo> \342\200\246 </mo>
                      <mo> + </mo>
                      <msub> <mrow> <mi> \316\262 </mi> </mrow> <mrow> <mi> i </mi> </mrow> </msub>
                      <msub> <mrow> <mi> X </mi> </mrow> <mrow> <mi> i </mi> </mrow> </msub>
                    </math>
                  </p>
                </div>
              </div>
            </body>
          </text>
        </TEI>
Boundary: \r\n-----------------------------10775069631632435281298450283\r\n
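The octal escapes in this dump are only how the capture displays the raw bytes; the bytes themselves are UTF-8. A small Python sketch (the function name decode_octal_escapes is made up for this example, and it assumes the payload is UTF-8) turns them back into characters:

```python
import re


def decode_octal_escapes(dump: str) -> str:
    """Replace backslash-octal escapes such as \\316\\262 with the bytes
    they stand for, then decode the result as UTF-8 (an assumption)."""
    raw = re.sub(
        rb"\\([0-3][0-7]{2})",                  # three octal digits, max \377
        lambda m: bytes([int(m.group(1), 8)]),  # one escape -> one byte
        dump.encode("latin-1"),
    )
    return raw.decode("utf-8")


print(decode_octal_escapes(r"<mi> \316\262 </mi>"))
print(decode_octal_escapes(r"<mo> \342\200\246 </mo>"))
```

\316\262 is the byte pair 0xCE 0xB2 (Greek small letter beta), and \342\200\246 is 0xE2 0x80 0xA6 (horizontal ellipsis), so the service is receiving the characters as plain UTF-8 bytes with no transfer encoding applied.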
-- Tim A. Thompson Metadata Librarian (Spanish/Portuguese Specialty) Princeton University Library
www.linkedin.com/in/timathompson tat2@princeton.edu
On Sat, Mar 11, 2017 at 10:30 AM, Christian Grün <christian.gruen@gmail.com> wrote:
Hi Tim,
Finally some feedback on this issue.
It turned out that I cannot provide an easy fix for the problem you encountered. Your observations have already summarized the problem, and you have also found out what is happening internally: Whenever a multi-part body contains non-ASCII data, the "Content-Transfer-Encoding:base64" header is added [1].
I am now mostly wondering how non-ASCII characters should be transferred, if not encoded as base64. Do you have an idea what the request would need to look like for TEI-C to be able to parse it?
Cheers, Christian
[1] https://github.com/BaseXdb/basex/blob/master/basex-core/src/main/java/org/basex/util/http/HttpClient.java#L271
Content-Type: text/xml\r\n Content-Transfer-Encoding: base64\r\n\r\n eXtensible Markup Language [truncated] PGh0bWwgeG1sbnM9Imh0dHA6Ly93d3cudzMub3JnLzE5OTkveGh0bWwiPjxo
ZWFkPjxtZXRhLz48\r\ndGl0bGU+VGVzdDwvdGl0bGU+PC9oZWFkPjxib2R5 PjxtYXRoIHhtbG5zPSJodHRwOi8vd3d3Lncz\r\nLm9yZy8xOTk4L01hdGgv TWF0aE1MIj48bXN1Yj48bWk+zrI8L21pPjxtbj5Ud288L21
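The base64 body above is the same UTF-8 markup, just in encoded form. A quick Python check (only a sketch for illustration) decodes one 4-character-aligned fragment copied from the dump:

```python
import base64

# Fragment copied from the dump above, trimmed to a multiple of 4 characters
# so that it can be decoded on its own.
fragment = "TWF0aE1MIj48bXN1Yj48bWk+zrI8L21pPjxtbj5Ud288"

decoded = base64.b64decode(fragment).decode("utf-8")
print(decoded)
```

The decoded text contains the `<mi>` element with the Greek beta, which suggests that the content itself is intact and that the receiving service simply does not undo the base64 encoding.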
Attached here is a basic test case to replicate the problem: an HTML page with a form and the RESTXQ function that it calls.
I've tried setting a new header to specify Content-Transfer-Encoding as "binary" instead of "base64," but it doesn't replace the default header. Is there any way that the encoding could be controlled from RESTXQ?
Thanks in advance!
Tim
-- Tim A. Thompson Metadata Librarian (Spanish/Portuguese Specialty) Princeton University Library
www.linkedin.com/in/timathompson tat2@princeton.edu