Hi, Christian,
Thanks very much for looking into this. If I upload a file through the front-end client of the OxGarage TEI web service (http://www.tei-c.org/oxgarage/), this is the request payload it sends to the back end. In the capture, non-ASCII characters are shown as octal escape sequences of their UTF-8 bytes.
Encapsulated multipart part: (text/xml)
    Content-Disposition: form-data; name="fileToConvert"; filename="tei.xml"\r\n
    Content-Type: text/xml\r\n\r\n
    eXtensible Markup Language
        <TEI xmlns="http://www.tei-c.org/ns/1.0" xml:lang="en">
          <teiHeader>
            <fileDesc>
              <titleStmt>
                <title>Multipart test</title>
                <author/>
              </titleStmt>
              <publicationStmt>
                <p>unknown</p>
              </publicationStmt>
              <sourceDesc>
                <p>unknown</p>
              </sourceDesc>
            </fileDesc>
          </teiHeader>
          <text>
            <body>
              <div type="level1">
                <div type="level2">
                  <p n="4">
                    <hi rendition="simple:bold"/>
                  </p>
                  <p n="5" rend="Normal">
                    <hi rend="bold underline"> Regression Equation </hi>
                  </p>
                  <p n="6" rend="Normal">
                    <math xmlns="http://www.w3.org/1998/Math/MathML">
                      <mover accent="true"> <mrow> <mi> Y </mi> </mrow> <mo> ^ </mo> </mover>
                      <mo> = </mo>
                      <msub> <mrow> <mi> \316\262 </mi> </mrow> <mrow> <mn> 1 </mn> </mrow> </msub>
                      <mo> + </mo>
                      <msub> <mrow> <mi> \316\262 </mi> </mrow> <mrow> <mn> 2 </mn> </mrow> </msub>
                      <msub> <mrow> <mi> X </mi> </mrow> <mrow> <mn> 2 </mn> </mrow> </msub>
                      <mo> + </mo>
                      <mo> \342\200\246 </mo>
                      <mo> + </mo>
                      <msub> <mrow> <mi> \316\262 </mi> </mrow> <mrow> <mi> i </mi> </mrow> </msub>
                      <msub> <mrow> <mi> X </mi> </mrow> <mrow> <mi> i </mi> </mrow> </msub>
                    </math>
                  </p>
                </div>
              </div>
            </body>
          </text>
        </TEI>
    Boundary: \r\n-----------------------------10775069631632435281298450283\r\n
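The octal escapes in this capture are simply the UTF-8 bytes of the original characters, sent unencoded: \316\262 is β (U+03B2) and \342\200\246 is the horizontal ellipsis … (U+2026). A small Python sketch to turn such a dump back into text; `decode_octal_escapes` is a hypothetical helper, and it assumes the dump itself is plain ASCII with three-digit escapes, as above.

```python
import re

# A backslash followed by exactly three octal digits, e.g. \316
OCTAL_ESCAPE = re.compile(rb"\\([0-7]{3})")


def decode_octal_escapes(text: str) -> str:
    """Hypothetical helper: replace each \\NNN escape with the raw byte it
    stands for, then decode the resulting byte string as UTF-8.

    Assumes the dump is ASCII-only apart from the escapes.
    """
    raw = text.encode("ascii")
    unescaped = OCTAL_ESCAPE.sub(lambda m: bytes([int(m.group(1), 8)]), raw)
    return unescaped.decode("utf-8")


if __name__ == "__main__":
    print(decode_octal_escapes(r"<mi> \316\262 </mi>"))     # Greek beta
    print(decode_octal_escapes(r"<mo> \342\200\246 </mo>"))  # ellipsis
```

In other words, the OxGarage client puts the XML on the wire byte for byte, with no Content-Transfer-Encoding header on the part.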
-- Tim A. Thompson Metadata Librarian (Spanish/Portuguese Specialty) Princeton University Library
www.linkedin.com/in/timathompson tat2@princeton.edu
On Sat, Mar 11, 2017 at 10:30 AM, Christian Grün <christian.gruen@gmail.com> wrote:
Hi Tim,
Finally some feedback on this issue.
It turned out that I cannot provide an easy fix for the problem you encountered. Your observations already summarize the problem, and you have also found out what is happening internally: whenever a multipart body contains non-ASCII data, a "Content-Transfer-Encoding: base64" header is added [1].
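To make the rule above concrete, here is a simplified Python sketch of the observed behavior. `encode_part_body` is a hypothetical helper, not the Java code in BaseX's HttpClient.java; it only mimics what the captured requests show, including base64 output folded into 76-character lines separated by CRLF.

```python
import base64


def encode_part_body(body: bytes, content_type: str) -> bytes:
    """Hypothetical illustration of the observed rule: if a part contains any
    non-ASCII byte, base64-encode it and add a Content-Transfer-Encoding header.
    """
    headers = [f"Content-Type: {content_type}"]
    if any(b > 0x7F for b in body):
        headers.append("Content-Transfer-Encoding: base64")
        encoded = base64.b64encode(body)
        # Fold the encoded data into 76-character lines joined by CRLF
        body = b"\r\n".join(encoded[i:i + 76] for i in range(0, len(encoded), 76))
    head = "\r\n".join(headers).encode("ascii")
    # A blank line separates the part headers from the part body
    return head + b"\r\n\r\n" + body
```

Pure-ASCII XML would pass through untouched; a single β is enough to switch the whole part to base64.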
I am now mostly wondering how non-ASCII characters should be transferred, if not encoded as base64. Do you have an idea what the request would need to look like for the TEI-C service to be able to parse it?
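One possible answer, judging from the OxGarage capture earlier in this thread: the file part carries the XML as raw UTF-8 bytes and has no Content-Transfer-Encoding header at all. If I read RFC 7578 (the multipart/form-data specification) correctly, senders should not generate Content-Transfer-Encoding headers in form-data parts over HTTP, which matches what browsers do. A minimal Python sketch of such a body; `build_form_data` and the boundary format are made up for illustration.

```python
import uuid


def build_form_data(field: str, filename: str, content: bytes,
                    content_type: str = "text/xml") -> tuple[str, bytes]:
    """Hypothetical helper: build a multipart/form-data body in which the file
    bytes (e.g. UTF-8 XML) are inserted unchanged, with no
    Content-Transfer-Encoding header on the part.

    Returns (value for the request's Content-Type header, body bytes).
    Assumes field and filename are ASCII.
    """
    # Random boundary; a collision with the content is unlikely but not checked
    boundary = "----boundary" + uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        f"Content-Type: {content_type}\r\n"
        "\r\n"
    ).encode("ascii")
    tail = f"\r\n--{boundary}--\r\n".encode("ascii")
    return f"multipart/form-data; boundary={boundary}", head + content + tail
```

Usage: `build_form_data("fileToConvert", "tei.xml", xml_bytes)` reproduces the shape of the part OxGarage sends, with β left as the two bytes CE B2.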
Cheers, Christian
[1] https://github.com/BaseXdb/basex/blob/master/basex-core/src/main/java/org/basex/util/http/HttpClient.java#L271
Content-Type: text/xml\r\n
Content-Transfer-Encoding: base64\r\n\r\n
eXtensible Markup Language [truncated]
    PGh0bWwgeG1sbnM9Imh0dHA6Ly93d3cudzMub3JnLzE5OTkveGh0bWwiPjxoZWFkPjxtZXRhLz48\r\n
    dGl0bGU+VGVzdDwvdGl0bGU+PC9oZWFkPjxib2R5PjxtYXRoIHhtbG5zPSJodHRwOi8vd3d3Lncz\r\n
    Lm9yZy8xOTk4L01hdGgvTWF0aE1MIj48bXN1Yj48bWk+zrI8L21pPjxtbj5Ud288L21
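Decoding the captured base64 shows that the XML itself arrives intact (β is still the UTF-8 pair CE B2); only the transfer encoding differs from what OxGarage sends. A short Python check: the incomplete trailing group "L21" from the truncated capture is dropped so that the input is a whole number of 4-character base64 groups.

```python
import base64

# The three 76-character lines from the capture, CRLF folding included,
# minus the truncated final group "L21"
captured = (
    "PGh0bWwgeG1sbnM9Imh0dHA6Ly93d3cudzMub3JnLzE5OTkveGh0bWwiPjxoZWFkPjxtZXRhLz48\r\n"
    "dGl0bGU+VGVzdDwvdGl0bGU+PC9oZWFkPjxib2R5PjxtYXRoIHhtbG5zPSJodHRwOi8vd3d3Lncz\r\n"
    "Lm9yZy8xOTk4L01hdGgvTWF0aE1MIj48bXN1Yj48bWk+zrI8L21pPjxtbj5Ud288"
)

# With the default validate=False, b64decode discards the CR/LF characters
xml_bytes = base64.b64decode(captured)
text = xml_bytes.decode("utf-8")
print(text)
```

A server that ignores the part's Content-Transfer-Encoding header would read this base64 text literally as the file content, which would explain why it cannot parse it.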
Attached here is a basic test case to replicate the problem: an HTML page with a form and the RESTXQ function that it calls.
I've tried adding a header that sets Content-Transfer-Encoding to "binary" instead of "base64", but it doesn't replace the default header.
Is there any way that the encoding could be controlled from RESTXQ?
Thanks in advance!
Tim