Hi Christian,

Thanks very much for looking into this. When I upload a file to the OxGarage TEI web service through its front-end client (http://www.tei-c.org/oxgarage/), this is the request payload it sends on the back end. Non-ASCII characters appear below as octal escape sequences of their UTF-8 bytes.
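To read the capture, each octal escape can be turned back into one raw byte and the byte sequence decoded as UTF-8. The Python sketch below does this; the helper name `decode_octal_escapes` is hypothetical and only for illustration. It shows that `\316\262` is β (U+03B2) and `\342\200\246` is the horizontal ellipsis … (U+2026):

```python
import re


def decode_octal_escapes(captured: str) -> str:
    """Undo capture-style octal escapes such as \\316\\262.

    Each backslash followed by three octal digits stands for one raw byte;
    the resulting byte string is decoded as UTF-8.
    """
    raw = captured.encode("ascii")
    # Replace every backslash + three octal digits with the single byte it denotes.
    unescaped = re.sub(rb"\\([0-7]{3})", lambda m: bytes([int(m.group(1), 8)]), raw)
    return unescaped.decode("utf-8")


print(decode_octal_escapes(r"<mi> \316\262 </mi>"))
print(decode_octal_escapes(r"<mo> \342\200\246 </mo>"))
```

In other words, the browser sends the β and … characters as plain UTF-8 bytes inside the part, with no transfer encoding applied.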
Encapsulated multipart part: (text/xml)
Content-Disposition: form-data; name="fileToConvert"; filename="tei.xml"\r\n
Content-Type: text/xml\r\n\r\n
eXtensible Markup Language
<TEI xmlns="http://www.tei-c.org/ns/1.0" xml:lang="en">
<teiHeader>
<fileDesc>
<titleStmt>
<title>Multipart test</title>
<author/>
</titleStmt>
<publicationStmt>
<p>unknown</p>
</publicationStmt>
<sourceDesc>
<p>unknown</p>
</sourceDesc>
</fileDesc>
</teiHeader>
<text>
<body>
<div type="level1">
<div type="level2">
<p n="4">
<hi rendition="simple:bold"/>
</p>
<p n="5" rend="Normal">
<hi rend="bold underline"> Regression Equation </hi>
</p>
<p n="6" rend="Normal">
<math xmlns="http://www.w3.org/1998/Math/MathML">
<mover accent="true">
<mrow>
<mi> Y </mi>
</mrow>
<mo> ^ </mo>
</mover>
<mo> = </mo>
<msub>
<mrow>
<mi> \316\262 </mi>
</mrow>
<mrow>
<mn> 1 </mn>
</mrow>
</msub>
<mo> + </mo>
<msub>
<mrow>
<mi> \316\262 </mi>
</mrow>
<mrow>
<mn> 2 </mn>
</mrow>
</msub>
<msub>
<mrow>
<mi> X </mi>
</mrow>
<mrow>
<mn> 2 </mn>
</mrow>
</msub>
<mo> + </mo>
<mo> \342\200\246 </mo>
<mo> + </mo>
<msub>
<mrow>
<mi> \316\262 </mi>
</mrow>
<mrow>
<mi> i </mi>
</mrow>
</msub>
<msub>
<mrow>
<mi> X </mi>
</mrow>
<mrow>
<mi> i </mi>
</mrow>
</msub>
</math>
</p>
</div>
</div>
</body>
</text>
</TEI>
Boundary: \r\n-----------------------------10775069631632435281298450283\r\n
--
Tim A. Thompson
Metadata Librarian (Spanish/Portuguese Specialty)
Princeton University Library
www.linkedin.com/in/timathompson
tat2@princeton.edu

On Sat, Mar 11, 2017 at 10:30 AM, Christian Grün <christian.gruen@gmail.com> wrote:

Hi Tim,
Finally some feedback on this issue.
It turned out that I cannot provide an easy fix for the problem you
encountered. Your observations already summarize the problem, and
you have also worked out what is happening internally: whenever a
multipart body contains non-ASCII data, the
"Content-Transfer-Encoding: base64" header is added [1].
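As a rough model of the rule described above, the following Python sketch adds the base64 header only when the payload contains a byte above 0x7F. It is a hypothetical illustration (the name `encode_part` and the 76-character line wrapping are assumptions), not a transcription of the Java code referenced in [1]:

```python
import base64


def encode_part(name: str, filename: str, content_type: str, payload: bytes) -> bytes:
    """Hypothetical model: base64-encode a multipart part only if it is non-ASCII."""
    headers = [
        f'Content-Disposition: form-data; name="{name}"; filename="{filename}"',
        f"Content-Type: {content_type}",
    ]
    if any(byte > 0x7F for byte in payload):
        # Non-ASCII data present: announce and apply base64.
        headers.append("Content-Transfer-Encoding: base64")
        encoded = base64.b64encode(payload)
        # Wrap at 76 characters per line, as MIME base64 bodies usually are.
        body = b"\r\n".join(encoded[i:i + 76] for i in range(0, len(encoded), 76))
    else:
        # Pure ASCII: the payload is sent unchanged.
        body = payload
    return ("\r\n".join(headers) + "\r\n\r\n").encode("utf-8") + body


ascii_part = encode_part("fileToConvert", "tei.xml", "text/xml", b"<mi>b</mi>")
utf8_part = encode_part("fileToConvert", "tei.xml", "text/xml", "<mi>\u03b2</mi>".encode("utf-8"))
print(b"Content-Transfer-Encoding" in ascii_part)  # False
print(b"Content-Transfer-Encoding" in utf8_part)   # True
```

A single β in the file is therefore enough to switch the whole part to base64, which the receiving service then has to decode itself.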
I am now mostly wondering how non-ASCII characters should be
transferred, if not encoded as base64. Do you have an idea what the
request would need to look like for TEI-C to be able to parse it?
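For comparison, the browser capture at the top of this thread sends the XML part as raw UTF-8 bytes with no Content-Transfer-Encoding header at all, and RFC 7578 (section 4.7) deprecates that header in multipart/form-data sent over HTTP. A minimal Python sketch of such a body follows; the function name `build_form_data` and the boundary string are hypothetical, and the boundary must not occur anywhere inside the payload:

```python
def build_form_data(boundary: str, name: str, filename: str,
                    content_type: str, payload: bytes) -> bytes:
    """Build a one-part multipart/form-data body with the file bytes sent as-is."""
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{name}"; filename="{filename}"\r\n'
        f"Content-Type: {content_type}\r\n"
        "\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    # No Content-Transfer-Encoding header: the UTF-8 bytes go on the wire unchanged.
    return head + payload + tail


boundary = "----hypotheticalBoundary"  # hypothetical; must not appear in the payload
xml = "<mi>\u03b2</mi>".encode("utf-8")
body = build_form_data(boundary, "fileToConvert", "tei.xml", "text/xml", xml)
# The request would carry: Content-Type: multipart/form-data; boundary=----hypotheticalBoundary
```

The design choice here is simply to mirror what the browser already does: multipart boundaries delimit the part, so arbitrary 8-bit content can pass through HTTP without a transfer encoding.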
Cheers,
Christian
[1] https://github.com/BaseXdb/basex/blob/master/basex-core/src/main/java/org/basex/util/http/HttpClient.java#L271
> Content-Type: text/xml\r\n
> Content-Transfer-Encoding: base64\r\n\r\n
> eXtensible Markup Language
> [truncated]
> PGh0bWwgeG1sbnM9Imh0dHA6Ly93d3cudzMub3JnLzE5OTkveGh0bWwiPjxoZWFkPjxtZXRhLz48\r\ndGl0bGU+VGVzdDwvdGl0bGU+PC9oZWFkPjxib2R5PjxtYXRoIHhtbG5zPSJodHRwOi8vd3d3Lncz\r\nLm9yZy8xOTk4L01hdGgvTWF0aE1MIj48bXN1Yj48bWk+zrI8L21pPjxtbj5Ud288L21
>
> Attached here is a basic test case to replicate the problem: an HTML page
> with a form and the RESTXQ function that it calls.
>
> I've tried setting a new header to specify Content-Transfer-Encoding as
> "binary" instead of "base64," but it doesn't replace the default header. Is
> there any way that the encoding could be controlled from RESTXQ?
>
> Thanks in advance!
>
> Tim
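The base64 body in the quoted capture is simply the uploaded UTF-8 document re-encoded. Decoding its complete groups in Python recovers the XHTML test page, including the β that triggered the encoding. Because the capture is truncated, this sketch stops at the last complete 4-character group of the third line, so the decoded text ends mid-tag:

```python
import base64

# Base64 lines copied from the captured request body; the third is cut short
# at the last complete 4-character group because the capture was truncated.
captured_lines = [
    "PGh0bWwgeG1sbnM9Imh0dHA6Ly93d3cudzMub3JnLzE5OTkveGh0bWwiPjxoZWFkPjxtZXRhLz48",
    "dGl0bGU+VGVzdDwvdGl0bGU+PC9oZWFkPjxib2R5PjxtYXRoIHhtbG5zPSJodHRwOi8vd3d3Lncz",
    "Lm9yZy8xOTk4L01hdGgvTWF0aE1MIj48bXN1Yj48bWk+zrI8L21p",
]
decoded = base64.b64decode("".join(captured_lines)).decode("utf-8")
print(decoded)
# <html xmlns="http://www.w3.org/1999/xhtml"><head><meta/><title>Test</title></head><body><math xmlns="http://www.w3.org/1998/Math/MathML"><msub><mi>β</mi
```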