Hi Andreas,

I think what you are observing is the following:

In a UTF-8 encoded string, a byte can start a multi-byte sequence. The number of leading 1 bits in that first byte defines how many bytes the sequence has.

cf. https://en.wikipedia.org/wiki/UTF-8
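
To make the rule concrete, here is a small sketch that derives the expected sequence length from a lead byte. The function name local:utf8-length is made up for this mail, and I assume convert:binary-to-integers returns unsigned octets (0 to 255):

```xquery
(: hypothetical helper: expected UTF-8 sequence length for a lead byte :)
declare function local:utf8-length($lead as xs:integer) as xs:integer? {
  if ($lead < 128) then 1        (: 0xxxxxxx, plain ASCII :)
  else if ($lead < 192) then ()  (: 10xxxxxx, continuation byte, not a lead byte :)
  else if ($lead < 224) then 2   (: 110xxxxx :)
  else if ($lead < 240) then 3   (: 1110xxxx :)
  else if ($lead < 248) then 4   (: 11110xxx :)
  else ()                        (: not valid in UTF-8 :)
};

for $hex in ("41", "C3", "E4", "F0")
return local:utf8-length(convert:binary-to-integers(xs:hexBinary($hex)))
(: returns: 1 2 3 4 :)
```

So C3 announces a two-byte sequence, and a lone E4 would even announce three bytes.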

In your example, the byte c3 has the following bit pattern:

xs:hexBinary("c3")
=> convert:binary-to-integers()
=> for-each(convert:integer-to-base(?,2))
(: returns: 11000011 :)

The two leading 1s tell the UTF-8 decoder to read a second byte, which is missing. Hence decoding fails with an error, or, if you use the fallback option, it returns the replacement character �
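
A rough sketch of these cases side by side, assuming the optional encoding and fallback arguments of convert:binary-to-string are available in your BaseX version:

```xquery
(: complete two-byte sequence: decodes to ä :)
convert:binary-to-string(xs:hexBinary("C3A4"), "UTF-8"),

(: lone lead byte, fallback enabled: the invalid input becomes � :)
convert:binary-to-string(xs:hexBinary("C3"), "UTF-8", true()),

(: the same single byte read as ISO-8859-1: decodes to Ã :)
convert:binary-to-string(xs:hexBinary("C3"), "ISO-8859-1")
```

Without the third argument, the second call is the one that should raise the error you are seeing.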

When decoding ASCII, where only 7 bits are used (values 0 to 127), this is no problem, as UTF-8 shares these character positions with the ASCII table.
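
For illustration, a byte in the ASCII range should decode to the same character whichever of the two encodings you name (a minimal sketch; 41 is just an example value):

```xquery
(: 41 is "A" in ASCII, and therefore in both UTF-8 and ISO-8859-1 :)
for $encoding in ("UTF-8", "ISO-8859-1")
return convert:binary-to-string(xs:hexBinary("41"), $encoding)
```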

Your "C3" byte, however, is not ASCII but most probably ISO-8859-1 or CP1252. So while a glance at https://tools.ietf.org/html/rfc3986 says that URI characters should be encoded in UTF-8, in practice chances are you will encounter values that are encoded using some "local" encoding.

If your string is not UTF-8 encoded, you can only guess what the correct encoding is.

You could send a predefined string that is known to be two bytes long in UTF-8, such as ä. It will be converted either to "%C3%A4" if the client uses UTF-8, or to a well-known single byte, for example "E4" in ISO-8859-1.

Depending on what you receive from your client for that string, you can assume it encodes its data as either UTF-8 or Latin-1.

You can check what your string would be encoded to:

string(convert:string-to-hex('ä',"latin1"))
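
If you have to decode values without knowing the client, one possible guessing strategy is to try UTF-8 first and only fall back to a single-byte encoding when that fails. This is just a sketch: local:decode-hex is a made-up name, ISO-8859-1 as the second guess is an assumption (CP1252 may fit your data better), and I assume invalid UTF-8 raises an error when no fallback is requested:

```xquery
(: hypothetical helper: hex string to readable text, UTF-8 first, then Latin-1 :)
declare function local:decode-hex($hex as xs:string) as xs:string {
  let $bytes := xs:hexBinary($hex)
  return try {
    (: strict decoding: invalid UTF-8 should raise an error here :)
    convert:binary-to-string($bytes, "UTF-8")
  } catch * {
    (: second guess: every single byte is treated as one Latin-1 character :)
    convert:binary-to-string($bytes, "ISO-8859-1")
  }
};

local:decode-hex("C3A4"),  (: valid UTF-8: ä :)
local:decode-hex("E4"),    (: invalid UTF-8, read as Latin-1: ä :)
local:decode-hex("C3")     (: invalid UTF-8, read as Latin-1: Ã :)
```

Keep in mind that this stays a guess: Latin-1 text whose bytes happen to form a valid UTF-8 sequence will be decoded as UTF-8.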

Sorry for the long mail, hope the explanation is useful for you, even though the solution is not sooo simple and involves guessing :-)

Best

Michael

On 09.06.2019 at 17:09, Andreas Mixich <mixich.andreas@gmail.com> wrote:

How can I simply get back any character, readable by a human, from a hexadecimal value?