Thanks for sharing your solution!
Vincent
From: Tim Thompson [mailto:timathom@gmail.com]
Sent: Monday, August 17, 2015 5:00 PM
To: Lizzi, Vincent <Vincent.Lizzi@taylorandfrancis.com>
Subject: Re: [basex-talk] Converting to UTF-8 in SQL module
Thanks, Christian, Vincent. Following Christian's suggestion, I used the RAWTOHEX() function in my SQL query, then cast the result to xs:hexBinary and applied BaseX's convert:binary-to-string(). It seems to work perfectly.
--Tim
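For readers who want to see the same round trip outside BaseX, here is a minimal Python sketch of the idea: take the hex text that a function like RAWTOHEX() returns, turn it back into raw bytes, and decode those bytes with an explicit encoding. The function name `decode_rawtohex`, the sample value, and the choice of UTF-8 are assumptions made for illustration; the thread does not say which encoding was passed to convert:binary-to-string().

```python
import unicodedata


def decode_rawtohex(hex_text: str, encoding: str = "utf-8") -> str:
    """Turn a RAWTOHEX()-style hex string back into text.

    The encoding is an assumption; use whatever the stored bytes really are.
    """
    raw = bytes.fromhex(hex_text)  # roughly what casting to xs:hexBinary does
    return raw.decode(encoding)    # roughly what convert:binary-to-string() does


# Hypothetical sample: "e" followed by U+0301 (combining acute accent),
# written as the hex of its UTF-8 bytes.
sample_hex = "65CC81"
text = decode_rawtohex(sample_hex)

# Show which characters came out of the decoding step.
print([unicodedata.name(ch) for ch in text])
```

Fetching the column as hex keeps the driver from applying its own character conversion, so the decoding step is fully under the caller's control.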
--
Tim A. Thompson
Metadata Librarian (Spanish/Portuguese Specialty)
Princeton University Library

On Mon, Aug 17, 2015 at 4:48 PM, Lizzi, Vincent <Vincent.Lizzi@taylorandfrancis.com> wrote:
Hi Tim,
Oracle should be able to convert its output to Unicode before returning query results to the client (BaseX). Are you using Oracle's JDBC driver? It might be helpful to look into Oracle's NLS_LANG setting or the 'convert' function.
Vincent
-----Original Message-----
From: basex-talk-bounces@mailman.uni-konstanz.de [mailto:basex-talk-bounces@mailman.uni-konstanz.de] On Behalf Of Christian Grün
Sent: Monday, August 17, 2015 3:23 PM
To: Tim Thompson <timathom@gmail.com>
Cc: BaseX <basex-talk@mailman.uni-konstanz.de>
Subject: Re: [basex-talk] Converting to UTF-8 in SQL module
Hi Tim,
> I am using the BaseX SQL module to query the Oracle database of a library catalog. Its character data is encoded as ISO 2709 (MARC-8)[1] and is stored in Oracle as US7ASCII.
> The data contains diacritics with combining characters like this: http://www.fileformat.info/info/unicode/char/0301/index.htm
I don't know enough about Oracle's encodings, but "US7ASCII" sounds to me as if only 7 bits are used for each character. Do you know how non-ASCII characters, such as combining characters, are stored in that format?
Maybe convert:string-to-hex and convert:binary-to-string [1] could be used to convert the result to the correct encoding.
Basically, all we do in BaseX is use standard JDBC functionality [2]. If there is an easy way to fix the issue with JDBC, it should be easy to get it working in XQuery as well.
Christian
[1] http://docs.basex.org/wiki/Conversion_Module
[2] https://github.com/BaseXdb/basex/blob/master/basex-core/src/main/java/org/ba...