Hello,
I am using the BaseX SQL module to query the Oracle database of a library catalog. Its character data is encoded as ISO 2709 (MARC-8)[1] and is stored in Oracle as US7ASCII.
The data contains diacritics with combining characters like this: http://www.fileformat.info/info/unicode/char/0301/index.htm
When I run the query in BaseX, all characters with combining accents are output as something like ́
Has anyone had experience handling this kind of encoding issue in BaseX, or does anyone have any solutions/approaches to recommend as a way to convert this data to UTF-8?
Thanks in advance, Tim
[1] https://en.wikipedia.org/wiki/ISO_2709
[2] http://www.fileformat.info/info/unicode/char/00e8/index.htm
--
Tim A. Thompson
Metadata Librarian (Spanish/Portuguese Specialty)
Princeton University Library
tat2@princeton.edu
Hi Tim,
I don't know enough about Oracle's encodings, but "US7ASCII" sounds to me as if only 7 bits are used for each character. Do you know how non-ASCII characters, such as combining characters, are stored in that format?
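One way to find out what actually arrives on the Java side is to print the code points of a returned string. The following is only a minimal sketch: the class name CodePointDump and the sample string are made up for illustration, and nothing here is taken from the BaseX sources.

```java
import java.util.stream.Collectors;

class CodePointDump {
    // Formats every code point of the input as U+XXXX, separated by spaces.
    static String dump(String s) {
        return s.codePoints()
                .mapToObj(cp -> String.format("U+%04X", cp))
                .collect(Collectors.joining(" "));
    }

    public static void main(String[] args) {
        // Sample input (an assumption): "e" followed by the combining acute accent.
        System.out.println(dump("e\u0301")); // prints "U+0065 U+0301"
    }
}
```

If the dump of a real value shows two code points in the U+0080 to U+00FF range where one combining character was expected, that would hint that multi-byte data was decoded byte by byte.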
Maybe convert:string-to-hex and convert:binary-to-string [1] could be used to re-decode the result with the correct encoding.
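The same round trip can be sketched in plain Java: encode the string back to bytes with the encoding that was wrongly applied, then decode those bytes with the intended one. This sketch assumes, purely for illustration, that the stored bytes are UTF-8 and were read as ISO-8859-1; that is not confirmed for this database. As far as I know the JDK ships no MARC-8 charset, so genuine MARC-8 data would need a dedicated converter instead. The class name Redecode is hypothetical.

```java
import java.nio.charset.StandardCharsets;

class Redecode {
    // Recovers the raw bytes by encoding with the charset that was wrongly
    // used for decoding (assumed: ISO-8859-1), then decodes them as UTF-8.
    static String redecode(String garbled) {
        byte[] raw = garbled.getBytes(StandardCharsets.ISO_8859_1);
        return new String(raw, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // U+0301 is CC 81 in UTF-8; read as ISO-8859-1 it becomes U+00CC U+0081.
        String garbled = "e\u00CC\u0081";
        System.out.println(redecode(garbled).equals("e\u0301")); // prints "true"
    }
}
```

Pure ASCII text passes through unchanged, so the conversion only affects the byte sequences that were misread.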
Basically, all we do in BaseX is use standard JDBC functionality [2]. If there is an easy way to fix the issue with JDBC, it should be easy to get it working in XQuery as well.
Christian
[1] http://docs.basex.org/wiki/Conversion_Module
[2] https://github.com/BaseXdb/basex/blob/master/basex-core/src/main/java/org/ba...
basex-talk@mailman.uni-konstanz.de