That mangled string is the result of reading UTF-8 byte sequences as single-byte characters, e.g. ASCII or some Windows code page.
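
To make that concrete, here is a minimal sketch of the mismatch, assuming the single-byte encoding involved is Windows-1252 (cp1252). That choice is only an assumption for illustration; the thread does not establish which code page, if any, was actually applied.

```python
# Illustration only: reproduce the kind of mismatch described above.
original = "Cañelas"

# UTF-8 stores "ñ" (U+00F1) as two bytes: 0xC3 0xB1.
utf8_bytes = original.encode("utf-8")

# Decoding those bytes with a single-byte code page (cp1252 is assumed
# here) maps each byte to its own character, so "ñ" becomes two characters.
mangled = utf8_bytes.decode("cp1252")

# Undoing the wrong decoding step recovers the original text.
repaired = mangled.encode("cp1252").decode("utf-8")

print(utf8_bytes)
print(mangled)
print(repaired)
```

If the stored text has this shape (one accented letter turned into two odd characters), the bytes were probably fine and only the decoding step was wrong.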

 

How are you loading it into BaseX? It seems unlikely that BaseX-provided code would make this kind of basic mistake in reading text, but it’s possible that it applied the wrong encoding for some reason.
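
It may also be worth confirming what bytes the file really contains, since an editor that guesses the encoding can display a Windows-1252 file and a UTF-8 file identically. A rough sketch of such a check follows. `describe_encoding` is a hypothetical helper name, not part of BaseX, and the in-memory samples stand in for the real CSV.

```python
import codecs


def describe_encoding(data: bytes) -> str:
    """Roughly classify raw bytes (hypothetical helper, not part of BaseX)."""
    if data.startswith(codecs.BOM_UTF8):
        return "starts with a UTF-8 BOM"
    if all(byte < 0x80 for byte in data):
        return "plain ASCII"
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return "not valid UTF-8"
    return "valid UTF-8 without BOM"


# In-memory samples; for a real file, pass open(path, "rb").read() instead.
print(describe_encoding("Cañelas".encode("utf-8")))
print(describe_encoding("Cañelas".encode("cp1252")))
print(describe_encoding("Cañelas".encode("utf-8-sig")))
```

A result of "not valid UTF-8" would suggest the export was not UTF-8 after all, which would point at the file rather than at BaseX.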

 

Cheers,

 

Eliot

 

--

Eliot Kimber

http://contrext.com

From: <basex-talk-bounces@mailman.uni-konstanz.de> on behalf of BitRider001 <bit.rider.001@pm.me>
Reply-To: BitRider001 <bit.rider.001@pm.me>
Date: Thursday, May 17, 2018 at 8:34 PM
To: Bridger Dyson-Smith <bdysonsmith@gmail.com>
Cc: "basex-talk@mailman.uni-konstanz.de" <basex-talk@mailman.uni-konstanz.de>
Subject: Re: [basex-talk] about special characters

 

Bridger,

 

Indeed, the file was exported from Excel in UTF-8 encoding. I've tried opening the CSV file in Notepad/WordPad on Windows and with vi in a Linux terminal, and in both cases it displays the special character correctly.

 

It's only when I load it into a BaseX db and query it that it shows up, as you said, as "mangled". Saving the results to a text file also produces the "mangled" string.

 

Strange.

 

Bit

‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐

On May 18, 2018 9:21 AM, Bridger Dyson-Smith <bdysonsmith@gmail.com> wrote:

 

Bit -

that's odd; it looks like the characters are being decomposed (or whatever the term is) and mangled but I'm not sure, unfortunately. Was the CSV an export from Excel? If so, I suppose this could be a Windows character set problem (cp-1252 or iso-8859-1 or something?).

 

Bridger

 

On Thu, May 17, 2018 at 9:11 PM BitRider001 <bit.rider.001@pm.me> wrote:

Hi Bridger,

 

Yes, that is right. I'm on the latest (9.0.1). Attaching a screenshot here for anyone to take a look.

Bit

‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐

On May 18, 2018 8:41 AM, Bridger Dyson-Smith <bdysonsmith@gmail.com> wrote:

 

Hi Bit - are you using the latest version? There was a problem with 9.0 and some Unicode characters. Christian and co. have a fix in v9.0.1.

 

HTH,

Bridger

 

On Thu, May 17, 2018, 7:54 PM BitRider001 <bit.rider.001@pm.me> wrote:

Hi,

 

I just joined the mailing list due to a problem I'm having displaying and storing special characters.

 

I started with a UTF-8 CSV file and created a database from it. However, when I query it, the special characters come out garbled. I'm using the GUI in Windows 10.

 

It starts with this in the CSV:

<name>Cañelas</name>

 

Then ends up with this when I export the query result into a text file:

<name>Ca�las</name>

 

 

Help please.

 

Bit