Hello list.
Another newbie question. I am creating a file in UTF-16, big-endian. I am writing the 2-byte BOM x'FEFF' as the first two bytes, followed by an XML declaration identifying the encoding as UTF-16.
I then FTP this file to my desktop in binary mode to preserve the encoding.
Below is a cut-and-paste of the first few bytes as they appear on the mainframe.
---------------------------
Ú.<?xml encoding="UTF-16" ?>
FF46A9948989889877EEC6FF7466
EFCF743055364957EF436016F0FE
In the ISPF hex display, each byte is shown as two hex digits stacked on two lines, so the first byte, FE, is read from column 1 of both lines, and so forth.
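To make that reading concrete, here is a minimal sketch that turns the two hex rows back into bytes column by column. The helper name `ispf_hex_to_bytes` is hypothetical, and decoding with Python's `cp037` codec assumes the mainframe code page is EBCDIC 037:

```python
# Sketch: rebuild raw bytes from an ISPF "hex on" display, where each byte's
# two hex digits are stacked vertically (high nibble on the first hex line,
# low nibble on the second). ispf_hex_to_bytes is a hypothetical helper.

def ispf_hex_to_bytes(high_row: str, low_row: str) -> bytes:
    if len(high_row) != len(low_row):
        raise ValueError("hex rows must be the same length")
    # Column i of the upper row plus column i of the lower row form byte i.
    return bytes(int(hi_digit + lo_digit, 16)
                 for hi_digit, lo_digit in zip(high_row, low_row))


high = "FF46A9948989889877EEC6FF7466"
low = "EFCF743055364957EF436016F0FE"
data = ispf_hex_to_bytes(high, low)

print(data[:2].hex().upper())        # FEFF: the byte order mark
print(data[2:8].hex(" ").upper())    # 4C 6F A7 94 93 40
# Assumption: the mainframe code page is EBCDIC 037 (Python codec "cp037").
print(data[2:].decode("cp037"))      # <?xml encoding="UTF-16" ?>
```

If the bytes after the BOM decode cleanly as EBCDIC like this, they are single-byte EBCDIC characters rather than UTF-16BE code units.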
When I try to create a new database, BaseX reports the following:
Command: CREATE DB d100217 :zxpf.ftp.download/apf1.v2r3.xmldata
Error: "d100217.xml" (Line 1): Content is not allowed in prolog.
Help greatly appreciated.
Regards,
Dave Day
Hi,
If the pasted text is what you see in a text editor, then the text is probably not valid UTF-16. In the text pane of a hex viewer, UTF-16 would look something like this:
…<.?.x.m.l..e.n.c.o. etc., with a placeholder character before each displayed 8-bit character (the 00 byte that precedes each ASCII character in UTF-16BE).
Aside from that, the XML declaration requires version="…", so:
<?xml version="1.0" encoding="UTF-16"?>
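The declaration rule can be checked mechanically. A minimal sketch, assuming a simplified regex rendering of the `XMLDecl` production from the XML 1.0 specification (it treats any `\s` as whitespace, which is looser than the spec); `has_valid_xml_decl` is a hypothetical name, not a BaseX function:

```python
import re

# Simplified rendering of the XML 1.0 XMLDecl production:
#   XMLDecl ::= '<?xml' VersionInfo EncodingDecl? SDDecl? S? '?>'
# VersionInfo is mandatory and must come first; values need straight ' or " quotes.
XML_DECL = re.compile(
    r"""<\?xml
        \s+version\s*=\s*(["'])1\.[0-9]+\1                       # VersionInfo
        (?:\s+encoding\s*=\s*(["'])[A-Za-z][A-Za-z0-9._-]*\2)?   # EncodingDecl
        (?:\s+standalone\s*=\s*(["'])(?:yes|no)\3)?              # SDDecl
        \s*\?>""",
    re.VERBOSE,
)


def has_valid_xml_decl(text: str) -> bool:
    """Hypothetical helper: True if text starts with a well-formed XML declaration."""
    return XML_DECL.match(text) is not None


print(has_valid_xml_decl('<?xml encoding="UTF-16" ?>'))               # False: no version
print(has_valid_xml_decl('<?xml version="1.0" encoding="UTF-16"?>'))  # True
print(has_valid_xml_decl('<?xml version=”1.0” encoding=”UTF-16”?>'))  # False: curly quotes
```

The last case shows why the quotes must be plain ASCII `"` or `'`: typographic quotes do not satisfy the grammar.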
And this is not valid markup:
FF46A9948989889877EEC6FF7466
EFCF743055364957EF436016F0FE
You might want to look at https://www.w3.org/TR/xml/#NT-XMLDecl
And if you have software that can encode your data as UTF-16BE, that might be easier than constructing UTF-16 out of individual bytes, if that is what you are doing.
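As an illustration of letting a codec do the work, here is a minimal Python sketch that re-encodes EBCDIC text as UTF-16BE with a BOM. It assumes the input is EBCDIC code page 037 and that the declaration already includes `version`; `ebcdic_to_utf16be` is a hypothetical name:

```python
# Sketch: let codecs produce UTF-16BE instead of assembling it byte by byte.
# Assumptions: the input is EBCDIC code page 037 (Python codec "cp037") and the
# declaration already carries version="1.0". ebcdic_to_utf16be is hypothetical.

BOM_BE = b"\xfe\xff"  # U+FEFF serialized big-endian


def ebcdic_to_utf16be(ebcdic: bytes, source_codec: str = "cp037") -> bytes:
    text = ebcdic.decode(source_codec)          # EBCDIC bytes -> str
    return BOM_BE + text.encode("utf-16-be")    # BOM + big-endian code units


doc = '<?xml version="1.0" encoding="UTF-16"?><doc/>'
out = ebcdic_to_utf16be(doc.encode("cp037"))
print(out[:8].hex(" ").upper())     # FE FF 00 3C 00 3F 00 78
print(out.decode("utf-16") == doc)  # True: the BOM tells the decoder the byte order
```

Writing the BOM explicitly and encoding with `utf-16-be` keeps the byte order fixed; Python's plain `utf-16` codec would instead use the platform's native byte order when encoding.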
Kendall
On 10/2/17, 2:24 PM, "basex-talk-bounces@mailman.uni-konstanz.de on behalf of Dave Day" <basex-talk-bounces@mailman.uni-konstanz.de on behalf of David.Day@duke-software.com> wrote:
[…]
Hi Kendall,
Thanks for taking the time to respond.
As usual, I did not do a good job of asking the question.
The cut-and-paste in my original message was from a display with the 'hex on' option set, which shows three lines for each line in the file. With 'hex off', the display is:
Ú.<?xml encoding="UTF-16" ?>
With 'hex on', the display is:
Ú.<?xml encoding="UTF-16" ?>
FF46A9948989889877EEC6FF7466
EFCF743055364957EF436016F0FE
The second and third lines are the hex values of the characters in the first line.
I will add version= to the declaration and try again, and I will also look at the link you sent.
Thank you.
-- Dave
On 10/2/2017 4:52 PM, Kendall Shaw wrote:
[…]
Hi Dave,
The UTF-16BE code for '<' would be 00 3C. I cannot see these octets in your example, so maybe you'll have to double-check your initial encoding step?
See [1] for some more examples.
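The 00 3C observation can be turned into a quick check of the first bytes of a file, loosely following the autodetection table in Appendix F of the XML 1.0 specification. A minimal sketch with a hypothetical function name; the EBCDIC branch only recognizes `<?xm` as encoded by the byte sequence 4C 6F A7 94 listed in that appendix. It also shows what a UTF-16BE reader would plausibly make of the bytes 4C 6F from the dump: one non-ASCII character rather than '<'.

```python
# Sketch in the spirit of the XML 1.0 spec's Appendix F autodetection table:
# after a big-endian BOM (FE FF), a document starting with '<' should continue
# with 00 3C. check_utf16be_start is a hypothetical helper.

def check_utf16be_start(data: bytes) -> str:
    if data[:2] != b"\xfe\xff":
        return "no UTF-16BE BOM"
    if data[2:4] == b"\x00\x3c":
        return "ok: BOM followed by '<' as 00 3C"
    if data[2:6] == b"\x4c\x6f\xa7\x94":
        # Appendix F lists 4C 6F A7 94 as '<?xm' in EBCDIC.
        return "BOM followed by EBCDIC '<?xm', not UTF-16"
    return "BOM followed by unexpected bytes " + data[2:4].hex(" ").upper()


daves_head = bytes.fromhex("FEFF 4C6F A794 9340")  # first bytes from the ISPF dump
good_head = b"\xfe\xff" + "<?xml".encode("utf-16-be")

print(check_utf16be_start(daves_head))  # BOM followed by EBCDIC '<?xm', not UTF-16
print(check_utf16be_start(good_head))   # ok: BOM followed by '<' as 00 3C

# A UTF-16BE reader pairs 4C 6F into one code unit: U+4C6F, not '<'.
first_char = daves_head[2:4].decode("utf-16-be")
print(f"U+{ord(first_char):04X}")       # U+4C6F
```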
Cheers, Christian
[1] https://en.m.wikipedia.org/wiki/UTF-16#Examples
On 03.10.2017 12:02 a.m., "Dave Day" David.Day@duke-software.com wrote:
[…]
basex-talk@mailman.uni-konstanz.de