Hi Ruby,
thank you for your feedback. I guess this is by design, as Java internally treats Strings as Unicode. Perhaps Christian has some thoughts on this.
You may overcome this easily by reading in a file (w.r.t. the encoding) or converting your (XML-)String to Unicode beforehand. You might as well check out the latest BaseX release (6.0) or the sources via http://www.inf.uni-konstanz.de/dbis/basex/community as I saw you were using 5.7.
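A minimal sketch of the second workaround, assuming the XML arrives as raw bytes in a known encoding: decode the bytes into a Java String and rewrite the encoding in the XML declaration to UTF-8, so that the declaration matches the UTF-8 bytes later produced from the string. The class and method names are hypothetical and not part of BaseX, and the regular expression only handles a declaration at the very start of the input.

```java
import java.nio.ByteBuffer;
import java.nio.charset.Charset;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of BaseX.
final class XmlToUnicode {
  // Matches the encoding pseudo-attribute inside a leading XML declaration.
  private static final Pattern ENC = Pattern.compile(
      "^(<\\?xml[^>]*?encoding\\s*=\\s*)([\"'])[A-Za-z0-9._-]+\\2");

  /** Decodes raw bytes with the given charset and declares UTF-8 instead. */
  static String toUnicode(final byte[] raw, final String charset) {
    // Decode the bytes into a (Unicode) Java String.
    final String xml = Charset.forName(charset).decode(ByteBuffer.wrap(raw)).toString();
    final Matcher m = ENC.matcher(xml);
    // Assumption: input without an encoding declaration is left untouched.
    return m.find() ? m.replaceFirst("$1$2UTF-8$2") : xml;
  }
}
```

The resulting string can then be passed on as before; its declaration no longer contradicts the bytes generated from it.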
Kind regards Michael
P.S. Ruby, sorry for the double reply; I forgot to CC the list.
On 31.03.2010 at 06:38, l0979365428@gmail.com wrote:
Dear BaseX creators,
If the input is an XML file, everything works. If it is an XML string, however, BaseX only accepts UTF-8 encoded XML. The reason is:
//---------------------------------------------------//
public static IO get(final String s) {
  if(s == null) return new IOFile("");
  // Strings starting with "<" are treated as XML content
  if(s.startsWith("<")) return new IOContent(Token.token(s));
  if(s.startsWith("http://")) return new IOUrl(s);
  return new IOFile(s);
}
//---------------------------------------------------------//
public static byte[] token(final String s) {
  final int l = s.length();
  if(l == 0) return EMPTY;
  final byte[] bytes = new byte[l];
  for(int i = 0; i < l; i++) {
    final char c = s.charAt(i);
    // Any non-ASCII character switches to UTF-8 encoding
    if(c > 0x7F) return utf8(s);
    bytes[i] = (byte) c;
  }
  return bytes;
}
//-----------------------------------------------//
This code does not look at the XML declaration (the header of the XML document) when it converts the string to bytes. However, the program builds the XML structure with SAX, which parses the XML according to the encoding given in the header. For this reason, errors occur whenever the XML string declares an encoding other than UTF-8.
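The mismatch described above can be reproduced with the standard JAXP parser alone. This is only a sketch: the call to getBytes("UTF-8") is a stand-in for Token.token(), which is assumed to produce UTF-8 bytes for non-ASCII input, and the class name is hypothetical. When the string declares ISO-8859-1, the parser decodes the two UTF-8 bytes of "é" as two separate Latin-1 characters.

```java
import java.io.ByteArrayInputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

// Hypothetical demo class, not part of BaseX.
final class EncodingMismatchDemo {
  /** Parses the UTF-8 bytes of an XML string and returns the root element's text. */
  static String parseAsUtf8Bytes(final String xml) {
    try {
      // Stand-in for Token.token(): non-ASCII input ends up as UTF-8 bytes.
      final byte[] bytes = xml.getBytes("UTF-8");
      // The parser trusts the encoding named in the XML declaration.
      final Document doc = DocumentBuilderFactory.newInstance()
          .newDocumentBuilder().parse(new ByteArrayInputStream(bytes));
      return doc.getDocumentElement().getTextContent();
    } catch(final Exception ex) {
      throw new IllegalStateException(ex);
    }
  }
}
```

With a declaration of encoding="ISO-8859-1" and the content "é", the returned text is "Ã©" instead of "é".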
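I fixed it as follows. In get(), the string is now passed to a new method:
if(s.startsWith("<")) return new IOContent(Token.encoding(s));
//----------------------------------------------//
public static byte[] encoding(final String s) {
  final int l = s.length();
  if(l == 0) return EMPTY;
  // Extract the declared encoding: skip to the first 'c' (as in "encoding"),
  // then to '=', then to the opening quote, and collect all characters
  // up to the closing quote.
  int i = 0;
  int j = 0;
  StringBuilder ss = new StringBuilder();
  char hope[] = new char[] {'c', '=', '"', '"'};
  // Stop at the end of the string if no declaration is found
  while(i < l) {
    final char c = s.charAt(i);
    if(hope[j] == c) j++;
    if(j == 4) break;
    if(j > 2 && c != hope[j - 1]) {
      ss.append(c);
    }
    i++;
  }

  final byte[] bytes = new byte[l];
  for(int k = 0; k < l; k++) {
    final char c = s.charAt(k);
    // Non-ASCII input: convert with the declared encoding
    // (others() is not shown in this message)
    if(c > 0x7F) return others(s, ss.toString());
    bytes[k] = (byte) c;
  }
  return bytes;
}
//-----------------------------------------------//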
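The same idea could also be sketched with a regular expression and java.nio.charset.Charset, which avoids scanning for single characters. This is an illustration under assumptions, not the BaseX implementation: the class and method names are hypothetical, only a declaration at the very start of the string is recognized, and UTF-8 is assumed when no encoding is declared.

```java
import java.nio.ByteBuffer;
import java.nio.charset.Charset;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of BaseX.
final class DeclaredEncoding {
  // Captures the encoding name of a leading XML declaration.
  private static final Pattern ENC = Pattern.compile(
      "^<\\?xml[^>]*?encoding\\s*=\\s*[\"']([A-Za-z0-9._-]+)[\"']");

  /** Returns the declared encoding, or UTF-8 if none is declared. */
  static String declared(final String xml) {
    final Matcher m = ENC.matcher(xml);
    return m.find() ? m.group(1) : "UTF-8";
  }

  /** Encodes the string so that its bytes match its own declaration. */
  static byte[] encode(final String xml) {
    // Throws an unchecked exception if the declared charset is unknown.
    final ByteBuffer bb = Charset.forName(declared(xml)).encode(xml);
    final byte[] bytes = new byte[bb.remaining()];
    bb.get(bytes);
    return bytes;
  }
}
```

Because the bytes are produced with the declared charset, a SAX parser that honors the declaration decodes them back to the original characters.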
Could you fix this bug in the next version? Also, BaseX does not seem to support building a database from an XML string that uses namespaces, and I have no idea why. PS: I am using BaseX 5.7 and JDK 1.5.
Best wishes, Ruby
BaseX-Talk mailing list BaseX-Talk@mailman.uni-konstanz.de https://mailman.uni-konstanz.de/mailman/listinfo/basex-talk