Dear all,
after racking our brains over this for hours today, we still don't know how to get rid of this inconsistency, so I'm asking for help.
SSCCE:
BaseX 9.6.4 [Standalone] Try 'help' to get more information.
xquery file:write-text("a1.txt", "°" || out:nl()) (: Same with codepoints-to-string(176) instead of "°" :)
Query executed in 183.94 ms.
xquery file:read-text("a1.txt")
°
Query executed in 1.49 ms.
xquery file:write-text("a2.txt", file:read-text("a1.txt"))
Query executed in 3.4 ms.
xquery file:read-text("a2.txt")
[file:io-error] Decoding error: xb0
Checking the files with the Linux command-line tool "file" gives this output:
file a1.txt
a1.txt: Unicode text, UTF-8 text
file a2.txt
a2.txt: ISO-8859 text
"Copying" the file by reading it and writing it back seems to change its encoding: a2.txt ends up as ISO-8859 instead of UTF-8. How is this supposed to be handled?
Regards,
Marco.
Marco - I'm sorry, but I can only corroborate your findings and add that forcing UTF-8 via the encoding parameter of the functions doesn't seem to help, e.g.:
) ./bin/basex
BaseX 9.7.1 [Standalone]
Try 'help' to get more information.
xquery file:current-dir()
/usr/home/bridger/bin/basex/
Query executed in 886.62 ms.
xquery file:write-text("a1.txt", "°" || out:nl(), "UTF-8")
Query executed in 4.32 ms.
xquery file:read-text("a1.txt")
°
Query executed in 1.99 ms.
xquery file:write-text("a2.txt", file:read-text("a1.txt", "UTF-8"), "UTF-8")
Query executed in 1.83 ms.
xquery file:read-text("a2.txt")
[file:io-error] Decoding error: xb0
xquery file:read-text("a2.txt", "UTF-8")
[file:io-error] Decoding error: xb0
xquery file:read-text("a2.txt", "ISO-8859-1")
°
Query executed in 2.01 ms.
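The error message and the successful ISO-8859-1 read above are consistent with a2.txt containing the single byte 0xB0 where a1.txt has the two-byte UTF-8 sequence for "°". A minimal sketch of that byte-level difference follows. It is written in Python rather than XQuery purely for illustration, it does not call BaseX, and the variable names are made up for this example.

```python
# U+00B0 DEGREE SIGN, i.e. codepoints-to-string(176) in the thread.
degree = "\u00b0"

# The two encodings discussed in the thread.
utf8_bytes = degree.encode("utf-8")         # what a UTF-8 file holds for "°"
latin1_bytes = degree.encode("iso-8859-1")  # what a2.txt appears to hold

print(utf8_bytes.hex())    # c2b0
print(latin1_bytes.hex())  # b0

# A lone 0xB0 is a UTF-8 continuation byte, so decoding it as UTF-8 fails.
try:
    latin1_bytes.decode("utf-8")
    decoded_as_utf8 = True
except UnicodeDecodeError:
    decoded_as_utf8 = False

# ISO-8859-1 maps every byte to a code point, so the same byte reads back as "°".
roundtrip = latin1_bytes.decode("iso-8859-1")

print(decoded_as_utf8, roundtrip == degree)  # False True
```

This matches what "file" reports for the two files: a1.txt is valid UTF-8, while a2.txt can only be interpreted as a single-byte encoding.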
Definitely looks like a bug. I’m currently on the road, but I’ll get to the bottom of this once I’m back.
Oh yes, thanks. I forgot to mention this: forcing UTF-8 doesn't help.
Hi Marco,
If the content of a file is written to another file without intermediate steps, it is streamed and consumes constant memory. The implementation for streaming the data was deficient.
The bug has been fixed; a new snapshot is available [1,2].
Thanks and bye,
Christian
[1] https://github.com/BaseXdb/basex/issues/2117
[2] https://files.basex.org/releases/latest/
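To check from outside BaseX whether a given build still shows the problem, one option is to compare the two files from the test case byte by byte. The sketch below is only an illustration: check_copy is a hypothetical helper, not part of BaseX, and the demo simulates both outcomes with temporary files (assuming out:nl() produced "\n") instead of running the queries.

```python
import tempfile
from pathlib import Path


def check_copy(src: str, dst: str) -> tuple[bool, bool]:
    """Return (bytes_identical, dst_is_valid_utf8) for the two files."""
    src_bytes = Path(src).read_bytes()
    dst_bytes = Path(dst).read_bytes()
    try:
        dst_bytes.decode("utf-8")
        valid = True
    except UnicodeDecodeError:
        valid = False
    return src_bytes == dst_bytes, valid


# Self-contained demo: simulate the files instead of running BaseX.
with tempfile.TemporaryDirectory() as tmp:
    a1 = Path(tmp, "a1.txt")
    good = Path(tmp, "a2_good.txt")  # what a correct copy would look like
    bad = Path(tmp, "a2_bad.txt")    # what the thread observed (lone 0xB0)
    a1.write_bytes("\u00b0\n".encode("utf-8"))
    good.write_bytes(a1.read_bytes())
    bad.write_bytes(b"\xb0\n")
    fixed_result = check_copy(str(a1), str(good))
    buggy_result = check_copy(str(a1), str(bad))

print(fixed_result)  # (True, True)
print(buggy_result)  # (False, False)
```

After running the two xquery commands from the original report, calling check_copy("a1.txt", "a2.txt") in the BaseX working directory should return (True, True) if the copy preserved the UTF-8 bytes.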
basex-talk@mailman.uni-konstanz.de