Hi (cc to the list),
The filenames in your archive may be CP437-encoded, while many newer archives use Unicode. Unfortunately, the standard JDK ZIP library that we use is not smart enough to detect different filename encodings. We could add an $options argument to all Archive Functions that open and create/update archives [1]. For the moment, you will need to…
• create archives with Unicode filenames (e.g., with "zip --unicode"),
• avoid characters in the 80-FF range from CP437 [2], or
• invoke an external unzipper, e.g. via proc:system.
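To illustrate the mismatch outside BaseX, here is a minimal Java sketch that writes a ZIP entry name in CP437 and reads it back with an explicit charset. It is not BaseX code: the class and method names are made up for this example, and it assumes the JDK in use provides the "IBM437" charset. Reading the same bytes as UTF-8 is expected either to fail or to return a garbled name, depending on the JDK version.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical demo class; not part of BaseX.
public class Cp437ZipNames {

    // Builds an in-memory ZIP with one entry whose name is encoded with the given charset.
    static byte[] writeZip(String entryName, Charset charset) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(bytes, charset)) {
            zip.putNextEntry(new ZipEntry(entryName));
            zip.write("demo".getBytes(StandardCharsets.US_ASCII));
            zip.closeEntry();
        }
        return bytes.toByteArray();
    }

    // Returns the name of the first entry, decoding entry names with the given charset.
    static String firstEntryName(byte[] archive, Charset charset) throws IOException {
        try (ZipInputStream zip = new ZipInputStream(new ByteArrayInputStream(archive), charset)) {
            ZipEntry entry = zip.getNextEntry();
            return entry == null ? null : entry.getName();
        }
    }

    public static void main(String[] args) throws IOException {
        Charset cp437 = Charset.forName("IBM437"); // assumption: charset is available in this JDK
        String name = "2 klage p\u00e5 vedtak.txt"; // \u00e5 is the letter from the failing file name
        byte[] archive = writeZip(name, cp437);

        // Decoding with the charset that was used for writing restores the name.
        System.out.println(firstEntryName(archive, cp437));

        // Decoding the same bytes as UTF-8 does not match the stored encoding.
        try {
            System.out.println(firstEntryName(archive, StandardCharsets.UTF_8));
        } catch (IllegalArgumentException | IOException e) {
            System.out.println("UTF-8 decoding failed: " + e.getMessage());
        }
    }
}
```

The point of the sketch is only that the reader has to be told the charset; nothing in such an archive announces that its names are CP437.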
Sorry for that, Christian
[1] https://github.com/BaseXdb/basex/issues/2344
[2] https://en.wikipedia.org/wiki/Code_page_437
On Thu, Nov 14, 2024 at 12:59 PM Grythe, Thomas Berge <thogry@innlandetfylke.no> wrote:
Hi!
I ran the code you sent me and found the cause of the error. The 'malformed input' message appears because one of the file names inside the zip file contains the letter 'å' (see the attached image).
When I changed the title "2 klage på vedtak" to "2 klage paa vedtak", the program worked.
Is there an easy way to fix this so that the code can handle special characters such as "æ", "ø" and "å"?
Best regards, Thomas.
*From:* Christian Grün christian.gruen@gmail.com *Sent:* Wednesday, 13 November 2024 16:11 *To:* Grythe, Thomas Berge thogry@innlandetfylke.no *Subject:* Re: [basex-talk] Potential bug in archive:extract-to
Hi Thomas,
maybe we can first try to simplify the script. Could you check what the following code does?
let $inputpath := "E:\Transfer\vaaler_websak – Kopi"
for $name in //record//entry/text()
where contains($name, '.bin')
let $input := $inputpath || $name
let $target := $inputpath || substring-before($name, '.')
return archive:extract-to($target, $input)
If the error still occurs, could you possibly send the problematic archive file to me (it needn’t be shared over the list)?
Thanks, Christian
On Wed, Nov 13, 2024 at 3:58 PM Grythe, Thomas Berge <thogry@innlandetfylke.no> wrote:
Hi!
Thank you for your reply. Attached are the code of my program and an image of the files extracted from a zip file. If I run the program with this zip file, I get the error message 'malformed input'.
Can the file names cause this problem?
Best regards, Thomas.
*From:* Christian Grün christian.gruen@gmail.com *Sent:* Wednesday, 13 November 2024 14:32 *To:* Grythe, Thomas Berge thogry@innlandetfylke.no *Cc:* basex-talk@mailman.uni-konstanz.de <basex-talk@mailman.uni-konstanz.de> *Subject:* Re: [basex-talk] Potential bug in archive:extract-to
Hi Thomas,
As Martin indicated, it would be interesting to know what the xquery:eval function call does. Could you possibly provide us with a little self-contained example?
With BaseX 11 or later, you can simply do:
archive:extract-to('/path/to/target', '/path/to/archive')
Best, Christian
On Tue, Nov 12, 2024 at 10:28 AM Grythe, Thomas Berge <thogry@innlandetfylke.no> wrote:
Hi!
I am an electronic archivist and I have recently tried to use BaseX to unzip files in an archive. The two main lines I have used are:
let $archive := file:read-binary(xquery:eval($filepath-corrected))
return archive:extract-to(xquery:eval($dir_corrected), $archive)
The variables $dir_corrected and $archive are defined earlier in the code. But I get the error message 'malformed input off : 10, length : 1', which suggests a problem with the input data being processed.
Do you know what can cause this problem? And do you know of a possible workaround?
Kind regards
*Thomas Berge Grythe* Adviser, Innlandet Fylkesarkiv/IKA Opplandene
Phone: 48 99 47 85 Email: thogry@innlandetfylke.no
*Innlandet fylkeskommune* Phone: 62 00 08 80 www.innlandetfylke.no