I’ve been using the new function archive:extract-to almost since it became available in BaseX, and it generally works very well. However, I’ve run into an out-of-memory error a few times when trying to extract fairly large .zip files. For example, the .zip file I’m looking at now is 268 MB. The error actually seems to be thrown by file:read-binary, whose result must be passed as input to archive:extract-to. Giving BaseX more memory by raising -Xmx from 512 MB to 768 MB helps, but that doesn’t seem like a good solution.

To reproduce, I used the latest BaseX (BaseX831-20150929.165521.zip), a 268 MB .zip file, a smaller 65 MB .zip file, and the following code:

declare namespace _ = "test";

(: Variant 1: read the whole archive, then extract it in one call. :)
declare function _:unzip1($zip as xs:anyURI) as xs:anyURI {
  let $tempDir := file:create-temp-dir('basex', 'unzip')
  return (
    archive:extract-to($tempDir, file:read-binary($zip)),
    xs:anyURI($tempDir)
  )
};

(: Variant 2: use the ZIP module and write each entry separately. :)
declare function _:unzip2($zip as xs:anyURI) as xs:anyURI {
  let $tempDir := file:create-temp-dir('basex', 'unzip')
  return (
    let $entries := zip:entries($zip)
    for $e in $entries//zip:entry
    (: ancestor:: instead of parent:: so that nested directories give the full path :)
    let $path := string-join(($e/ancestor::zip:dir/@name, $e/@name), '/')
    return (
      file:create-dir(file:parent($tempDir || $path)),
      file:write-binary($tempDir || $path, zip:binary-entry($zip, $path))
    ),
    xs:anyURI($tempDir)
  )
};

(: Variant 3: like variant 1, but the binary is bound in a for clause first. :)
declare function _:unzip3($zip as xs:anyURI) as xs:anyURI {
  let $tempDir := file:create-temp-dir('basex', 'unzip')
  return (
    for $f in file:read-binary($zip)
    return archive:extract-to($tempDir, $f),
    xs:anyURI($tempDir)
  )
};

let $zip := xs:anyURI('C:\temp\bigfile.zip')
let $unzipped := _:unzip2($zip) (: change this to test 1 2 3 :)
return $unzipped

When given a small .zip file, _:unzip1, _:unzip2, and _:unzip3 all extract every file from the archive. _:unzip1 and _:unzip3 seem to complete in less time than _:unzip2.
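
The timing difference could be checked with something like the following. This is only a sketch: it assumes the single-argument form of prof:time from the BaseX Profiling module, reuses the path from the test above, and inlines the body of _:unzip1 so that it runs on its own.

```xquery
(: Sketch only: time a single extraction of the archive used in the test above. :)
let $zip := 'C:\temp\bigfile.zip'
let $tempDir := file:create-temp-dir('basex', 'unzip')
return (
  (: prof:time evaluates its argument and reports the elapsed time :)
  (: (to standard error, or to the Info View in the GUI) :)
  prof:time(archive:extract-to($tempDir, file:read-binary($zip))),
  $tempDir
)
```

Swapping the expression inside prof:time for the body of one of the other variants would give a comparable figure for that variant.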

When given a large .zip file, _:unzip2 and _:unzip3 work, but _:unzip1 produces an out-of-memory error.
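
For comparison, a fourth variant could use archive:entries and archive:extract-binary from the Archive module to write the entries one at a time. This is only a sketch, and it has not been measured against the files above: the variable names are illustrative, the path is the one from the test, and since it still loads the archive with file:read-binary, it is an open question whether it would behave any differently from _:unzip1 on the large file.

```xquery
(: Sketch only: extract an archive entry by entry with the Archive module. :)
let $zip := 'C:\temp\bigfile.zip'
let $tempDir := file:create-temp-dir('basex', 'unzip')
let $archive := file:read-binary($zip)
return (
  for $entry in archive:entries($archive)
  let $path := string($entry)
  (: skip directory entries, in case the archive lists any :)
  where not(ends-with($path, '/'))
  let $target := $tempDir || $path
  return (
    (: make sure the parent directory exists before writing the file :)
    file:create-dir(file:parent($target)),
    file:write-binary($target, archive:extract-binary($archive, $path))
  ),
  $tempDir
)
```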

However, when I use the _:unzip1, _:unzip2, or _:unzip3 code in a larger program that does some processing on the unzipped files, I still get an out-of-memory error and can see that not all files were extracted.

Any help would be appreciated.

Thanks,

Vincent