Great news Christian. I'll try it out tomorrow at work!
/Johan
On Sun, Jan 25, 2015 at 1:22 PM, Christian Grün christian.gruen@gmail.com wrote:
Hi Johan,
A new snapshot is available [1]. In the course of rewriting the hashing code, I further improved our streaming architecture [2, 3].
Your testing feedback is welcome, Christian
[1] http://files.basex.org/releases/latest/
[2] https://github.com/BaseXdb/basex/commit/b39b7
[3] https://github.com/BaseXdb/basex/commit/28139
On Sat, Jan 24, 2015 at 8:39 PM, Christian Grün christian.gruen@gmail.com wrote:
Thanks, this makes it much easier. I'll probably go for this one:
  MessageDigest md = MessageDigest.getInstance(algo);
  try(InputStream is = ...) {
    try(DigestInputStream dis = new DigestInputStream(is, md)) {
      while(dis.read() != -1);
    }
    return md.digest();
  }
Keeping you updated, Christian
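For readers who want to try the DigestInputStream idea on its own, here is a minimal sketch. It is not the code that went into BaseX: the class name DigestStreamSketch, the 8192-byte buffer, and the in-memory test input are assumptions made for illustration. It reads in chunks rather than one byte at a time, which is usually faster when the underlying stream is unbuffered.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.security.DigestInputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical class name; a sketch, not the code committed to BaseX.
class DigestStreamSketch {
  /** Drains the stream through a DigestInputStream and returns the digest. */
  static byte[] digest(InputStream is, String algo)
      throws IOException, NoSuchAlgorithmException {
    MessageDigest md = MessageDigest.getInstance(algo);
    try (DigestInputStream dis = new DigestInputStream(is, md)) {
      byte[] buffer = new byte[8192]; // assumed chunk size
      // Each read updates the digest as a side effect; the data itself is discarded.
      while (dis.read(buffer) != -1) { }
    }
    return md.digest();
  }

  public static void main(String[] args) throws Exception {
    byte[] d = digest(new ByteArrayInputStream(new byte[0]), "MD5");
    System.out.println(d.length); // MD5 digests are 16 bytes long
  }
}
```

Only one buffer's worth of data is held in memory at a time, so the file size no longer matters for heap usage.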
On Sat, Jan 24, 2015 at 7:39 PM, Johan Mörén johan.moren@gmail.com wrote:
Hi Christian
I think you can go with Java's implementation all the way, like this:
  MessageDigest md = MessageDigest.getInstance("MD5");
  // File size: 700 MB
  InputStream is = new FileInputStream("C:\\Temp\\Small\\Movie.mp4");
  // blockSize is the chunk size in bytes (not defined in this snippet)
  byte[] buffer = new byte[blockSize];
  int numRead;
  do {
    numRead = is.read(buffer);
    if (numRead > 0) {
      md.update(buffer, 0, numRead);
    }
  } while (numRead != -1);
  byte[] digest = md.digest();
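The loop above can be wrapped into a self-contained sketch, roughly as follows. The class name StreamingChecksum, the 8192-byte block size, and the toHex helper are assumptions added for illustration. The hex output is meant to mirror what lower-case(xs:string(xs:hexBinary(...))) would produce on the XQuery side.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper class; names are illustrative and not part of BaseX.
class StreamingChecksum {
  /** Feeds the stream to the digest in fixed-size chunks, so memory use stays constant. */
  static byte[] digest(InputStream is, String algorithm)
      throws IOException, NoSuchAlgorithmException {
    MessageDigest md = MessageDigest.getInstance(algorithm);
    byte[] buffer = new byte[8192]; // assumed block size
    int numRead;
    while ((numRead = is.read(buffer)) != -1) {
      md.update(buffer, 0, numRead);
    }
    return md.digest();
  }

  /** Renders the digest as a lower-case hex string. */
  static String toHex(byte[] bytes) {
    StringBuilder sb = new StringBuilder(bytes.length * 2);
    for (byte b : bytes) {
      sb.append(String.format("%02x", b & 0xff));
    }
    return sb.toString();
  }

  public static void main(String[] args) throws Exception {
    byte[] sample = "abc".getBytes(StandardCharsets.UTF_8);
    try (InputStream is = new ByteArrayInputStream(sample)) {
      System.out.println(toHex(digest(is, "MD5"))); // 900150983cd24fb0d6963f7d28e17f72
    }
  }
}
```

For a real file, the ByteArrayInputStream would be replaced by a FileInputStream opened in the same try-with-resources statement.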
On Sat Jan 24 2015 at 6:49:18 PM Christian Grün <christian.gruen@gmail.com> wrote:
Hi Johan,
Looks like a useful feature! Currently, we use Java's default implementation for computing hashes [1]. If you want to help us, you could look for existing Java MD5 hashing source code, which we could then adopt in BaseX!
Best, Christian
[1] https://github.com/BaseXdb/basex/blob/master/basex-core/src/main/java/org/ba...
On Sat, Jan 24, 2015 at 11:37 AM, Johan Mörén johan.moren@gmail.com wrote:
Hello!
We have been using the hashing module to calculate MD5 checksums on binary files successfully for a while. But last week we received our first really large file (4.3 GB), and our script threw a
java.lang.OutOfMemoryError: Requested array size exceeds VM limit
We are currently using version 7.8 of BaseX. I suspect that BaseX materializes the stream returned by file:read-binary as a byte array when we call the hash:md5 function.
This is a snippet of our script where the problem arises:
  ...
  let $binary := file:read-binary($filePath)
  let $checksum := lower-case(xs:string(xs:hexBinary(hash:md5($binary))))
  ...
I think a nice feature to add to BaseX would be either a new function in the file module, called file-checksum($algorithm), that calculates checksums on files in a streaming fashion, or perhaps an option for the hashing functions indicating that they should use streaming.
Regards, Johan Mörén