On Sun, Jan 25, 2015 at 1:22 PM, Christian Grün <christian.gruen@gmail.com> wrote:

Hi Johan,

A new snapshot is available [1]. In the course of rewriting the
hashing code, I further improved our streaming architecture [2, 3].

Your testing feedback is welcome,
Christian

[1] http://files.basex.org/releases/latest/
[2] https://github.com/BaseXdb/basex/commit/b39b7
[3] https://github.com/BaseXdb/basex/commit/28139

On Sat, Jan 24, 2015 at 8:39 PM, Christian Grün
<christian.gruen@gmail.com> wrote:
> Thanks, this makes it much easier. I'll probably go for this one:
>
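> MessageDigest md = MessageDigest.getInstance(algo);
> try(InputStream is = ...;
>     DigestInputStream dis = new DigestInputStream(is, md)) {
>   // every byte read through dis is fed into md; reading in blocks
>   // avoids the overhead of one call per byte
>   byte[] buffer = new byte[8192];
>   while(dis.read(buffer) != -1);
> }
> return md.digest();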
>
> Keeping you updated,
> Christian
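A self-contained sketch of the DigestInputStream approach from the mail above, taking any InputStream as input (the concrete stream is elided in the mail). The class and method names (StreamingDigest, digest, hexDigest) and the 8 KB buffer size are illustrative assumptions, not BaseX code. Memory use stays bounded by the buffer, however long the stream is.

```java
import java.io.IOException;
import java.io.InputStream;
import java.security.DigestInputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper, not part of BaseX.
final class StreamingDigest {
  private StreamingDigest() { }

  /** Hashes the stream block by block and closes it afterwards. */
  static byte[] digest(InputStream in, String algo)
      throws IOException, NoSuchAlgorithmException {
    MessageDigest md = MessageDigest.getInstance(algo);
    // DigestInputStream passes every byte it reads on to md.update(...)
    try (DigestInputStream dis = new DigestInputStream(in, md)) {
      byte[] buffer = new byte[8192]; // assumed buffer size
      while (dis.read(buffer) != -1) {
        // nothing else to do: reading alone updates the digest
      }
    }
    return md.digest();
  }

  /** Same digest as lowercase hex, like lower-case(xs:string(xs:hexBinary(...))). */
  static String hexDigest(InputStream in, String algo)
      throws IOException, NoSuchAlgorithmException {
    StringBuilder hex = new StringBuilder();
    for (byte b : digest(in, algo)) {
      hex.append(String.format("%02x", b)); // %x treats a Byte as unsigned
    }
    return hex.toString();
  }
}
```

Note that closing the DigestInputStream also closes the underlying stream, so the caller does not need a separate try block for it.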
>
>
> On Sat, Jan 24, 2015 at 7:39 PM, Johan Mörén <johan.moren@gmail.com> wrote:
>> Hi Christian
>>
>> I think you can use Java's implementation all the way, like this:
>>
>> MessageDigest md = MessageDigest.getInstance("MD5");
>> int blockSize = 8192; // assumed buffer size; any positive value works
>> // example input: a 700 MB video file
>> try (InputStream is = new FileInputStream("C:\\Temp\\Small\\Movie.mp4"))
>> {
>>     byte[] buffer = new byte[blockSize];
>>     int numRead;
>>     // update the digest block by block, so the file is never held
>>     // in memory as a whole
>>     while ((numRead = is.read(buffer)) != -1)
>>     {
>>         md.update(buffer, 0, numRead);
>>     }
>> }
>>
>> byte[] digest = md.digest();
>>
>>
>> On Sat Jan 24 2015 at 6:49:18 PM Christian Grün <christian.gruen@gmail.com>
>> wrote:
>>>
>>> Hi Johan,
>>>
>>> Looks like a useful feature! Currently, we use Java's default
>>> implementation for computing hashes [1]. If you want to help us, you
>>> could look for existing Java source code for MD5 hashing, which we
>>> could then adopt in BaseX!
>>>
>>> Best,
>>> Christian
>>>
>>> [1]
>>> https://github.com/BaseXdb/basex/blob/master/basex-core/src/main/java/org/basex/query/func/hash/HashFn.java
>>>
>>>
>>> On Sat, Jan 24, 2015 at 11:37 AM, Johan Mörén <johan.moren@gmail.com>
>>> wrote:
>>> > Hello!
>>> >
>>> > We have been successfully using the hashing module to calculate MD5
>>> > checksums of binary files for a while. But last week we received our
>>> > first really large file (4.3 GB), and our script threw a
>>> >
>>> > java.lang.OutOfMemoryError: Requested array size exceeds VM limit
>>> >
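>>> > We are currently using BaseX 7.8. I suspect that BaseX materializes
>>> > the stream returned by file:read-binary as a byte array when we call
>>> > the hash:md5 function. Since a Java array can hold at most
>>> > Integer.MAX_VALUE (2^31 - 1) elements, a 4.3 GB file cannot fit into
>>> > a single byte array.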
>>> >
>>> > This is a snippet of our script where the problem arises:
>>> > ...
>>> > let $binary := file:read-binary($filePath)
>>> > let $checksum := lower-case(xs:string(xs:hexBinary(hash:md5($binary))))
>>> > ...
>>> >
>>> > A nice addition to BaseX could be either a new function in the File
>>> > Module, called file-checksum($algorithm), that calculates checksums
>>> > of files in a streaming fashion, or an option for the hashing
>>> > functions that makes them use streaming.
>>> >
>>> > Regards,
>>> > Johan Mörén
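The file-checksum function proposed in the first mail could behave roughly like the following Java sketch, which combines the block-wise update loop from the thread with a file path and an algorithm name. FileChecksum and checksum are hypothetical names, the 8 KB block size is an assumption, and the lowercase hex output is chosen to mirror the lower-case(xs:string(xs:hexBinary(...))) expression in the script snippet.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical Java counterpart of the proposed file-checksum function.
final class FileChecksum {
  private FileChecksum() { }

  /** Streams the file through the digest and returns it as lowercase hex. */
  static String checksum(Path file, String algorithm)
      throws IOException, NoSuchAlgorithmException {
    MessageDigest md = MessageDigest.getInstance(algorithm);
    byte[] buffer = new byte[8192]; // assumed block size
    try (InputStream is = Files.newInputStream(file)) {
      int numRead;
      // update the digest with each block; only one block is in memory at a time
      while ((numRead = is.read(buffer)) != -1) {
        md.update(buffer, 0, numRead);
      }
    }
    // format the digest bytes as lowercase hex
    StringBuilder hex = new StringBuilder();
    for (byte b : md.digest()) {
      hex.append(String.format("%02x", b));
    }
    return hex.toString();
  }
}
```

Because the file is never copied into a single array, its size is not bounded by the maximum Java array length that triggered the OutOfMemoryError.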