Re: [basex-talk] Feature request

26 Jan 2015


Hello!
Just wanted to report back that it works really well. It takes about twice
as long as running the md5 command on the command line of my Mac: a 4.15
GB file takes around 20 seconds in BaseX compared to 10 seconds using the
native command.
Not sure if this is a limitation in Java or if performance could be tweaked
further, but at the moment it feels unimportant for our case.
Thank you again for your swift reply and delivery!
Regards,
Johan Mörén
On Sun Jan 25 2015 at 1:56:21 PM Johan Mörén <johan.moren@gmail.com> wrote:
Great news Christian. I'll try it out tomorrow at work!
/Johan
On Sun, Jan 25, 2015 at 1:22 PM, Christian Grün <christian.gruen@gmail.com> wrote:
Hi Johan,
A new snapshot is available [1]. In the course of rewriting the
hashing code, I further improved our streaming architecture [2, 3].
Your testing feedback is welcome,
Christian
[1] http://files.basex.org/releases/latest/
[2] https://github.com/BaseXdb/basex/commit/b39b7
[3] https://github.com/BaseXdb/basex/commit/28139
On Sat, Jan 24, 2015 at 8:39 PM, Christian Grün <christian.gruen@gmail.com> wrote:
Thanks, this makes it much easier. I'll probably go for this one:
MessageDigest md = MessageDigest.getInstance(algo);
try(InputStream is = ...) {
  try(DigestInputStream dis = new DigestInputStream(is, md)) {
    while(dis.read() != -1);
  }
  return md.digest();
}
Keeping you updated,
Christian
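For readers who want to try the DigestInputStream idea from the message above, here is a minimal self-contained sketch. It is not the code that went into BaseX: the class name StreamingDigest, the method hash, and the 8 KiB buffer size are assumptions made for illustration. The one deliberate difference from the snippet above is that it reads in blocks instead of one byte per call.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.DigestInputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper class for illustration; not part of BaseX.
class StreamingDigest {
  // Hashes everything readable from 'is' without holding it in memory at once.
  static byte[] hash(String algo, InputStream is)
      throws IOException, NoSuchAlgorithmException {
    MessageDigest md = MessageDigest.getInstance(algo);
    // Assumption: an 8 KiB buffer; block reads avoid one call per byte.
    byte[] buffer = new byte[8192];
    try (DigestInputStream dis = new DigestInputStream(is, md)) {
      // Every read() feeds the bytes it returns into 'md' as a side effect.
      while (dis.read(buffer) != -1) { }
    }
    return md.digest();
  }

  public static void main(String[] args) throws Exception {
    byte[] data = "abc".getBytes(StandardCharsets.UTF_8);
    byte[] digest = hash("MD5", new ByteArrayInputStream(data));
    // An MD5 digest is 16 bytes, so pad the hex form to 32 characters.
    System.out.println(String.format("%032x", new BigInteger(1, digest)));
  }
}
```

Memory use stays at the buffer size regardless of how large the input is, which is the property the original request was after.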
On Sat, Jan 24, 2015 at 7:39 PM, Johan Mörén <johan.moren@gmail.com> wrote:
Hi Christian
I think you can go with Java's implementation all the way, like this:
MessageDigest md = MessageDigest.getInstance("MD5");
// Size: 700 MB
InputStream is = new FileInputStream("C:\\Temp\\Small\\Movie.mp4");
// blockSize (read buffer size in bytes) is assumed to be defined elsewhere
byte[] buffer = new byte[blockSize];
int numRead;
do
{
 numRead = is.read(buffer);
 if (numRead > 0)
 {
  md.update(buffer, 0, numRead);
 }
} while (numRead != -1);
byte[] digest = md.digest();
On Sat Jan 24 2015 at 6:49:18 PM Christian Grün <christian.gruen@gmail.com> wrote:
Hi Johan,
looks like a useful feature! Currently, we use Java's default
implementation for computing hashes [1]. If you want to help us, you
could look out for an existing Java md5 hashing source code, which we
could then adopt in BaseX!
Best,
Christian
[1] https://github.com/BaseXdb/basex/blob/master/basex-core/src/main/java/org/ba...
On Sat, Jan 24, 2015 at 11:37 AM, Johan Mörén <johan.moren@gmail.com> wrote:
Hello!
We have been using the hashing module to calculate MD5 checksums on binary
files successfully for a while. But last week we received our first really
large file (4.3 GB) and our script threw a
java.lang.OutOfMemoryError: Requested array size exceeds VM limit
We are currently using version 7.8 of BaseX. I suspect that BaseX
materializes the stream returned by file:read-binary as a byte array when we
call the hash:md5 function.
This is a snippet of our script where the problem arises:
...
let $binary := file:read-binary($filePath)
let $checksum := lower-case(xs:string(xs:hexBinary(hash:md5($binary))))
...
I think a nice feature to add to BaseX could either be a new function in the
file module called file-checksum($algorithm) that calculates checksums on
files in a streaming fashion, or perhaps an option to the hashing functions
that indicates that you want them to use streaming.
Regards,
Johan Mörén
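
To make the streaming checksum proposal above concrete, here is a rough Java sketch of what such a function could do internally. It is only an illustration under stated assumptions: the class FileChecksum, the method checksum, and the 64 KiB block size are hypothetical and do not correspond to any BaseX function. The lowercase hex result mirrors the lower-case(xs:string(xs:hexBinary(...))) conversion in the script snippet.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical illustration of the proposed feature; not a BaseX API.
class FileChecksum {
  // Returns the lowercase hex checksum of 'file', reading it block by block
  // so that the whole file never has to fit into a single byte array.
  static String checksum(Path file, String algorithm)
      throws IOException, NoSuchAlgorithmException {
    MessageDigest md = MessageDigest.getInstance(algorithm);
    byte[] buffer = new byte[64 * 1024]; // assumption: 64 KiB blocks
    try (InputStream is = Files.newInputStream(file)) {
      int n;
      while ((n = is.read(buffer)) != -1) {
        md.update(buffer, 0, n);
      }
    }
    StringBuilder hex = new StringBuilder();
    for (byte b : md.digest()) hex.append(String.format("%02x", b));
    return hex.toString();
  }

  public static void main(String[] args) throws Exception {
    // Demo on a small temporary file; a multi-gigabyte file takes the same path.
    Path tmp = Files.createTempFile("checksum-demo", ".bin");
    try {
      Files.write(tmp, "abc".getBytes(StandardCharsets.UTF_8));
      System.out.println(checksum(tmp, "MD5"));
    } finally {
      Files.deleteIfExists(tmp);
    }
  }
}
```

Because a Java array cannot hold more than roughly 2^31 elements, a 4.3 GB file cannot be materialized as one byte array at all, whereas the loop above only ever allocates the fixed-size buffer.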
