Hello!
We have been using the hashing module to calculate MD5 checksums on binary files successfully for a while. But last week we received our first really large file (4.3 GB) and our script threw a
*java.lang.OutOfMemoryError: Requested array size exceeds VM limit*
We are currently using version 7.8 of BaseX. I suspect that BaseX materializes the stream returned by file:read-binary as a byte array when we call the hash:md5 function.
This is a snippet of our script where the problem arises:
    ...
    let $binary := file:read-binary($filePath)
    let $checksum := lower-case(xs:string(xs:hexBinary(hash:md5($binary))))
    ...
I think a nice feature to add to BaseX would be either a new function in the file module, called file-checksum($algorithm), that calculates checksums on files in a streaming fashion, or an option for the hashing functions that tells them to use streaming.
Regards, Johan Mörén
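A note on the error above: Java arrays are indexed with int, so a single byte[] can hold at most Integer.MAX_VALUE (2,147,483,647) bytes, and most VMs cap it slightly lower. The minimal sketch below shows why a file of this size cannot be materialized as one array, regardless of heap size. The class and method names are made up for illustration, and 4.3 GB is assumed to be roughly 4.3 * 10^9 bytes.

```java
final class ArrayLimitSketch {
  /** True if the given number of bytes could in principle fit into one byte[]. */
  static boolean fitsInByteArray(final long sizeInBytes) {
    return sizeInBytes >= 0 && sizeInBytes <= Integer.MAX_VALUE;
  }

  public static void main(final String[] args) {
    final long large = 4_300_000_000L;     // assumed size of the 4.3 GB file
    final long small = 700L * 1024 * 1024; // a 700 MB file
    System.out.println(fitsInByteArray(large)); // prints "false"
    System.out.println(fitsInByteArray(small)); // prints "true"
  }
}
```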
Hi Johan,
looks like a useful feature! Currently, we use Java's default implementation for computing hashes [1]. If you want to help us, you could look out for an existing Java md5 hashing source code, which we could then adopt in BaseX!
Best, Christian
[1] https://github.com/BaseXdb/basex/blob/master/basex-core/src/main/java/org/basex/query/func/hash/HashFn.java
Hi Christian
I think you can go with Java's implementation all the way, like this:
    MessageDigest md = MessageDigest.getInstance("MD5");
    // backslashes must be escaped in a Java string literal
    InputStream is = new FileInputStream("C:\\Temp\\Small\\Movie.mp4"); // Size 700 MB
    byte[] buffer = new byte[blockSize];
    int numRead;
    do {
      // read() returns -1 once the end of the stream is reached
      numRead = is.read(buffer);
      if (numRead > 0) {
        md.update(buffer, 0, numRead);
      }
    } while (numRead != -1);
    byte[] digest = md.digest();
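For reference, here is the same idea as a self-contained program. It is only a sketch: the class name, the method name md5Hex, the 8 KB buffer size and the temporary test file are assumptions made for illustration, not BaseX code.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

final class BufferedMd5Sketch {
  /** Computes the MD5 checksum of a file in fixed-size chunks, as lower-case hex. */
  static String md5Hex(final Path file) throws IOException, NoSuchAlgorithmException {
    final MessageDigest md = MessageDigest.getInstance("MD5");
    final byte[] buffer = new byte[8192]; // assumed block size
    try(InputStream is = Files.newInputStream(file)) {
      int numRead;
      // read() returns -1 at the end of the stream
      while((numRead = is.read(buffer)) != -1) {
        md.update(buffer, 0, numRead);
      }
    }
    final StringBuilder hex = new StringBuilder();
    for(final byte b : md.digest()) hex.append(String.format("%02x", b));
    return hex.toString();
  }

  public static void main(final String[] args) throws Exception {
    final Path tmp = Files.createTempFile("md5-sketch", ".bin");
    try {
      Files.write(tmp, "abc".getBytes(StandardCharsets.US_ASCII));
      System.out.println(md5Hex(tmp)); // prints "900150983cd24fb0d6963f7d28e17f72"
    } finally {
      Files.delete(tmp);
    }
  }
}
```

Only one buffer's worth of data is held in memory at a time, so the file size no longer matters for memory consumption.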
Thanks, this makes it much easier. I'll probably go for this one:
    MessageDigest md = MessageDigest.getInstance(algo);
    try(InputStream is = ...) {
      try(DigestInputStream dis = new DigestInputStream(is, md)) {
        while(dis.read() != -1);
      }
      return md.digest();
    }
Keeping you updated, Christian
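The DigestInputStream variant can be sketched as a complete method. Reading through a small buffer instead of byte by byte is an assumption added here, and the class and method names are hypothetical as well. The digest is updated as a side effect of reading, which is why the loop body is empty.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.security.DigestInputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

final class DigestStreamSketch {
  /** Consumes the stream and returns the digest for the given algorithm name. */
  static byte[] digest(final InputStream is, final String algo)
      throws IOException, NoSuchAlgorithmException {
    final MessageDigest md = MessageDigest.getInstance(algo);
    final byte[] buffer = new byte[4096]; // assumed buffer size
    try(DigestInputStream dis = new DigestInputStream(is, md)) {
      // every read() feeds the bytes it returns into md
      while(dis.read(buffer) != -1);
    }
    return md.digest();
  }

  public static void main(final String[] args) throws Exception {
    final byte[] data = "abc".getBytes(StandardCharsets.US_ASCII);
    final byte[] hash = digest(new ByteArrayInputStream(data), "MD5");
    System.out.println(hash.length); // prints "16": MD5 digests are 128 bits long
  }
}
```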
Great to hear, Christian! You guys respond really fast :)
/Johan
Hi Johan,
A new snapshot is available [1]. In the course of rewriting the hashing code, I further improved our streaming architecture [2, 3].
Your testing feedback is welcome, Christian
[1] http://files.basex.org/releases/latest/
[2] https://github.com/BaseXdb/basex/commit/b39b7
[3] https://github.com/BaseXdb/basex/commit/28139
Great news, Christian. I'll try it out tomorrow at work!
/Johan
Hello!
Just wanted to report back that it works really well. It is about 50% slower than running the md5 command on the command line of my Mac. A 4.15 GB file takes around 20 seconds in BaseX compared to 10 seconds using the native command.
Not sure if this is a limitation in Java or if performance could be tweaked further. But at the moment it feels unimportant for our case.
Thank you again for your swift reply and delivery!
Regards, Johan Mörén
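To put the two timings side by side, here is a back-of-the-envelope calculation. It assumes the figures above are accurate and that 1 GB is 1024 MB; the class and method names are made up for illustration.

```java
final class ThroughputSketch {
  /** Throughput in MB/s for a file of the given size in GB, with 1 GB taken as 1024 MB. */
  static double mbPerSecond(final double gigabytes, final double seconds) {
    return gigabytes * 1024 / seconds;
  }

  public static void main(final String[] args) {
    final double basex = mbPerSecond(4.15, 20);     // roughly 212 MB/s
    final double nativeMd5 = mbPerSecond(4.15, 10); // roughly 425 MB/s
    System.out.printf("BaseX: %.0f MB/s, md5: %.0f MB/s, ratio: %.1f%n",
        basex, nativeMd5, nativeMd5 / basex);
  }
}
```

In other words, the native command processes about twice as many bytes per second in this measurement.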
Hi Johan,
> Just wanted to report back that it works really well.
Glad to hear it works.
> It is about 50% slower than running the md5 command on the command line of my mac.
My final solution is close to the one you proposed [1]: I decided to use a small buffer as well, because it was faster than calling md.update() for each single byte.
Using NIO channels gives us better performance:
    String path = ...
    // try-with-resources closes the file and its channel
    try(RandomAccessFile raf = new RandomAccessFile(path, "r")) {
      FileChannel ch = raf.getChannel();
      ByteBuffer buf = ByteBuffer.allocate(IO.BLOCKSIZE);
      final MessageDigest md = MessageDigest.getInstance("md5");
      do {
        final int n = ch.read(buf);
        if(n == -1) break;
        md.update(buf.array(), 0, n);
        // clear() resets position and limit, so the whole buffer is reused
        buf.clear();
      } while(true);
      System.out.println(Token.string(Token.hex(md.digest(), true)));
    }
But I am not sure how smoothly this would integrate in our remaining streaming architecture, as we are also streaming main-memory objects. I'll keep it in mind, though.
Cheers, Christian
[1] https://github.com/BaseXdb/basex/blob/master/basex-core/src/main/java/org/basex/query/func/hash/HashFn.java
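Below is a self-contained variant of the channel-based approach, using only JDK classes in place of the BaseX helpers IO.BLOCKSIZE and Token. The class name, the method name, the 4 KB buffer size and the hex conversion are assumptions for illustration; this is not the code that went into BaseX.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

final class ChannelMd5Sketch {
  /** Hashes a file through a FileChannel and returns the MD5 checksum as lower-case hex. */
  static String md5Hex(final Path file) throws IOException, NoSuchAlgorithmException {
    final MessageDigest md = MessageDigest.getInstance("MD5");
    final ByteBuffer buf = ByteBuffer.allocate(4096); // assumed block size
    try(FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
      while(ch.read(buf) != -1) {
        buf.flip();     // switch to reading: limit = bytes read, position = 0
        md.update(buf); // consumes everything between position and limit
        buf.clear();    // make the whole buffer writable again
      }
    }
    final StringBuilder hex = new StringBuilder();
    for(final byte b : md.digest()) hex.append(String.format("%02x", b));
    return hex.toString();
  }

  public static void main(final String[] args) throws Exception {
    final Path tmp = Files.createTempFile("channel-sketch", ".bin");
    try {
      Files.write(tmp, "abc".getBytes(StandardCharsets.US_ASCII));
      System.out.println(md5Hex(tmp)); // prints "900150983cd24fb0d6963f7d28e17f72"
    } finally {
      Files.delete(tmp);
    }
  }
}
```

Passing the ByteBuffer itself to md.update() avoids depending on buf.array(), which is only available for heap buffers.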
We run the script that uses this functionality embedded in a Java application. I have now noticed that the first time the code runs after a cold start, this log message appears:
    java.nio.file.FileSystemNotFoundException: Provider "wrap" not installed
      at java.nio.file.Paths.get(Paths.java:147)
      at org.basex.util.Prop.homePath(Prop.java:142)
      at org.basex.util.Prop.<clinit>(Prop.java:96)
      at org.basex.core.StaticOptions.<clinit>(StaticOptions.java:20)
      at org.basex.core.Context.<init>(Context.java:77)
      at org.basex.core.Context.<init>(Context.java:69)
      at se.kb.mimer.util.xquery.XQueryClient.extractPackageFilesData(XQueryClient.java:16)
      ....
The script still produces the expected output. I guess that this is a handled exception inside BaseX that gets printed to the log at INFO level. Am I right?
Regards, Johan
Hi Johan,
I haven't come across this exception before. Maybe you can find out which value is bound to the LOCATION variable in your environment [1]?
Thanks in advance, Christian
[1] https://github.com/BaseXdb/basex/blob/master/basex-core/src/main/java/org/basex/util/Prop.java#L25-L39
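One way to inspect where a class was loaded from, sketched with plain JDK calls. Whether this matches how LOCATION is computed in Prop.java is an assumption, and the class and method names below are hypothetical.

```java
import java.security.CodeSource;

final class CodeSourceSketch {
  /** Returns the location a class was loaded from, or "unknown" if none is recorded. */
  static String locationOf(final Class<?> clazz) {
    final CodeSource cs = clazz.getProtectionDomain().getCodeSource();
    // classes loaded by the bootstrap loader (e.g. java.lang.String) have no code source
    return cs == null || cs.getLocation() == null ? "unknown" : cs.getLocation().toString();
  }

  public static void main(final String[] args) {
    System.out.println(locationOf(CodeSourceSketch.class)); // depends on how the class was loaded
    System.out.println(locationOf(String.class));           // prints "unknown"
  }
}
```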
Hi, for some reason I could not debug the Prop class. But I see that MainOptions.getClass().getProtectionDomain().getCodeSource() returns the String
wrap:mvn:org.basex/basex/8.0-SNAPSHOT$Export-Package=org.basex*;version=8.0&Bundle-SymbolicName=BaseX-8.0-SNAPSHOT
This is the command we use to install the BaseX library into our application server, the OSGi container Apache Karaf. I guess that this URL is used later on, but the scheme "wrap" is then not recognised.
Regards, Johan Mörén
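The failure can be reproduced outside Karaf with plain JDK calls: Paths.get(URI) only resolves schemes for which a file system provider is installed, and "wrap" is not one of them in a standard JVM. The sketch below uses a shortened version of the URL above. The helper toPathOrNull and its fall-back behaviour are assumptions for illustration, not the fix applied in BaseX.

```java
import java.net.URI;
import java.nio.file.FileSystemNotFoundException;
import java.nio.file.Path;
import java.nio.file.Paths;

final class WrapSchemeSketch {
  /** Converts a URI to a path, or returns null if no provider handles its scheme. */
  static Path toPathOrNull(final URI uri) {
    try {
      return Paths.get(uri);
    } catch(final FileSystemNotFoundException | IllegalArgumentException ex) {
      // e.g. Provider "wrap" not installed
      return null;
    }
  }

  public static void main(final String[] args) {
    final URI wrapped = URI.create("wrap:mvn:org.basex/basex/8.0-SNAPSHOT");
    System.out.println(toPathOrNull(wrapped)); // prints "null"
  }
}
```

A caller that gets null back could fall back to a default directory instead of letting the exception reach the log.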
Thanks. I forgot to mention in my last mail that the output stack trace is indeed triggered by our own code (as you may already have seen in the referenced code anyway).
basex-talk@mailman.uni-konstanz.de