Hi
I have built a small client in XQuery that fetches zip files from another remote service using the http-module and stores them on disk with the help of the file-module.
Some of the returned zip files are really large, and BaseX seems to need to materialize the files in memory before writing them to disk, resulting in out-of-memory errors for some files.
Is there some way to read the response in a more streamable fashion when using the http-module?
I tried using the fetch:binary function successfully, but I really need the more extended functionality of the http-module.
(: results in out of memory if the zip file is too large :)
file:write-binary("file.zip", http:send-request(...)[2])
(: works all the time :)
file:write-binary("file.zip", fetch:binary(...))
Regards, Johan
Hi Johan,
the HTTP Module is pretty magic, because it automatically tries to convert the input to the expected result format. This makes it difficult to stream.
We have already pondered two options to circumvent this restriction:
* Fetch Module: extend the function signatures with additional options
* HTTP Module (our favorite): add additional functions (e.g. http:get(), http:post(), etc.) with xs:base64Binary as return type.
Which options of the HTTP Module do you currently use?
Christian
On Thu, Mar 27, 2014 at 1:06 PM, Johan Mörén hutchkintoot@gmail.com wrote:
[...]
BaseX-Talk mailing list BaseX-Talk@mailman.uni-konstanz.de https://mailman.uni-konstanz.de/mailman/listinfo/basex-talk
Hi Christian
The options I need from the http-module are at the moment mainly the ability to authenticate. Getting hold of response/request headers and the status code is also very useful if you want to do some more detailed error handling.
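To make this concrete, here is a rough sketch of the kind of request meant here, using the attributes the EXPath HTTP Client defines for authentication. The URL, credentials and the error name are placeholders and not taken from the real service. Note that this still materializes the whole body in memory, so it does not address the streaming problem:

```xquery
declare namespace http = "http://expath.org/ns/http-client";

(: Sketch only: the URL and the credentials are placeholders. :)
let $response := http:send-request(
  <http:request method="get"
                href="http://example.org/archive.zip"
                username="user"
                password="secret"
                auth-method="Basic"
                send-authorization="true"/>
)
(: first item: http:response element with status and headers :)
let $head := $response[1]
(: second item: the body, xs:base64Binary for a binary content type :)
let $body := $response[2]
let $type := $head/http:header[lower-case(@name) = 'content-type']/@value
return
  if ($head/@status = '200') then
    file:write-binary("file.zip", $body)
  else
    (: local:http-error is a made-up error name for illustration :)
    error(xs:QName("local:http-error"),
      "Request failed: " || $head/@status || " " || $head/@message ||
      " (Content-Type: " || $type || ")")
```

The status code, message and headers are all read from the first returned item, which is what makes the detailed error handling possible.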
Extending the HTTP-module sounds like my favourite as well.
/Johan
On Thu, Mar 27, 2014 at 1:57 PM, Christian Grün christian.gruen@gmail.com wrote:
[...]
Hi Johan,
that's something you may know anyway, but... you can as well specify the authentication data in your URL:
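A minimal sketch of the general form, assuming the server accepts Basic authentication and that the user info part of the URL is passed on as credentials. Host, path and credentials below are placeholders:

```xquery
(: Sketch only: host, path and credentials are placeholders. :)
file:write-binary(
  "file.zip",
  fetch:binary("http://user:password@example.org/archive.zip")
)
```

Keep in mind that credentials embedded in a URL can end up in logs, so this is probably best suited for quick tests.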
Hope this helps, Christian
On Thu, Mar 27, 2014 at 3:37 PM, Johan Mörén hutchkintoot@gmail.com wrote:
[...]
Hi Christian
I was aware of that possibility, but it only solves one of the problems. Some way of streaming a large response as text or binary with the http-module would be desirable. Having control of the HTTP headers of the request and response is getting more and more important when interacting with REST services.
/Johan
On Thu, Mar 27, 2014 at 4:39 PM, Christian Grün christian.gruen@gmail.com wrote:
[...]
On 27 March 2014 13:57, Christian Grün wrote:
Hi,
- HTTP Module (our favorite): add additional functions (e.g. http:get(), http:post(), etc.) with xs:base64Binary as return type.
I am wondering... The current API already supports xs:base64Binary for binary types (technically, for all non-text, non-HTML, non-XML types). That will be the case for a ZIP file (as long as the server returns a correct Content-Type, but you can always override it anyway).
Is there any intrinsic difference, from an implementation point of view, in the context of streamability of the returned content, between a function returning always an xs:base64Binary item, and another one returning sometimes an xs:base64Binary item, sometimes a string, and sometimes a document node?
From what I understand of the implementation strategies, a kind of naive implementation would be to create an alternative implementation of xs:base64Binary, which does not contain the actual bits (e.g. as an array in memory), but rather a handle to a binary stream (e.g. in Java it would contain an InputStream, consumed only when needed).
And whether a function always returns such a "binary item delegating to a stream behind the scenes", or instead sometimes returns such an item and sometimes strings or nodes, does not seem relevant in this approach, does it?
Of course, BaseX might want to give the user the ability to enable such streaming or not, as it can restrict how the binary items may be used (you can read them only once, or pass them only to streaming-enabled functions).
I am just trying to see if there is anything that intrinsically prevents streaming in the HTTP Client, or if it is just that some possible implementation strategies have not been investigated.
I am pretty sure Adam implemented streaming for the HTTP Client in eXist, so it might be interesting to have a look there as well. That is indeed very interesting feedback, thank you!
Regards,
Hi Florent,
Is there any intrinsic difference, from an implementation point of view, in the context of streamability of the returned content, between a function returning always an xs:base64Binary item, and another one returning sometimes an xs:base64Binary item, sometimes a string, and sometimes a document node?
Yes there is, at least in our architecture: a specific function will always be streamable or not [1]. This static distinction is necessary because various optimizations rely on it.
I remember you talked about a second version of the HTTP Client Module... Have you thought about removing the "magic parts" of the spec and providing explicit functions for retrieving binary or string data, similar to the EXPath Archive and File Modules?
I am pretty sure Adam implemented streaming for the HTTP Client in eXist, so it might be interesting to have a look there as well.
Sounds interesting, but I couldn't find any hints in their documentation. Do you have any reference that confirms this assumption? Or some general info on how streamable results are handled in eXist? What about the first http:response item, do you know if it will also be streamed?
Best, Christian
Hi Johan, hi Florent,
About 8 months ago, I opened a new issue for extending HTTP client functionality in BaseX [1]. The main requirement was that HTTP responses should be streamable. As HTTP responses are currently returned as elements, it is not possible (at least in BaseX) to include streamable sub elements. We could either…
* provide HTTP client functions that only return the body itself, or
* wrap all results in maps and arrays.
In both cases, we could avoid wrapping binary items as Base64, which would save us both memory and processing time.
My questions to you guys are:
@Florent: Have you thought about using maps and arrays in a new version of the HTTP Client spec?
@Johan (in case you remember): What HTTP client features do you need that prevent you from using the fetch:binary function?
Thanks in advance, Christian
[1] https://github.com/BaseXdb/basex/issues/914
On Fri, Mar 28, 2014 at 5:31 PM, Christian Grün christian.gruen@gmail.com wrote:
[...]