stream binary responses from http:module


Johan Mörén

27 Mar 2014

8:06 a.m.

I have built a small client in XQuery that fetches zip files from another remote service using the http-module and stores them on disk with the help of the file-module.

Some of the returned zip files are really large, and BaseX seems to need to materialize the files in memory before writing them to disk, resulting in out-of-memory errors for some files.

Is there some way to read the response in a more streamable fashion when using the http-module?

I tried using the fetch:binary function successfully, but I really need the more extended functionality of the http-module.

(: results in out of memory if the zip file is too large :)
file:write-binary("file.zip", http:send-request(...)[2])

(: works all the time :)
file:write-binary("file.zip", fetch:binary(...))

Regards, Johan


Christian Grün

27 Mar 2014

8:57 a.m.

Hi Johan,

the HTTP Module is pretty magic, because it automatically tries to convert the input to the expected result format. This makes it difficult to stream.

We have already pondered two options to circumvent this restriction:

* Fetch Module: extend the function signatures with additional options
* HTTP Module (our favorite): add additional functions (e.g. http:get(), http:post(), etc.) with xs:base64 as return type.

Which options of the HTTP Module do you currently use?

Christian

BaseX-Talk mailing list BaseX-Talk@mailman.uni-konstanz.de https://mailman.uni-konstanz.de/mailman/listinfo/basex-talk

Johan Mörén

10:37 a.m.

Hi Christian

The options I need from the http-module are at the moment mainly the ability to authenticate. Getting hold of response/request headers and the status code is also very useful if you want to do some more detailed error handling.

Extending the HTTP-module sounds like my favourite as well.

/Johan
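
For illustration, the features Johan lists (authentication, headers, status code) map onto http:send-request roughly as sketched below. This is only a sketch under assumptions: the URL and the credentials are hypothetical placeholders, and it relies on BaseX predeclaring the http prefix. It uses the status-only attribute of the EXPath HTTP Client, so only the http:response element is returned and the body is not converted; it does not solve the streaming problem discussed in this thread.

```xquery
(: Hypothetical URL and credentials; replace them with real values. :)
let $url := 'http://example.org/archive.zip'
let $request :=
  <http:request method='get' status-only='true'
                username='name' password='password'
                auth-method='Basic' send-authorization='true'>
    <http:header name='Accept' value='application/zip'/>
  </http:request>
(: With status-only='true', only the http:response element is returned. :)
let $head := http:send-request($request, $url)[1]
return (
  (: status code and reason phrase of the response :)
  'status: ' || $head/@status || ' ' || $head/@message,
  (: one line per response header :)
  for $header in $head/http:header
  return $header/@name || ': ' || $header/@value
)
```

A caller could compare the status value against '200' before deciding how to fetch the body itself.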


Christian Grün

11:39 a.m.

Hi Johan,

you may already know this, but you can also specify the authentication data in your URL:

http://name:password@...

Hope this helps, Christian
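
Combined with the fetch:binary call from the original post, Christian's suggestion might look like the following sketch. The host name and the credentials are hypothetical placeholders, and the fetch and file prefixes are assumed to be predeclared, as they are in BaseX.

```xquery
(: Hypothetical host and credentials, embedded in the URL. :)
let $url := 'http://name:password@example.org/archive.zip'
(: fetch:binary is the variant reported to work for large files in this thread. :)
return file:write-binary('file.zip', fetch:binary($url))
```

This gives no access to the status code or the response headers, and credentials embedded in a URL can end up in logs, so it covers only the authentication part of the requirement.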


Johan Mörén

6:01 p.m.

Hi Christian

I was aware of that possibility, but it only solves one of the problems. Some way of streaming a large response as text or binary with the http-module would be desirable. Having control of the HTTP headers of the request and response is getting more and more important when interacting with REST services.

/Johan


Florent Georges

28 Mar 2014

10:54 a.m.

On 27 March 2014 13:57, Christian Grün wrote:

Hi,

> HTTP Module (our favorite): add additional functions (e.g. http:get(), http:post(), etc.) with xs:base64 as return type.

I am wondering... The current API already supports xs:base64Binary for binary types (technically, for all non-text, non-HTML, non-XML types). That will be the case for a ZIP file, as long as the server returns a correct Content-Type, and you can always override it anyway.

Is there any intrinsic difference, from an implementation point of view, in the context of streamability of the returned content, between a function returning always an xs:base64Binary item, and another one returning sometimes an xs:base64Binary item, sometimes a string, and sometimes a document node?

From what I understand of the implementation strategies, a kind of naive implementation would be to create an alternative implementation of xs:base64Binary, which does not contain the actual bits (e.g. as an array in memory), but rather a handle to a binary stream (e.g. in Java it would contain an InputStream, consumed only when needed).

And whether a function always returns such a "binary item delegating to a stream behind the scenes", or instead sometimes returns such an item and sometimes strings or nodes, does not seem relevant in this approach, does it?

Of course, BaseX might want to provide the user the ability to enable such streaming or not, as it can be a restriction on how to use the binary items (you can read them only once, or pass them only to streaming-enabled functions).

I am just trying to see if there is anything that intrinsically prevents streaming in the HTTP Client, or if it is just that some possible implementation strategies have not been investigated.

I am pretty sure Adam implemented streaming for the HTTP Client in eXist, so it might be interesting to have a look there as well. That is indeed very interesting feedback, thank you!

Regards,

-- Florent Georges http://fgeorges.org/ http://h2oconsulting.be/
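
Florent's remark about overriding the Content-Type refers to the override-media-type attribute of the EXPath HTTP Client request element. A minimal sketch, assuming a hypothetical URL and the prefixes predeclared by BaseX:

```xquery
(: Hypothetical URL; replace it with the real service address. :)
let $url := 'http://example.org/archive.zip'
(: Treat the body as binary, whatever Content-Type the server sends. :)
let $request :=
  <http:request method='get' override-media-type='application/octet-stream'/>
(: Item 1 is the http:response element, item 2 is the converted body. :)
let $body := http:send-request($request, $url)[2]
return file:write-binary('file.zip', $body)
```

This only controls how the body is converted. As described earlier in the thread, the body is still materialized in memory before it is written, so very large files can still run out of memory.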

Christian Grün

12:31 p.m.

Hi Florent,

> Is there any intrinsic difference, from an implementation point of view, in the context of streamability of the returned content, between a function returning always an xs:base64Binary item, and another one returning sometimes an xs:base64Binary item, sometimes a string, and sometimes a document node?

Yes, there is, at least in our architecture: a given function is either always streamable or never streamable [1]. This static distinction is necessary because various optimizations rely on it.

I remember you talked about a second version of the HTTP Client Module... Have you thought about removing the "magic parts" of the spec and providing explicit functions for retrieving binary or string data, similar to the EXPath Archive and File Modules?

> I am pretty sure Adam implemented streaming for the HTTP Client in eXist, so it might be interesting to have a look there as well.

Sounds interesting, but I couldn't find any hints in their documentation. Do you have any reference that confirms this assumption? Or some general info on how streamable results are handled in eXist? What about the first http:response item, do you know if it will also be streamed?

Best, Christian

[1] http://docs.basex.org/wiki/Streaming_Module

Christian Grün

10 Nov 2014

9 a.m.

Hi Johan, hi Florent,

About 8 months ago, I opened a new issue [1] for extending the HTTP client functionality in BaseX. The main requirement was that HTTP responses should be streamable. As HTTP responses are currently returned as elements, it is not possible (at least in BaseX) to include streamable sub-elements. We could either…

* provide HTTP client functions that only return the body itself, or
* wrap all results in maps and arrays.

In both cases, we could avoid wrapping binary items as Base64, which would save us both memory and processing time.

My questions to you guys are:

@Florent: Have you thought about using maps and arrays in a new version of the HTTP Client spec?

@Johan (in case you remember): What HTTP client features do you need that prevent you from using the fetch:binary function?

Thanks in advance, Christian

[1] https://github.com/BaseXdb/basex/issues/914

