The EXPath HTTP Client does seem to provide low-level HTTP access. I am hoping to find an XQuery library that implements common things such as cookies and authentication on top of the HTTP Client, but I haven’t come across such a library yet. There are a few OAuth implementations for authentication, though.
I’ll have a look at XML Calabash’s HTTP cookie handling.
Fortunately, authentication is not needed in my current project. Here is the code that I currently have working. A query can fetch one or more URLs by calling local:httpGet(), which first sends a request to the first URL to obtain the cookies that the web site requires, and then sends one request per URL, with those cookies attached, to return each web page.
declare function local:httpResponseCookies($response as element(http:response)) as element(http:header) {
  (: Keep only the name=value part of each Set-Cookie header, dropping
     attributes such as Path or Expires. Header names are matched
     case-insensitively. :)
  let $setCookies := $response/http:header[lower-case(@name) = 'set-cookie']/@value/data()
  let $cookies := string-join(
    for $cookie in $setCookies
    return normalize-space(tokenize($cookie, ';')[1]),
    '; ')
  return <http:header name="Cookie" value="{$cookies}"/>
};

declare function local:httpGet($urls as xs:string+) as element(page)* {
  (: The first request is made only to collect the cookies the site sets.
     The first item returned by http:send-request is the http:response element. :)
  let $first := http:send-request(<http:request method='get'/>, $urls[1])
  for $url in $urls
  let $response := http:send-request(
    <http:request method='get'>
      {local:httpResponseCookies($first[1])}
    </http:request>, $url)
  return element page { attribute url { $url }, $response[2] }
};
Thanks, Vincent
From: basex-talk-bounces@mailman.uni-konstanz.de On Behalf Of Andy Bunce
Sent: Tuesday, July 14, 2015 12:11 PM
To: Florent Georges
Cc: BaseX
Subject: Re: [basex-talk] HTTP module and cookies
In my experience, the case that causes the most problems is the authentication redirect. I have never tried this with BaseX, but I have been very grateful in the past that XML Calabash implements this:
"The exception arises in the case of redirection. If a redirect response includes cookies, those cookies are forwarded as appropriate to the redirected location when the redirection is followed." [1]
/Andy
[1] http://xprocbook.com/book/refentry-19.html#cookies
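The redirect behaviour quoted above could be approximated by hand with the EXPath HTTP Client. What follows is only a minimal sketch, not a tested implementation. It assumes the processor honours the follow-redirect attribute that the EXPath specification defines on http:request; the function names local:cookie-pairs and local:get-following and the hop limit are made up for illustration. It also ignores cookie domain, path and expiry rules, and does not deduplicate cookies that are set twice.

```xquery
declare namespace http = "http://expath.org/ns/http-client";

(: Hypothetical helper: the name=value part of every Set-Cookie header. :)
declare function local:cookie-pairs($response as element(http:response)) as xs:string* {
  for $set in $response/http:header[lower-case(@name) = 'set-cookie']/@value
  return normalize-space(tokenize($set, ';')[1])
};

(: Hypothetical helper: GET a URL with automatic redirects switched off,
   and follow up to $max redirects manually, carrying cookies along. :)
declare function local:get-following(
  $url as xs:string,
  $cookies as xs:string*,
  $max as xs:integer
) as item()+ {
  let $result := http:send-request(
    <http:request method="get" follow-redirect="false">{
      if (exists($cookies))
      then <http:header name="Cookie" value="{string-join($cookies, '; ')}"/>
      else ()
    }</http:request>, $url)
  let $head := $result[1]
  let $location := $head/http:header[lower-case(@name) = 'location']/@value
  return
    if ($max > 0
        and xs:integer($head/@status) = (301, 302, 303, 307, 308)
        and exists($location))
    then local:get-following(
      (: Location may be relative, so resolve it against the current URL. :)
      string(resolve-uri($location[1], $url)),
      ($cookies, local:cookie-pairs($head)),
      $max - 1)
    else $result
};

(: Offline check of the cookie helper against a hand-written response. :)
local:cookie-pairs(
  <http:response status="302">
    <http:header name="Set-Cookie" value="sid=abc; Path=/; HttpOnly"/>
    <http:header name="Location" value="/home"/>
  </http:response>)
```

A call such as local:get-following('http://example.org/login', (), 5) (the URL is a placeholder) would then return the response of the final location, with any cookies set along the way sent on each hop.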
On 10 July 2015 at 10:36, Florent Georges <fgeorges@fgeorges.org> wrote:
Hi,
Correct me if I am wrong, but I believe the HTTP Client in BaseX is the EXPath HTTP Client? It was indeed designed to provide access to low-level, raw HTTP. It does not contain many higher-level features built on top of HTTP itself. Indeed, you have to handle cookies yourself, for instance.
The difficulty here, if I am right, is the side-effects required to pass information somehow (in a hidden way) between 2 different HTTP requests.
Any suggestion to improve the API is welcome (at least on the EXPath mailing list, I don't want to speak for BaseX developers, but I am pretty sure here as well :-)...)
Regards,
-- Florent Georges http://fgeorges.org/ http://h2oconsulting.be/
On 10 July 2015 at 11:13, Christian Grün wrote:
Hi Vincent,
So far, I'm not aware of a standard solution to handle and cache client-side cookies with BaseX. Could you show us your solution? It might help us to discuss alternative solutions.
Best, Christian
On Thu, Jul 9, 2015 at 8:30 PM, Lizzi, Vincent <Vincent.Lizzi@taylorandfrancis.com> wrote:
I am using BaseX to scrape data from a web site. This web site, probably like many other web sites, relies on cookies, and if it does not receive the expected cookies it delivers a page instructing you to enable cookies in your browser. I was able to get this working by parsing the http:header elements of the response to get the cookies to use in subsequent requests. This is the second time I’ve done this, and even though it works, it seems a bit hacky. Is there a standard way of handling cookies using the HTTP Module or the Fetch Module? Or are there any well-written code examples available?
In other environments typically you define a cookie jar in some way, and the cookie jar is used (and is updated) automatically in all subsequent HTTP requests. I’m hoping to find something similar in BaseX.
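To make the cookie jar idea concrete, here is a rough sketch of what a hand-rolled jar might look like in XQuery 3.1. It is not an existing BaseX or EXPath feature. The jar is a plain map from cookie name to value that is passed explicitly from one request to the next with fold-left, so no hidden state is needed. The names local:jar-update, local:jar-header and local:fetch-all are invented for illustration, and the sketch ignores cookie domain, path and expiry.

```xquery
declare namespace http = "http://expath.org/ns/http-client";
declare namespace map = "http://www.w3.org/2005/xpath-functions/map";

(: Hypothetical helper: merge the Set-Cookie headers of one response into
   the jar, a map from cookie name to value; later values win. :)
declare function local:jar-update(
  $jar as map(*),
  $response as element(http:response)
) as map(*) {
  map:merge((
    $jar,
    for $set in $response/http:header[lower-case(@name) = 'set-cookie']/@value
    let $pair := normalize-space(tokenize($set, ';')[1])
    where contains($pair, '=')
    return map:entry(substring-before($pair, '='), substring-after($pair, '='))
  ), map { 'duplicates': 'use-last' })
};

(: Hypothetical helper: render the jar as a Cookie request header,
   or nothing when the jar is empty. :)
declare function local:jar-header($jar as map(*)) as element(http:header)? {
  let $pairs := map:for-each($jar, function($name, $value) { $name || '=' || $value })
  where exists($pairs)
  return <http:header name="Cookie" value="{string-join($pairs, '; ')}"/>
};

(: Fetch each URL in turn, passing the jar from one request to the next.
   The accumulator holds the jar and the pages collected so far. :)
declare function local:fetch-all($urls as xs:string*) as element(page)* {
  fold-left($urls, map { 'jar': map {}, 'pages': () },
    function($state, $url) {
      let $result := http:send-request(
        <http:request method="get">{local:jar-header($state?jar)}</http:request>, $url)
      return map {
        'jar': local:jar-update($state?jar, $result[1]),
        'pages': ($state?pages, element page { attribute url { $url }, $result[2] })
      }
    })?pages
};

(: Offline check of the jar helpers against a hand-written response. :)
let $sample :=
  <http:response status="200">
    <http:header name="Set-Cookie" value="sid=abc; Path=/"/>
  </http:response>
return local:jar-header(local:jar-update(map {}, $sample))
```

Unlike a jar that is updated behind the scenes, this one only changes where the code says so, which keeps the query free of side effects apart from the HTTP requests themselves.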
Thanks, Vincent