Hi Andy,
I must confess I haven’t followed the full conversation, but based on your hints I have revised the BaseX code revolving around the src attribute [1]; hopefully, the latest snapshot [2] does what it’s supposed to do.
@Florent: I’m not quite sure how to properly handle the input linked by @src, which is why I have sent you another e-mail via the EXPath mailing list. This is what I currently do [3].
Looking forward to your feedback, Christian
[1] https://github.com/BaseXdb/basex/commit/ef4f7c
[2] http://files.basex.org/releases/latest
[3] https://github.com/BaseXdb/basex/blob/next/basex-core/src/main/java/org/base...
On Thu, Jan 9, 2014 at 1:50 PM, Andy Bunce bunce.andy@gmail.com wrote:
I think the @src is not working because no base64 encoding is done. I think some review of the code [1] [2] might help.
Using http:send-request#3, i.e. using an explicit body, works and seems to make sense because the type information is available in this form.
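let $tika := "http://localhost:9998/tika"  (: assumed Tika server URL :)
let $file := "some.pdf"                    (: assumed input file :)
let $request :=
  <http:request method='PUT'>
    <http:body media-type="application/octet-stream" method="raw"/>
  </http:request>
let $r := http:send-request($request, $tika, fetch:binary($file))
return $r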
Regards /Andy
[1] https://github.com/BaseXdb/basex/blob/5b4bcc3272b0611bc99b7d9eebbb432b22a4dc...
[2] https://github.com/BaseXdb/basex/blob/next/basex-core/src/main/java/org/base...
On Mon, Jan 6, 2014 at 12:09 PM, Andy Bunce bunce.andy@gmail.com wrote:
Florent:
Thanks for the tcpdump and @src tips.
Lukas:
That matches my experience. I wonder if this is relevant:
http://stackoverflow.com/questions/18728100/file-upload-via-http-put-request
/Andy
On Mon, Jan 6, 2014 at 7:49 AM, Lukas Kircher lukas.kircher@uni-konstanz.de wrote:
Hi all,
again:
curl -v -X PUT -T some.pdf http://localhost:9998/tika --header "Content-Type: application/pdf"
... and tika returns plain text as it should - so a working MIME type would be 'application/pdf'.
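To see exactly what a working client sends, the curl call above can be mirrored outside BaseX. This is a minimal Python sketch using only the standard library; RecordingHandler, start_stand_in and put_file are hypothetical names, and the local server is a stand-in that merely records the request, not Tika. It shows the PDF bytes arriving unchanged with Content-Type: application/pdf; to try a real tika-server, pass http://localhost:9998/tika to put_file instead.

```python
import http.server
import threading
import urllib.request


class RecordingHandler(http.server.BaseHTTPRequestHandler):
    """Hypothetical stand-in for the Tika endpoint: remembers the last PUT it saw."""

    received = {}

    def do_PUT(self):
        length = int(self.headers.get("Content-Length", 0))
        RecordingHandler.received = {
            "path": self.path,
            "content_type": self.headers.get("Content-Type"),
            "body": self.rfile.read(length),  # raw bytes exactly as sent
        }
        reply = b"ok"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(reply)))
        self.end_headers()
        self.wfile.write(reply)

    def log_message(self, *args):
        pass  # silence the default per-request logging


def start_stand_in():
    """Start the stand-in server on a free local port; return (server, base_url)."""
    server = http.server.HTTPServer(("127.0.0.1", 0), RecordingHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server, "http://127.0.0.1:%d" % server.server_address[1]


def put_file(url, data, media_type="application/pdf"):
    """PUT raw bytes with an explicit Content-Type, like
    curl -X PUT -T some.pdf <url> --header "Content-Type: application/pdf"."""
    request = urllib.request.Request(
        url, data=data, method="PUT", headers={"Content-Type": media_type})
    # An empty ProxyHandler bypasses any proxy configured in the environment.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    with opener.open(request) as response:
        return response.status, response.read()


if __name__ == "__main__":
    server, base_url = start_stand_in()
    status, reply = put_file(base_url + "/tika", b"%PDF-1.4 example bytes")
    print(status, RecordingHandler.received["content_type"])  # 200 application/pdf
    server.shutdown()
    server.server_close()
```

The recorded body can then be compared byte for byte with what BaseX sends to the same stand-in.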
Now off to BaseX:
let $request :=
  <http:request method='PUT'>
    <http:body media-type="application/pdf" src="some.pdf"/>
  </http:request>
return http:send-request($request, "http://localhost:9998/tika")
For this, Tika returns 415 (Unsupported Media Type). Although the MIME type is specified this time, the content that BaseX sends does not look like what Tika expects.
let $file := "some.pdf",
    $request :=
      <http:request method='PUT'>
        <http:body media-type="application/pdf">{ fetch:binary($file) }</http:body>
      </http:request>
return http:send-request($request, "http://localhost:9998/tika")
For this, Tika returns 500 (processing error). The media type is set to 'application/pdf', which works with curl (see above) but not with BaseX. As expected, the tcpdump output for the BaseX requests also differs. So either we're doing something really wrong, or BaseX sends the content in a way it's not supposed to. In the latter case, I'm not the right person to look into this issue, and we have to wait for someone to take a proper look at it.
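If the difference lies in the body itself, the first few bytes can tell whether the PDF arrived raw or as base64 text: a PDF file starts with %PDF-, while its base64 encoding starts with JVBERi0. A minimal Python sketch, assuming the captured body is either a raw or a base64-encoded PDF (base64 encoding is suspected elsewhere in this thread); classify_body is a hypothetical helper, not part of BaseX or Tika.

```python
import base64
import binascii

PDF_MAGIC = b"%PDF-"  # every PDF file starts with these bytes


def classify_body(body: bytes) -> str:
    """Guess whether a captured request body is a raw PDF, a base64-encoded
    PDF, or something else. Hypothetical diagnostic helper."""
    stripped = body.lstrip()
    if stripped.startswith(PDF_MAGIC):
        return "raw-pdf"
    try:
        # 8 base64 characters decode to 6 bytes, enough to cover "%PDF-".
        head = base64.b64decode(stripped[:8], validate=True)
    except binascii.Error:
        return "unknown"  # not valid base64 (wrong alphabet or padding)
    return "base64-pdf" if head.startswith(PDF_MAGIC) else "unknown"


if __name__ == "__main__":
    print(classify_body(b"%PDF-1.4 ..."))                    # raw-pdf
    print(classify_body(base64.b64encode(b"%PDF-1.4 ...")))  # base64-pdf
```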
Regards, Lukas
On Sun, Jan 5, 2014 at 5:06 PM, Andy Bunce bunce.andy@gmail.com wrote:
Hi Dirk,
The Tika documentation is not very clear [1]. tika-app has a simple server mode; tika-server, which I am using, is a different jar [2].
[1] http://stackoverflow.com/questions/12231630/how-to-use-tika-in-server-mode
[2] http://mvnrepository.com/artifact/org.apache.tika/tika-server/1.4
On Sun, Jan 5, 2014 at 3:39 PM, Dirk Kirsten dk@basex.org wrote:
Hello,
You can also simply get all the request headers by running curl with the -v flag. Or you could use Wireshark, which (at least to me) seems easier than tcpdump.
I'd like to reproduce your problem, but I seem to be too stupid to get the Tika server up and running. When running
java -jar tika-app-1.4.jar -s 9999
(even with the verbose flag), I don't get anything but a running process, and the server does not seem to start properly: e.g. if I do
curl -X GET http://localhost:9998/tika
I simply get nothing (the server does not seem to send any response).
However, I would suggest looking at the request sent by curl, as curl sets some headers automatically. I have also experienced similar problems before (i.e. for some servers, not setting some obscure headers seems to be fatal...).
Cheers, Dirk
On 05/01/14 15:00, Florent Georges wrote:
On 5 January 2014 00:57, Andy Bunce wrote:
Hi,
> curl -X PUT -T aa.pdf http://localhost:9998/tika
> [...]
> I have tried:
> let $file:="C:\tmp\aa.pdf"
> let $request :=
>   <http:request method='PUT' >
>     <http:body media-type="application/octet-stream">{
>       fetch:binary($file)
>     }</http:body>
>   </http:request>
I do not know Tika, I do not have BaseX on this machine, and you did not give many details about what is not working, nor any error messages, so it is a bit difficult to help here. All I can say is that I would use the following as the EXPath HTTP Client equivalent of the above CURL command:
<http:request method="put">
  <http:body media-type="application/pdf" src="file:/c:/tmp/aa.pdf"/>
</http:request>
The @media-type is mandatory. You do not set one explicitly with CURL, so you should probably first find which MIME type works with CURL. The @src lets the processor handle the details of accessing the binary file, which makes things easier; you can then be sure the problem lies neither with fetch:binary() nor with the analysis of the binary content of http:body.
If you find a MIME type that works with CURL (you can use the -H option, like this: -H "Content-Type: application/pdf") and BaseX is still failing, tcpdump can help as well. Open a terminal window and execute the following:
sudo tcpdump -s 0 -A -i any tcp and host localhost and port 9998
This will dump all traffic to localhost:9998. Then go to another terminal window (because tcpdump is still running) and execute the CURL command. After it completes, go back to the first window and press Ctrl-C to kill tcpdump. In the meantime, tcpdump will have printed a dump of the request to the console. It will do the same if you keep it running while you test your query in BaseX. So you can compare both requests and see what is different (or post them here so we can see what is happening).
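Comparing two dumps by eye can be tedious. A minimal Python sketch, assuming you have copied the HTTP part of each request (request line, headers, optionally the body, with any tool-specific line prefixes removed) into a string; parse_request_head and diff_requests are hypothetical helpers, not part of tcpdump, curl or BaseX.

```python
def parse_request_head(raw: str):
    """Split raw HTTP request text into (request line, {lowercased header: value}).
    Everything after the first blank line (the body) is ignored."""
    head = raw.replace("\r\n", "\n").lstrip("\n").split("\n\n", 1)[0]
    lines = [line for line in head.split("\n") if line.strip()]
    if not lines:
        return "", {}
    headers = {}
    for line in lines[1:]:
        name, sep, value = line.partition(":")
        if sep:  # skip lines that are not "Name: value"
            headers[name.strip().lower()] = value.strip()
    return lines[0].strip(), headers


def diff_requests(raw_a: str, raw_b: str):
    """Compare two captured requests; return both request lines and a dict
    {header: (value_in_a, value_in_b)} for headers that differ or are missing."""
    line_a, headers_a = parse_request_head(raw_a)
    line_b, headers_b = parse_request_head(raw_b)
    diffs = {
        name: (headers_a.get(name), headers_b.get(name))
        for name in sorted(set(headers_a) | set(headers_b))
        if headers_a.get(name) != headers_b.get(name)
    }
    return (line_a, line_b), diffs
```

Header names are lowercased before comparing because HTTP header names are case-insensitive; bodies are ignored, so compare those separately.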
Regards,
--
Dirk Kirsten, BaseX GmbH, http://basex.org
|-- Registered office: Blarerstrasse 56, 78462 Konstanz
|-- Register court: Freiburg, HRB: 708285, Managing directors:
|   Dr. Christian Grün, Dr. Alexander Holupirek, Michael Seiferle
`-- Phone: 0049 7531 28 28 676, Fax: 0049 7531 20 05 22
_______________________________________________
BaseX-Talk mailing list
BaseX-Talk@mailman.uni-konstanz.de
https://mailman.uni-konstanz.de/mailman/listinfo/basex-talk