Hi Dirk,
The Tika documentation is not very clear [1]. tika-app has a simple server mode. tika-server, which I am using, is a different jar [2].
[1] http://stackoverflow.com/questions/12231630/how-to-use-tika-in-server-mode
[2] http://mvnrepository.com/artifact/org.apache.tika/tika-server/1.4
On Sun, Jan 5, 2014 at 3:39 PM, Dirk Kirsten dk@basex.org wrote:
Hello,
You can also simply get all the request headers by using the -v flag when running curl. Or you could use Wireshark, which (at least to me) seems easier than using tcpdump.
I'd like to reproduce your problem, but I seem to be too stupid to get the Tika server up and running. When running
java -jar tika-app-1.4.jar -s 9999
(or even with the verbose flag) I simply don't get anything (but a running process), and the server does not seem to have started properly. For example, if I do
curl -X GET http://localhost:9998/tika
I simply get nothing (the server does not seem to send any response).
However, I would suggest looking at the request sent by curl, as curl sets some headers automatically and I have experienced similar problems before (i.e. for some servers, not setting some obscure header seems to be fatal...)
Cheers, Dirk
On 05/01/14 15:00, Florent Georges wrote:
On 5 January 2014 00:57, Andy Bunce wrote:
Hi,
curl -X PUT -T aa.pdf http://localhost:9998/tika
[...]
I have tried:
let $file:="C:\tmp\aa.pdf"
let $request :=
  <http:request method='PUT' >
    <http:body media-type="application/octet-stream">{ fetch:binary($file) }</http:body>
  </http:request>
I do not know Tika, I do not have BaseX on this machine, and you did not give many details about what is not working, nor any error messages, so it is a bit difficult to help here. All I can say is that I would use the following as the EXPath HTTP Client equivalent of the above curl command:
<http:request method="put">
   <http:body media-type="application/pdf"
              src="file:/c:/tmp/aa.pdf"/>
</http:request>
The @media-type is mandatory. You do not set one explicitly with curl, so you should probably first find out which MIME type works with curl. The @src attribute lets the processor handle the details of accessing the binary file. That makes things easier, and it rules out fetch:binary() and the handling of the binary content of http:body as the source of the problem.
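One way to find out which media type the server accepts is to repeat the same PUT with different Content-Type values and look at the status codes. The following is only a minimal Python sketch under stated assumptions: `try_media_types` is a hypothetical helper (not part of Tika, curl, or BaseX), and it says nothing about how tika-server actually answers; it simply reports whatever status each request gets.

```python
import http.client


def try_media_types(host, port, path, data, media_types, timeout=10):
    """PUT the same bytes once per media type; return {media_type: HTTP status}.

    Hypothetical helper for illustration only.
    """
    results = {}
    for media_type in media_types:
        conn = http.client.HTTPConnection(host, port, timeout=timeout)
        try:
            # The only thing that changes between requests is Content-Type.
            conn.request("PUT", path, body=data,
                         headers={"Content-Type": media_type})
            response = conn.getresponse()
            response.read()  # drain the body before closing the connection
            results[media_type] = response.status
        finally:
            conn.close()
    return results
```

With the bytes of aa.pdf read into `pdf_bytes`, a call such as `try_media_types("localhost", 9998, "/tika", pdf_bytes, ["application/pdf", "application/octet-stream"])` would return one status code per candidate type. Host, port, and path are the ones from the curl command above; the list of candidates is an assumption.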
If you find a MIME type that works with CURL (you can use the -H option like the following: -H "Content-Type: application/pdf"), and it is still failing, tcpdump can help as well. Open a terminal window, and execute the following:
sudo tcpdump -s 0 -A -i any tcp and host localhost and port 9998
This will dump all traffic to localhost:9998. Then go to another terminal window (because tcpdump is still running) and execute the curl command. Once it completes, go back to the first window and press Ctrl-C to stop tcpdump. In the meantime, tcpdump will have printed a dump of the request to the console. It will do the same if you keep it running while you test your query in BaseX, so you can compare both requests and see what is different (or post them here so we can see what is happening).
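If tcpdump is not available (on Windows, for instance), a similar comparison can be made by pointing both clients at a throwaway server that just reports what it received. The following is a rough Python sketch, not a replacement for Tika: `EchoHandler` and `start_echo_server` are hypothetical names, only PUT is handled, and it assumes the client sends a Content-Length header.

```python
import http.client
import http.server
import threading


class EchoHandler(http.server.BaseHTTPRequestHandler):
    """Record every PUT request and answer with a plain-text summary of it."""

    # Shared across requests: (method, path, lower-cased headers, body size).
    captured = []

    def do_PUT(self):
        # Assumes the client sends Content-Length (no chunked uploads).
        length = int(self.headers.get("Content-Length", 0))
        body = self.rfile.read(length)
        headers = {name.lower(): value for name, value in self.headers.items()}
        EchoHandler.captured.append((self.command, self.path, headers, len(body)))

        lines = [f"{self.command} {self.path}"]
        lines += [f"{name}: {value}" for name, value in sorted(headers.items())]
        lines.append(f"(body: {len(body)} bytes)")
        payload = ("\n".join(lines) + "\n").encode("utf-8")

        self.send_response(200)
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)

    def log_message(self, format, *args):
        pass  # keep the console free for the summaries


def start_echo_server(port=0):
    """Serve on 127.0.0.1 in a background thread; port 0 picks a free port."""
    server = http.server.HTTPServer(("127.0.0.1", port), EchoHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


if __name__ == "__main__":
    # Self-contained demo: send one PUT to the echo server and print its reply.
    server = start_echo_server()
    conn = http.client.HTTPConnection("127.0.0.1", server.server_address[1], timeout=10)
    conn.request("PUT", "/tika", body=b"%PDF-1.4",
                 headers={"Content-Type": "application/pdf"})
    print(conn.getresponse().read().decode("utf-8"))
    conn.close()
    server.shutdown()
    server.server_close()
```

To compare real clients, one could call `start_echo_server(9997)` from an interactive session (9997 is an arbitrary port chosen for illustration) and send both the curl command and the BaseX query to http://127.0.0.1:9997/tika. Each response then lists the headers that client actually sent, and the two lists can be compared line by line.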
Regards,
--
Dirk Kirsten, BaseX GmbH, http://basex.org
|-- Registered office: Blarerstrasse 56, 78462 Konstanz
|-- Register court Freiburg, HRB: 708285, Managing directors:
|   Dr. Christian Grün, Dr. Alexander Holupirek, Michael Seiferle
`-- Phone: 0049 7531 28 28 676, Fax: 0049 7531 20 05 22
_______________________________________________
BaseX-Talk mailing list
BaseX-Talk@mailman.uni-konstanz.de
https://mailman.uni-konstanz.de/mailman/listinfo/basex-talk