Hi Andy,
just a quick report, as I haven't been able to solve the problem so far.
This works using curl as the client:
curl -X PUT -T aa.pdf http://localhost:9998/tika
If I add '--header "Content-Type: application/pdf"', it works fine for me,
too. If I don't specify the content type, I get a "415: Unsupported Media Type". Just a note for others ...
If I run the following:
let $file := "some.pdf",
    $request :=
      <http:request method='PUT'>
        <http:body media-type="application/octet-stream">{ fetch:binary($file) }</http:body>
      </http:request>
return http:send-request($request, "http://localhost:9998/tika")
I get from BaseX (running in debug mode):
*java.lang.IllegalArgumentException: object is not an instance of declaring class*
and (from Tika):
*INFO: tika (autodetecting type)*
It looks like something is already going wrong at the BaseX level. I still get a response from Tika, but not the one I expected. If I change the media-type to 'application/pdf', I no longer get the BaseX error, but I do get a document processing error (500) from Tika. 'application/pdf' is also the media type that 'fetch:content-type()' returns.
So if the type is not specified further, Tika tries to guess the content type but cannot find one. If it is specified, Tika returns a processing error. As you said, this may be a problem with the content (as the Content-Length headers differ).
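One way to narrow down the Content-Length suspicion might be to compare the file's size on disk with the headers curl actually sends. This is only a rough sketch: the helper names file_size and put_show_request_headers are made up for illustration, and it assumes a POSIX shell with curl, wc, tr and grep available.

```shell
#!/bin/sh
# Hypothetical helpers for illustration; not part of BaseX, Tika, or curl.

# Print the byte count of a local file, for comparison with Content-Length.
file_size() {
  wc -c < "$1" | tr -d '[:space:]'
}

# PUT a file with curl and print only the request headers curl sent
# (in verbose mode these are the lines starting with ">").
# $1 = file, $2 = media type, $3 = URL
put_show_request_headers() {
  curl -sS -v -o /dev/null -X PUT -T "$1" \
       --header "Content-Type: $2" "$3" 2>&1 | grep '^>'
}

# Example calls (assume aa.pdf exists and Tika listens on localhost:9998):
#   file_size aa.pdf
#   put_show_request_headers aa.pdf application/pdf http://localhost:9998/tika
```

If the number printed by file_size matches curl's Content-Length but not the one seen for the BaseX request, that would point at how the body is serialized on the BaseX side rather than at the file itself.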
Sorry for not being of much help, but maybe someone else has an idea?
Cheers, Lukas