Hi Andreas, Christian,
Attached is a module that I wrote a while ago to limit the rate of requests sent to a web server. It has been useful for accessing APIs whose SLA does not allow more than a certain number of requests per minute, and it might be useful for this web crawling scenario, although Christian's crawler module already has a sleep built into it.
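For readers who do not have the attachment, the general idea of limiting requests per minute can be sketched roughly as follows. This is not the attached module: it is a minimal Python illustration of a sliding-window limiter, and the names `RateLimiter`, `max_requests`, `period`, and `wait` are hypothetical, chosen only for this sketch.

```python
import time
from collections import deque


class RateLimiter:
    """Allow at most `max_requests` calls to wait() per `period` seconds.

    Hypothetical sketch; the clock and sleep functions are injectable
    so the behaviour can be checked without really waiting.
    """

    def __init__(self, max_requests, period=60.0,
                 clock=time.monotonic, sleep=time.sleep):
        self.max_requests = max_requests
        self.period = period
        self.clock = clock
        self.sleep = sleep
        self.sent = deque()  # timestamps of recently recorded requests

    def wait(self):
        """Block until another request may be sent, then record it."""
        now = self.clock()
        # Forget requests that have already left the time window.
        while self.sent and now - self.sent[0] >= self.period:
            self.sent.popleft()
        if len(self.sent) >= self.max_requests:
            # Window is full: sleep until the oldest request expires.
            self.sleep(self.period - (now - self.sent[0]))
            now = self.clock()
            self.sent.popleft()
        self.sent.append(now)
```

A crawler would call `wait()` on a shared limiter immediately before each HTTP request, for example with `RateLimiter(max_requests=30, period=60.0)` for an assumed limit of 30 requests per minute.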
Cheers, Vincent
-----Original Message-----
From: BaseX-Talk basex-talk-bounces@mailman.uni-konstanz.de On Behalf Of Christian Grün
Sent: Wednesday, August 01, 2018 3:57 AM
To: Andreas Mixich mixich.andreas@gmail.com
Cc: BaseX basex-talk@mailman.uni-konstanz.de
Subject: Re: [basex-talk] Wasn't there a function, that would walk a website?
Hi Andreas,
Just for fun, I wrote a little crawler in XQuery (see the attached files).
Please note that it's just a stub, and it should be used responsibly; otherwise the remote server might block further access.
Cheers, Christian
On Wed, Aug 1, 2018 at 8:08 AM Andreas Mixich mixich.andreas@gmail.com wrote:
On 31.07.2018 at 08:51, Christian Grün wrote:
I guess you were dreaming ;) But it should definitely be possible to implement this in XQuery without too many lines of code.
Ok, then that's what I am going to do. Thanks for the clarification.
-- Goody Bye, Minden jót, Mit freundlichen Grüßen, Andreas Mixich