Hi Andreas,
What about https://about.validator.nu/htmlparser/ ?
Thanks for the pointer; I will have a look at this parser.
This creates a mess:
let $url :=
"https://bartoszmilewski.com/2014/10/28/category-theory-for-programmers-the-p..." let $response := http:send-request(<http:request method='get'/>, $url) return html:parse($response[2])
The reason is that the HTTP response is of type node(). html:parse takes strings as arguments, and by calling html:parse, your node will be implicitly converted to an atomized string.
The following query should do the job:
let $url := "https://bartoszmilewski.com/2014/10/28/category-theory-for-programmers-the-p..." let $response := http:send-request( <http:request method='get' override-media-type='text/plain' href='{ $url }'/> )[2] return html:parse( $response, map { 'nons': false() } )
By the way, while running your queries, I noticed that html:parse didn’t accept binary input anymore. This has been fixed in the latest snapshot [1].
Apart from that, we currently work on an enhanced version of our HTTP Client Module (see [2]). Maybe we’ll drop the implicit response conversion in the new functions.
Cheers, Christian
[1] http://files.basex.org/releases/latest/ [2] https://github.com/BaseXdb/basex/issues/914