Hi,
A REST API returns data in JSON, which I requested via the http-client's `http:send-request#1` function with the following code:
1. let $data := local:get-article($token, concat($local:ahost, $link))/json
2. let $response := json:serialize($data, map{'format':'xquery'})
3.   => json:parse(map{'format':'xquery'})
4. let $child := $response(2)?data?children(1)?data
5. return $child?body_html
BaseX returns the JSON XML-encoded (see line 1), so I serialize it once into JSON (line 2), then parse it into an XQuery map (line 3). Lines 4 and 5 select and return data that can look like this:
&lt;div class="md"&gt;&lt;p&gt;Welcome everyone from &lt;a href="/r/all"&gt;r/all&lt;/a&gt;! Please remember:&lt;/p&gt;
In the further process I would like this to become proper XHTML.
I tried all the different `fn:serialize#2` parameters. Either the data stays the same or it gets re-entitized. I also tried `html:parse#1` but get the error:
[html:parse] Line 1: No text allowed before root element.
Before calling `html:parse#1`, I also tried casting to `xs:normalizedString` and using `fn:normalize-space#1`, to no avail.
Finally, I tried
document { $child?body_html }
which did not change anything in the output.
If I do this, however:
let $data := ' &lt;div class="md"&gt;&lt;p&gt;Welcome everyone from &lt;a href="/r/all"&gt;r/all&lt;/a&gt;! Please remember:&lt;/p&gt;'
return $data
I get this:
<div class="md"><p>Welcome everyone from <a href="/r/all">r/all</a>! Please remember:</p>
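The difference between the two cases comes down to where entity references are resolved: XQuery resolves predefined entity references such as `&lt;` inside an ordinary string literal when the query is parsed, whereas a string that arrives from JSON holds those characters literally. A minimal sketch of that distinction, assuming the XQuery 3.1 string constructor as a stand-in for a value decoded from JSON (the sample strings are illustrative, not taken from the Reddit response):

```xquery
(: In a string literal, the parser resolves &lt; and &gt; to single characters. :)
let $literal := '&lt;p&gt;'
(: A string constructor keeps its content verbatim, like text decoded from JSON. :)
let $verbatim := ``[&lt;p&gt;]``
return (
  string-length($literal),    (: 3 :)
  string-length($verbatim),   (: 9 :)
  $literal = '<p>'            (: true :)
)
```

So the literal already contains markup characters before it is output, while the JSON value still contains the escaped text and needs an explicit unescaping step.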
So my question is: How can I serialize this into XHTML (de-entitize it)? Thanks.
Hi Andreas,
Before we fix the outcome, we should have a look at the initial HTTP response. I assume the following function invokes http:send-request?
1. let $data := local:get-article($token, concat($local:ahost, $link))/json
Could you supply us with the full HTTP response (header and body)?
Do you have the REST API under control?
Best, Christian
On Mon, Aug 17, 2020 at 4:43 AM Andreas Mixich mixich.andreas@gmail.com wrote:
-- Goody Bye, Minden jót, Mit freundlichen Grüßen, Andreas Mixich
On 17.08.2020 at 09:25, Christian Grün wrote:
Hi Andreas,
Before we fix the outcome, we should have a look at the initial HTTP response. I assume the following function invokes http:send-request?
1. let $data := local:get-article($token, concat($local:ahost, $link))/json
True. It's just a simple wrapper around `http:send-request#1` that sets up the `http:request/` element and issues it, returning the `http:response/`:
declare function local:get-article(
  $token as element(json),
  $link  as xs:string
) {
  let $request :=
    <http:request href="{$link}" method="get">
      <http:header name="Authorization" value="{$token/token__type/data() || ' ' || $token/access__token/data()}"/>
      <http:header name="User-Agent" value="{$local:user-agent}"/>
    </http:request>
  return http:send-request($request)
};
Could you supply us with the full HTTP response (header and body)?
Sure. I am going to send it to you privately, since the cookie data may contain secrets. Note that the data in question can be found in a JSON key named `body_html`.
Do you have the REST API under control?
No. It is the Reddit API, documented here[1]. I use it according to [2].
[1]: https://www.reddit.com/dev/api/
[2]: https://github.com/reddit-archive/reddit/wiki/OAuth2-Quick-Start-Example
Hi Andreas,
I was indeed surprised to see entities escaped in the original response. In the Reddit documentation, I found this:
response body encoding
For legacy reasons, all JSON response bodies currently have <, >, and & replaced with &lt;, &gt;, and &amp;, respectively. If you wish to opt out of this behaviour, add a raw_json=1 parameter to your request.
You could try two things:
1. specify "...?raw_json=1" in your URL,
2. unescape the three entities in a subsequent step:
let $data := <body__html>&amp;lt;div class="md"&amp;gt;&amp;lt;p&amp;gt;Really cool. Wish I had your talent!&amp;lt;/p&amp;gt; &amp;lt;/div&amp;gt;</body__html>
let $unescaped := $data
  => replace(``[&lt;]``, ``[<]``)
  => replace(``[&gt;]``, ``[>]``)
  => replace(``[&amp;]``, ``[&]``)
let $xml := parse-xml($unescaped)
return $xml
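The first option could be sketched roughly as follows. `local:with-raw-json` is a hypothetical helper, not part of BaseX or the Reddit API, and the example URLs are placeholders; the only detail taken from the Reddit documentation is the `raw_json=1` parameter itself.

```xquery
(: Hypothetical helper: appends raw_json=1 to a request URL,
   using "&" if the URL already has a query string and "?" otherwise. :)
declare function local:with-raw-json($url as xs:string) as xs:string {
  $url || (if (contains($url, '?')) then '&amp;' else '?') || 'raw_json=1'
};

(: Placeholder URLs, for illustration only. :)
local:with-raw-json('https://oauth.reddit.com/r/all/comments/abc'),
local:with-raw-json('https://oauth.reddit.com/search?q=basex')
```

The resulting string would then be passed as the `$link` argument of the request wrapper, so that the response no longer needs a separate unescaping step.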
Hope this helps, Christian
On Mon, Aug 17, 2020 at 9:07 PM Andreas Mixich mixich.andreas@gmail.com wrote:
On 18.08.2020 at 13:29, Christian Grün wrote:
I was indeed surprised to see entities escaped in the original response. In the Reddit documentation, I found this:
response body encoding
For legacy reasons, all JSON response bodies currently have <, >, and & replaced with &lt;, &gt;, and &amp;, respectively. If you wish to opt out of this behaviour, add a raw_json=1 parameter to your request.
Yes, I was aware of that, but I assumed `fn:serialize` would take care of this. It seems I have to revisit https://www.w3.org/TR/xslt-xquery-serialization-31/
You could try two things:
- specify "...?raw_json=1" in your URL,
- unescape the three entities in a subsequent step:
I have chosen the second one, as per your example. Nice use of the string constructor, btw.
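Applied to the map-based pipeline from the original question, the second option might look like the sketch below. `local:unescape-html` is a hypothetical name, the sample map only imitates the shape of `$child`, and the sketch assumes that the unescaped markup is well-formed XML (HTML-only entities or unclosed tags would make the parse step fail).

```xquery
(: Hypothetical helper: reverses Reddit's three replacements and parses the result.
   &amp; is handled last so that already-unescaped text is not unescaped twice. :)
declare function local:unescape-html($escaped as xs:string) as document-node() {
  $escaped
  => replace(``[&lt;]``, ``[<]``)
  => replace(``[&gt;]``, ``[>]``)
  => replace(``[&amp;]``, ``[&]``)
  => parse-xml-fragment()
};

(: Stand-in for the map selected in the question; the string constructor
   keeps the escaped text verbatim, as it would arrive from JSON. :)
let $child := map {
  'body_html': ``[&lt;div class="md"&gt;&lt;p&gt;Tom &amp;amp; Jerry&lt;/p&gt;&lt;/div&gt;]``
}
return local:unescape-html($child?body_html)
```

`parse-xml-fragment` is used here instead of `parse-xml` so that a value with several top-level elements would still parse; for a single `div` either function should work.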
Thanks a lot!
basex-talk@mailman.uni-konstanz.de