Hello everyone,
I am using BaseX 8.44 and the RESTXQ interface (i.e., http://docs.basex.org/wiki/RESTXQ). I have an endpoint that, when invoked with GET, performs a full-text search (using "$db-nodes[text() contains text { $term } all]"), collects the results, constructs a JSON response, and sends it back.
That's all fine and works great. However, I am not sure how to handle the queries described below.
Note: the query is initiated by a JavaScript SPA client, so when I say encode/URI-escape, I mean that I call the encodeURIComponent function (https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Obj...).
Note 2: for the sake of discussion, let's assume the example endpoint is declared as:
%rest:GET %rest:path("/search/{$term}")
1. I want to search for "tea". That is the basic query: a single term, no problem.
   curl -s "https://example.com/search/tea"
2. I want to search for "tea time". This query has a space between the two words. I expect to get back any node that contains both words (hence "contains text" with "all"), even if they are a few words apart.
   - Should I send an encoded/URI-escaped version of the term, i.e., "tea%20time"?
   - Or should I replace the space with "+", i.e., "tea+time"?
   - Or is there some other advice?
   curl -s "https://example.com/search/tea%20time"
   curl -s "https://example.com/search/tea+time"
3. I want to search for "tea/time". This is even trickier. I expect to get back any node that contains "tea/time", i.e., a search result for a single term. How do I do this?
   - If I do nothing, the slash is treated as part of the URL path, so no route matches.
   - If I encode/URI-escape the term, I get "tea%2Ftime". But when I invoke the endpoint, I get the same result as with a literal slash.
   - I am not sure how to deal with the slash. How should I escape/encode it?
   curl -s "https://example.com/search/tea/time"
   curl -s "https://example.com/search/tea%2Ftime"
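To make the three cases concrete, here is a small JavaScript sketch (runnable in Node.js or a browser console) of what encodeURIComponent produces for each term, and of how "+" is treated in a path versus a form-encoded query string. searchPath is a hypothetical helper, and "/search/" is simply the example route from this message:

```javascript
// Hypothetical helper: build the example route path for a search term.
function searchPath(term) {
  // encodeURIComponent escapes " " as %20 and "/" as %2F.
  return "/search/" + encodeURIComponent(term);
}

for (const term of ["tea", "tea time", "tea/time"]) {
  console.log(`${JSON.stringify(term)} -> ${searchPath(term)}`);
}
// "tea" -> /search/tea
// "tea time" -> /search/tea%20time
// "tea/time" -> /search/tea%2Ftime

// Percent-decoding leaves "+" alone: in a path segment it is a literal plus sign.
const plusInPath = decodeURIComponent("tea+time");

// Only form-encoded query strings (application/x-www-form-urlencoded) read "+" as a space.
const plusInQuery = new URLSearchParams("term=tea+time").get("term");

console.log(plusInPath);  // tea+time
console.log(plusInQuery); // tea time
```

So "tea%20time" is the unambiguous spelling of the second term in a path; "tea+time" would only mean "tea time" if the server applied form decoding to the path. Whether "%2F" stays inside a single segment is decided on the server side.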
Thank you,
Hi Ivan,
A more common approach is to supply search terms as query parameters (URL?query=...); in that case, your path won’t have new segments. If you prefer paths, you can use a regular expression in your RESTXQ path pattern [1]:
"search/{$query=.+}"
In both cases, encodeURIComponent should be the appropriate function to encode special characters.
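A minimal JavaScript sketch of both options, assuming the example host from the original message. The query-parameter URL is built either with the standard URL/URLSearchParams APIs (which use form encoding, so a space becomes "+") or by hand with encodeURIComponent (a space becomes "%20"); a server that parses the query string as form data decodes both to the same term. The regex route is only a rough model of "search/{$query=.+}", assuming it is matched against the already-decoded path; it does not claim to reproduce how BaseX implements it. All function names are hypothetical.

```javascript
// Example host from the original message.
const BASE = "https://example.com";

// Option 1a: let URLSearchParams do the escaping (form encoding: " " -> "+").
function searchUrlWithParams(term) {
  const url = new URL("/search", BASE);
  url.searchParams.set("query", term);
  return url.href;
}

// Option 1b: escape by hand with encodeURIComponent (" " -> "%20").
function searchUrlManual(term) {
  return `${BASE}/search?query=${encodeURIComponent(term)}`;
}

// Both forms read back to the original term; the "/" never touches the path.
function readQuery(href) {
  return new URL(href).searchParams.get("query");
}

// Option 2: rough model of the path template "search/{$query=.+}", assuming it
// is matched against the already-decoded path, so ".+" may span several segments.
function matchRegexRoute(decodedPath) {
  const m = /^\/search\/(.+)$/.exec(decodedPath);
  return m ? m[1] : null;
}

console.log(searchUrlWithParams("tea time"));        // https://example.com/search?query=tea+time
console.log(searchUrlManual("tea/time"));            // https://example.com/search?query=tea%2Ftime
console.log(readQuery(searchUrlManual("tea/time"))); // tea/time
console.log(matchRegexRoute("/search/tea/time"));    // tea/time
```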
Hope this helps, Christian
[1] http://docs.basex.org/wiki/RESTXQ#Paths
On Mon, Jan 20, 2020 at 10:54 AM Ivan Kanakarakis ivan.kanak+basex-talk@gmail.com wrote:
Hi Christian,
Thanks for the quick reply. It definitely helps, but it still leaves this behaviour in the "weird" domain. I do not see a reason to decode the URI before matching it against a route. What is the reason for this?
What you propose works, but if I have a route like "/search/{$query=.+}/page/{$page}", then the query will match everything, including "/page/...". If the path were not decoded, I do not think I would need the regex, nor any other special handling in the route. It should work with "/search/{$query}/page/{$page}" and return "tea%2Ftime". Why do I have to work around this by guessing how a part of the URL was encoded, when the URL I hit has that part encoded? I don't think it makes sense, and I don't see a use case for it.
When the framework receives the request, it is responsible for matching a route. By matching the route, it provides me with the bound parts of the route that I requested. Then *I* am responsible for decoding those parts as I see fit and handling the request as I need.
If the framework decodes the URL before matching a route, that is a problem for me: I do not have the control I need. If the framework decodes the URL parts after matching but before binding the route variables, that is fine: it saves me an operation.
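The ordering described here (match on the raw path, then decode only the bound values) can be sketched in a few lines of JavaScript. This is a hypothetical router for illustration, not RESTXQ or BaseX code; the {$name} placeholder syntax mimics %rest:path, and the decode option loosely mirrors the choice JAX-RS offers with @Encoded:

```javascript
// Hypothetical router: split the *raw* path on "/", match it against the
// template, and percent-decode only the values bound to {$name} variables.
function matchRoute(template, rawPath, { decode = true } = {}) {
  const tSegs = template.split("/");
  const pSegs = rawPath.split("/");
  // "%2F" is not a separator here, so "tea%2Ftime" stays one segment.
  if (tSegs.length !== pSegs.length) return null;

  const params = {};
  for (let i = 0; i < tSegs.length; i++) {
    const variable = /^\{\$(\w+)\}$/.exec(tSegs[i]);
    if (variable) {
      // Decode after binding, or hand over the raw value if decode is false.
      params[variable[1]] = decode ? decodeURIComponent(pSegs[i]) : pSegs[i];
    } else if (tSegs[i] !== pSegs[i]) {
      return null; // a literal segment such as "search" or "page" differs
    }
  }
  return params;
}

const route = "/search/{$query}/page/{$page}";
console.log(matchRoute(route, "/search/tea%2Ftime/page/2"));
// { query: 'tea/time', page: '2' }
console.log(matchRoute(route, "/search/tea%2Ftime/page/2", { decode: false }));
// { query: 'tea%2Ftime', page: '2' }
```

With this ordering, no regex is needed in the template, and a literal "/" in the request still separates segments as usual.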
I have now refactored the endpoint handlers to work with query parameters, so this is no longer a problem for me, but it is still a problem in general.
Cheers,
On Mon, 20 Jan 2020 at 19:36, Christian Grün christian.gruen@gmail.com wrote:
While moving the URI parameter to the query string seems like an acceptable workaround, I, too, suggest that if *reserved* URI characters such as '/' appear percent-encoded, they should not be converted to their decoded character prior to analyzing the URI, in line with Sect. 2.2 of RFC 3986 [1].
If I enter an escaped colon (%3A) in a path segment, it will be kept as %3A by BaseX, rather than converted to the reserved character ':'.
The RESTXQ specification [2] doesn’t seem to contain detailed instructions on how to decode the submitted URI before extracting path parameters, therefore I think RFC 3986 should prevail.
So I agree, BaseX should not interpret escaped slashes as if they were regular slashes, thereby disallowing them as part of RESTXQ path parameters.
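A short JavaScript illustration of why the order matters (the function names are hypothetical, and the sketch does not claim to reproduce BaseX internals): if the whole path is percent-decoded before it is split into segments, "%2F" turns into a separator and the single term becomes two segments; splitting first keeps it as data. In this sketch, an escaped colon decodes to ":" in either order without changing the segment structure.

```javascript
const rawPath = "/search/tea%2Ftime";

// Decode first, then split: "%2F" has already become "/", so the term
// spills into an extra segment and "/search/{$term}" can no longer match.
function segmentsDecodeFirst(path) {
  return decodeURIComponent(path).split("/").filter((s) => s !== "");
}

// Split first, then decode each segment: the reserved "/" stays part of the data.
function segmentsSplitFirst(path) {
  return path.split("/").filter((s) => s !== "").map((s) => decodeURIComponent(s));
}

console.log(segmentsDecodeFirst(rawPath)); // [ 'search', 'tea', 'time' ]
console.log(segmentsSplitFirst(rawPath));  // [ 'search', 'tea/time' ]

// A colon is not a path separator, so both orders agree for "%3A".
console.log(segmentsDecodeFirst("/search/tea%3Atime")); // [ 'search', 'tea:time' ]
```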
Gerrit
[1] https://tools.ietf.org/html/rfc3986#section-2.2
[2] http://exquery.github.io/exquery/exquery-restxq-specification/restxq-1.0-spe...
On 24.01.2020 13:54, Ivan Kanakarakis wrote:
Hi Ivan, hi Gerrit,
Thanks for your assessments.
Most design decisions in RESTXQ have been taken from Java’s JAX-RS API [1]. The semantics for accessing paths is a bit more complex there, though: JAX-RS provides two annotations, @Path and @PathParam, to access the full path and the individual path segments, and the segments are automatically decoded. Automatic decoding can be disabled via an optional @Encoded annotation.
In RESTXQ, we only have a single %rest:path annotation, which contains both the full path and the variables for path segments.
Requests with wrongly encoded URLs, such as http://localhost:8984/a%2, are already rejected by Jetty (and, I guess, by any other web server) before any RESTXQ code can intervene. If a URL is correctly encoded, the Java servlet function getPathInfo() is used to obtain the path, which has already been decoded by the servlet container. I noticed there is an alternative function, getRequestURI(), that could be used to access the original, undecoded URL.
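As a rough JavaScript analogy (not servlet code): the WHATWG URL parser keeps percent-escapes in pathname, much like the undecoded request URI, so a framework working from the raw path could defer decoding to the binding step. It would then also take over the job Jetty currently does up front, namely rejecting malformed escapes such as "%2"; in JavaScript, decodeURIComponent signals those with a URIError. tryDecodeSegment is a hypothetical helper:

```javascript
// The URL parser leaves "%2F" escaped in pathname (compare the raw request URI).
const request = new URL("http://localhost:8984/search/tea%2Ftime");
const rawPath = request.pathname; // "/search/tea%2Ftime"

// Hypothetical helper: decode one path segment, or return null when its
// percent-escapes are malformed (e.g. "a%2" or "%ZZ").
function tryDecodeSegment(segment) {
  try {
    return decodeURIComponent(segment);
  } catch (err) {
    if (err instanceof URIError) return null;
    throw err;
  }
}

const segments = rawPath.split("/").filter((s) => s !== "");
console.log(segments.map(tryDecodeSegment)); // [ 'search', 'tea/time' ]
console.log(tryDecodeSegment("a%2"));        // null
```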
Maybe the introduction of a %rest:encoded annotation could be discussed in the EXQuery/RESTXQ repository [2]?
Best, Christian
[1] https://download.oracle.com/otndocs/jcp/jaxrs-2_0-fr-eval-spec/index.html
[2] https://github.com/exquery/exquery/issues
On Fri, Jan 24, 2020 at 2:38 PM Imsieke, Gerrit, le-tex gerrit.imsieke@le-tex.de wrote:
On 24.01.2020 14:36, Imsieke, Gerrit, le-tex wrote:
So I agree, BaseX should not interpret escaped slashes as if they were regular slashes, thereby disallowing them as part of RESTXQ path pa
…rameters.
basex-talk@mailman.uni-konstanz.de