Hello Klāvs,
just to clarify: I wasn't offended by your use of the word "broken"; I just wanted to point out that it is an incorrect characterization of the problem. We welcome criticism and ideas on how to improve XQuery/BaseX. After all, the whole async feature only came about because of user contributions.
I also see your problem with using async:fork-join() in this example; the async keyword would indeed be a nice solution. I don't think introducing a new non-standard keyword would be a big issue, since we have done so in the past (especially the "updated" keyword, which I personally wouldn't want to miss).
Given that fetch is a BaseX module, we could certainly introduce a new function in that module. The problem with your proposal is that it wouldn't do what I guess you want it to do. The FLWOR expression would iterate over $i sequentially and call the function ten times, one call after another. Each single call could only be parallelized internally, so you would basically get a parallelization of one. Parallelization has to be defined where the work is split into parts, which is exactly what happens with the async keyword. If your function were implemented, it would have to look something like this:
fetch:text-with-implicit-parallel-execution-characteristics(for $i in 1 to 10 return "http://example.com/magic-story/for-number/" || $i)
That is, the function would take a sequence as its parameter; when executed, it would receive all ten strings at once and could then parallelize the work intelligently. If you pass in just a single value, however, there is not much to parallelize.
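To make the splitting argument concrete, here is a minimal Java sketch (Java only because it is already used for comparison in this thread). `SplitFetch`, `fetchAll` and the `splits` parameter are hypothetical names, not part of BaseX or its fetch module, and the `fetch` argument is a stand-in for fetch:text. Because the function receives the whole sequence, it can decide how many chunks to create; a function called once per item would only ever see one URL.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

class SplitFetch {
    // Hypothetical sequence-taking fetch: it sees all URLs at once, so it can
    // cut them into `splits` chunks (splits must be positive) and give each
    // chunk its own thread.
    static List<String> fetchAll(List<String> urls, int splits, Function<String, String> fetch) {
        ExecutorService pool = Executors.newFixedThreadPool(splits);
        try {
            int chunk = (urls.size() + splits - 1) / splits; // ceiling division
            List<Future<List<String>>> parts = new ArrayList<>();
            for (int start = 0; start < urls.size(); start += chunk) {
                List<String> slice = urls.subList(start, Math.min(start + chunk, urls.size()));
                parts.add(pool.submit(() -> {
                    List<String> out = new ArrayList<>();
                    for (String url : slice) {
                        out.add(fetch.apply(url)); // sequential inside one chunk
                    }
                    return out;
                }));
            }
            List<String> results = new ArrayList<>();
            for (Future<List<String>> part : parts) {
                results.addAll(part.get()); // chunks are read in submission order = input order
            }
            return results;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while fetching", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("a fetch failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        List<String> urls = IntStream.rangeClosed(1, 10)
            .mapToObj(i -> "http://example.com/magic-story/for-number/" + i)
            .collect(Collectors.toList());
        // Stub instead of a real HTTP request, so the sketch runs offline.
        System.out.println(fetchAll(urls, 4, url -> "text of " + url));
    }
}
```

With ten URLs and `splits = 4`, the sketch creates four chunks of at most three URLs each, and the results still come back in input order.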
One thought regarding the async keyword (especially for @Christian): instead of introducing a keyword, maybe we could introduce an annotation %async, which could be attached to a variable of a FLWOR expression, e.g. simply:
for %async $i in 1 to 10
This should be valid XQuery as far as I can see, so we would still conform to the XQuery standard while providing the same feature. It could even be used for other use cases outside of FLWOR expressions, e.g.
On 03/02/2016 05:21 PM, Klāvs Priedītis wrote:
Hi,
Thanks Dirk and Christian! My apologies for using such a strong word to describe such a minor issue. It was not my intention to belittle your work.
My main concern is that `async:fork-join()` is too heavyweight. In essence, I am solving the same problem in these two examples:
for $i in 1 to 10 return fetch:text("http://example.com/magic-story/for-number/" || $i)
async:fork-join( for $i in 1 to 10 return function() {fetch:text("http://example.com/magic-story/for-number/" || $i)} )
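For readers more familiar with Java, the fork-join pattern in the second example corresponds roughly to submitting a list of zero-argument tasks to a thread pool and waiting for all of them. This is a sketch under that assumption, not how BaseX implements async:fork-join; `ForkJoinSketch` and `forkJoin` are hypothetical names, and each task is a stub in place of fetch:text. `ExecutorService.invokeAll` returns its futures in the same order as the task list, so the results line up with the inputs.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class ForkJoinSketch {
    // Runs all zero-argument tasks on a pool and waits for every one of them.
    // invokeAll returns the futures in task-list order, so results match inputs.
    static <T> List<T> forkJoin(List<Callable<T>> tasks, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<T> results = new ArrayList<>();
            for (Future<T> future : pool.invokeAll(tasks)) { // blocks until all tasks are done
                results.add(future.get());
            }
            return results;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for tasks", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("a task failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        List<Callable<String>> tasks = new ArrayList<>();
        for (int i = 1; i <= 10; i++) {
            String url = "http://example.com/magic-story/for-number/" + i;
            // Each task plays the role of function() { fetch:text(...) }; stubbed here.
            tasks.add(() -> "text of " + url);
        }
        System.out.println(forkJoin(tasks, 4));
    }
}
```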
If you solve the same problem in Java 8 using the Streams API, you get the following two solutions:
rangeFrom1To10AsList.stream().map(fetchText).collect(...);
rangeFrom1To10AsList.parallelStream().map(fetchText).collect(...);
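A runnable version of the two pipelines above, assuming `Collectors.toList()` as the collector and a stub in place of `fetchText` (both are assumptions; the snippets above leave them open). Switching `stream()` to `parallelStream()` changes only how the work is scheduled, not the resulting list, because collecting an ordered source into a list keeps encounter order.

```java
import java.util.List;
import java.util.function.Function;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

class StreamComparison {
    // Stand-in for fetchText: a real version would perform an HTTP request.
    static final Function<Integer, String> FETCH_TEXT = i -> "story " + i;

    static List<String> sequential(List<Integer> range) {
        return range.stream().map(FETCH_TEXT).collect(Collectors.toList());
    }

    static List<String> parallel(List<Integer> range) {
        // Same pipeline, parallel source; toList() on an ordered source keeps
        // encounter order even when elements are mapped concurrently.
        return range.parallelStream().map(FETCH_TEXT).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Integer> rangeFrom1To10AsList =
            IntStream.rangeClosed(1, 10).boxed().collect(Collectors.toList());
        // prints true
        System.out.println(sequential(rangeFrom1To10AsList).equals(parallel(rangeFrom1To10AsList)));
    }
}
```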
Considering these two Java 8 solutions, I guess what I actually want is the `async` keyword, so that the second Java example could be written in XQuery as follows:
for async $i in 1 to 10 return fetch:text("http://example.com/magic-story/for-number/" || $i)
The problem here is that a new keyword would have to be added to the language. I am just trying to figure out whether there could be a solution like this:
for $i in 1 to 10 return fetch:text-with-implicit-parallel-execution-characteristics("http://example.com/magic-story/for-number/" || $i)
Thanks for your input! I currently lack in-depth knowledge of XQuery; I'll try to find some spare time to study the XQuery specification so that I am able to reason about the various consequences.
2016-03-02 11:36 GMT+02:00 Christian Grün <christian.gruen@gmail.com>:
Hi Klāvs,

Dirk already gave some good reasons why implicit parallelization is not the default in BaseX, although it may seem to be the most natural and desirable choice. Indeed, it's fairly easy to create FLWOR expressions that are slower when executed in parallel. The XQuery specification, by the way, does not say anything about parallelization, so it's completely up to the implementation to choose the most promising evaluation strategy.

You asked whether it would be possible to introduce something like a `can-be-optimized-for-parallel-execution` flag, and you may be interested to hear that we had some first thoughts on an 'async' keyword, which could be used in FLWOR expressions [1]. In the GitHub issue, only the most basic cases have been considered so far, but if we manage to push this further, we would have a syntax that is more easily accessible to most users. Ideally, at some point in the future, we could then apply this pattern automatically whenever it is absolutely clear that parallelization leads to faster results.

Everyone: If you notice that the async:fork-join function makes your code faster, don't hesitate to send us your queries.

Hope this helps,
Christian

[1] https://github.com/james-jw/xq-promise/issues/15

On Wed, Mar 2, 2016 at 10:05 AM, Dirk Kirsten <dk@basex.org> wrote:
> Good morning, Klāvs,
>
> well, I would say "broken" is a strong word for something that still works correctly and returns the correct result. It might not perform optimally, but broken is something different. Also, given that the feature is not even one day old yet, you might have to be a bit more patient with us in applying optimization techniques.
>
> The problem here has several dimensions:
> - XQuery returns results in order. So in order not to break BaseX, we would have to cache the results of the fetch and return them in the correct order.
> - This caching might be much more complex for a more complicated expression.
> - It might not always be desirable to turn on parallelization. In this case you call an external HTTP server 10 times, so there might be a speedup. But fetch:text() could also point to a local URI, and parallelizing I/O is not always beneficial.
>
> So given the many different use cases, I don't think implicit parallelization will work here to everyone's satisfaction.
> Also, I invite you to read this closely related question about Haskell on SO and the linked research:
> https://stackoverflow.com/questions/15005670/why-is-there-no-implicit-parallelism-in-haskell
>
> Cheers
> Dirk
>
> On 03/02/2016 01:41 AM, Klāvs Priedītis wrote:
>> Hello, BaseX team and community!
>>
>> Disclosure: I am not yet an expert on these issues and I have not fully studied the XQuery specs.
>>
>> Now that the Async Module is incorporated, I am wondering about some basic properties of XQuery.
>>
>> Example:
>> ```xquery
>> for $i in 1 to 10
>> return fetch:text("http://example.com/magic-story/for-number/" || $i)
>> ```
>>
>> It feels to me that the XQuery processor is broken if it cannot figure out on its own that the fetches can be done in parallel.
>> Why does this not work out of the box? Can it work out of the box? Wouldn't it be able to do this in parallel if a flag `can-be-optimized-for-parallel-execution` were defined for `fetch:text`?
>>
>> --
>> Wishing you success,
>> Klāvs Priedītis
>
> --
> Dirk Kirsten, BaseX GmbH, http://basexgmbh.de
> |-- Registered office: Blarerstrasse 56, 78462 Konstanz
> |-- Register court Freiburg, HRB: 708285, Managing directors:
> |   Dr. Christian Grün, Dr. Alexander Holupirek, Michael Seiferle
> `-- Phone: 0049 7531 91 68 276, Fax: 0049 7531 20 05 22
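Dirk's first point (results must come back in order, so the results of parallel fetches have to be cached and reassembled) can be sketched in Java with `CompletableFuture`. This is an illustration under assumptions, not BaseX code: `OrderedResults`, `mapInOrder` and `slowEcho` are hypothetical names, and `slowEcho` only simulates slow requests with `Thread.sleep`.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.function.Function;
import java.util.stream.Collectors;

class OrderedResults {
    // Starts one asynchronous task per input, then joins the futures in input order.
    // A task that finishes early just holds its value (the "cache") until all
    // earlier results have been taken, so the output order never changes.
    static <T, R> List<R> mapInOrder(List<T> inputs, Function<T, R> work) {
        List<CompletableFuture<R>> pending = inputs.stream()
            .map(x -> CompletableFuture.supplyAsync(() -> work.apply(x)))
            .collect(Collectors.toList()); // collected first, so every task starts before any join
        return pending.stream()
            .map(CompletableFuture::join)
            .collect(Collectors.toList());
    }

    // Simulated slow request: later inputs sleep less and tend to finish first.
    static String slowEcho(Integer i) {
        try {
            Thread.sleep(50L * (6 - i));
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return "result " + i;
    }

    public static void main(String[] args) {
        // prints [result 1, result 2, result 3, result 4, result 5]
        System.out.println(mapInOrder(Arrays.asList(1, 2, 3, 4, 5), OrderedResults::slowEcho));
    }
}
```

The extra bookkeeping is small here, but as Dirk notes, it grows for more complex expressions than a single fetch per item.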
-- Wishing you success, Klāvs Priedītis