Hi Klāvs,
Dirk already gave some good reasons why implicit parallelization is
not the default in BaseX, although it may seem to be the most natural
and desirable choice. Indeed, it’s fairly easy to create FLWOR
expressions that are slower when executed in parallel. The XQuery
specification, by the way, does not say anything about
parallelization, so it’s completely up to the implementation to choose
the most promising evaluation strategy.
You asked if it was not possible to introduce something like a
`can-be-optimized-for-parallel-execution` flag, and you may be
interested to hear that we had some first thoughts on an 'async'
keyword, which could be used in FLWOR expressions [1]. In the GitHub
issue, only the most basic cases have been considered so far, but if
we manage to push this further, we would have a syntax that is
more easily accessible to most users. Ideally, at some point in the
future, we could then apply this pattern automatically whenever it is
absolutely clear that parallelization leads to faster results.
Everyone: If you notice that the async:fork-join function makes your
code faster, don’t hesitate to send us your queries.
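A minimal sketch of how the query from the original post might be
rewritten with async:fork-join, assuming the function accepts a sequence
of zero-argument functions and evaluates them in parallel (the URL is
the placeholder from Klāvs's example, not a real service):

```xquery
(: Sketch only: build one zero-argument function per request and
   hand the whole sequence to async:fork-join for parallel evaluation. :)
async:fork-join(
  for $i in 1 to 10
  (: placeholder URL taken from the original question :)
  let $url := "http://example.com/magic-story/for-number/" || $i
  return function() { fetch:text($url) }
)
```

Whether this is actually faster than the plain FLWOR expression depends
on the server and the network; for local URIs it may well be slower.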
Hope this helps,
Christian
[1] https://github.com/james-jw/xq-promise/issues/15
On Wed, Mar 2, 2016 at 10:05 AM, Dirk Kirsten <dk@basex.org> wrote:
> Labrīt Klāvs,
>
> well, I would say "broken" is a strong word for something that
> still works correctly and returns the correct result. It might not
> perform optimally, but broken is something different. Also, given that the
> feature is not even one day old yet, you might have to be a bit more
> patient with us in applying optimization techniques.
>
> The problem here has several dimensions:
> - XQuery returns results in order. So, in order not to break BaseX, we
> would have to cache the results of the fetch calls and return them in
> the correct order.
> - This caching might get much more complex for a more complicated
> expression.
> - It might not always be desirable to turn on parallelisation. In this
> case you call an external HTTP server 10 times, so there might be a
> speedup. But fetch:text() could also point to a local URI, and
> parallelizing I/O is not always beneficial.
>
> So, given the many different use cases, I don't think implicit
> parallelization will work to everyone's satisfaction.
> Also, I invite you to read this closely related question about Haskell
> on SO and the linked research:
> https://stackoverflow.com/questions/15005670/why-is-there-no-implicit-parallelism-in-haskell
>
> Cheers
> Dirk
>
> On 03/02/2016 01:41 AM, Klāvs Priedītis wrote:
>> Hello, BaseX team and community!
>>
>> Disclosure: I am not yet an expert on these issues and I have not
>> fully studied the specs of XQuery.
>>
>> Now that the Async Module is incorporated, I am wondering about some
>> basic properties of XQuery.
>>
>> Example:
>> ```xquery
>> for $i in 1 to 10
>> return fetch:text("http://example.com/magic-story/for-number/" || $i)
>> ```
>>
>> It feels to me that the XQuery processor is broken if it cannot figure
>> out on its own that the fetch can be done in parallel.
>> Why does this not work out of the box? Can it work out of the box?
>> Wouldn't it be able to do this in parallel if a flag
>> `can-be-optimized-for-parallel-execution` was defined for the
>> `fetch:text`?
>>
>> --
>> Veiksmi vēlot,
>> Klāvs Priedītis
>
> --
> Dirk Kirsten, BaseX GmbH, http://basexgmbh.de
> |-- Firmensitz: Blarerstrasse 56, 78462 Konstanz
> |-- Registergericht Freiburg, HRB: 708285, Geschäftsführer:
> | Dr. Christian Grün, Dr. Alexander Holupirek, Michael Seiferle
> `-- Phone: 0049 7531 91 68 276, Fax: 0049 7531 20 05 22
>