Hi Götz (cc @ basex-talk),
OK, I think I understand. However, I think there should be some way for the user to give hints. In my opinion, FOR loops would be first-class candidates for parallel streams, in particular in the use case I described in my previous posting:
FOR $var IN (collection) PARALLEL RETURN (expression-list)
Makes sense, in general. XQuery pragmas could be a solution:
(# basex:parallel #) { ... }
Alternatively, higher-order functions such as hof:parallel-map(...) could be provided.
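To make the idea of a parallel FOR loop more concrete, here is a rough Java sketch of what such a construct could map to when expressed with parallel streams. It is not BaseX code: the class ParallelForSketch and the method parallelFor are hypothetical names chosen for illustration, and the sketch assumes that the loop body has no side effects and does not depend on shared mutable state.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;
import java.util.stream.Collectors;

// Illustrative only: not BaseX code, all names are hypothetical.
final class ParallelForSketch {

  // Rough analogue of: FOR $var IN (collection) PARALLEL RETURN (expression)
  // Every item is mapped independently of the others. Collecting an ordered
  // stream keeps the encounter order, so the result has the same order as a
  // sequential loop would produce.
  static <T, R> List<R> parallelFor(List<T> collection,
      Function<? super T, ? extends R> body) {
    return collection.parallelStream()
        .map(body)
        .collect(Collectors.toList());
  }

  public static void main(String[] args) {
    List<String> docs = Arrays.asList("a.xml", "b.xml", "c.xml");
    // The body (here: String::length) must not rely on shared mutable state.
    List<Integer> lengths = parallelFor(docs, String::length);
    System.out.println(lengths);
  }
}
```

Whether this pays off depends on how expensive the body is compared to the cost of distributing the work, which is the point raised in the following paragraphs.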
However, parallelization has a considerable impact on the architecture of BaseX and on performance, because we'd need to create a new context for each parallelized query, which takes additional time. Take the following query as an example:
$x[. = "123"]
The dot refers to the "current context item". If we parallelize a query, we'd have multiple current context items at the same time. The same multiplication would apply to the stack frame and other runtime variables, and in most cases the time spent duplicating these instances exceeds the time saved compared to doing the work in a single thread.
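The duplication problem can be illustrated with a toy model in Java. This is only a sketch under simplifying assumptions: Ctx is a hypothetical stand-in for the runtime context (it is not the actual BaseX class), the "variables" list stands in for the stack frame and other runtime state, and the context is copied once per item, whereas a real engine would more likely copy it once per thread or task.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Illustrative only: a toy model, not the BaseX query context.
final class ContextCopySketch {

  // Toy runtime context with one mutable slot for the context item (".").
  static final class Ctx {
    String contextItem;
    final List<String> variables; // stands in for stack frame and other state

    Ctx(List<String> variables) { this.variables = variables; }

    // Parallel tasks cannot share the mutable slot, so each needs a copy.
    Ctx copy() { return new Ctx(new ArrayList<>(variables)); }
  }

  // The predicate [. = "123"]: it reads the item from the context.
  static boolean predicate(Ctx ctx) { return "123".equals(ctx.contextItem); }

  // Sequential evaluation of $x[. = "123"]: one context, slot is reused.
  static List<String> filterSequential(List<String> x, Ctx ctx) {
    List<String> result = new ArrayList<>();
    for (String item : x) {
      ctx.contextItem = item;
      if (predicate(ctx)) result.add(item);
    }
    return result;
  }

  // Parallel evaluation: the shared context is only read, and every task
  // pays for a private copy before it can set its own context item.
  static List<String> filterParallel(List<String> x, Ctx ctx) {
    return x.parallelStream().filter(item -> {
      Ctx local = ctx.copy();
      local.contextItem = item;
      return predicate(local);
    }).collect(Collectors.toList());
  }

  public static void main(String[] args) {
    List<String> x = Arrays.asList("1", "123", "abc", "123");
    Ctx ctx = new Ctx(Arrays.asList("v1", "v2"));
    System.out.println(filterSequential(x, ctx));
    System.out.println(filterParallel(x, ctx));
  }
}
```

Both variants return the same items. The parallel one additionally allocates a context copy for each task, and for a predicate as cheap as a string comparison that extra work can easily dominate the actual evaluation.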
At least that's our experience so far. Once again, we are happy to see people jump into our code and show us that it can be done better.
Christian
OK. Let me do my stuff first. Then I will see if I'm able to dive deep enough into the BaseX code to come up with some meaningful contribution!
Kind regards,
Goetz
-----Original Message----- From: Christian Grün [mailto:christian.gruen@gmail.com] Sent: Wednesday, 22 April 2015 11:15 To: Goetz Heller; BaseX Subject: Re: Distributing queries to several on several processors
basex-talk@mailman.uni-konstanz.de