Hi Christian,
Raising this general thread of conversation again. I seem to be running into some weird issues with recursion and fork-join(). I am getting some non-deterministic behavior running the hierarchy extractor function below. There are no errors of note, but I think maybe some of the threads are silently failing.
Unfortunately I don’t have an SSCCE yet, as it is a bit difficult to reproduce. I am wondering if you have any suggestions on how to debug this, and whether you log any information in case a thread dies. We are running 8.5.2.
Cheers, -carl
On Jul 16, 2016, at 10:49 AM, Carl Leitner litlfred@ibiblio.org wrote:
Hi Christian, Thanks for your help and hints. Things are definitely working now.
Also thanks for the hint about loading the document into memory with update {}. Without that, the fork-join version performed significantly worse than the non-fork-join version.
On a further optimization note, I found the overhead of a fork-join was only worthwhile in those cases where the node we are extracting at has at least three children.
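A minimal sketch of that threshold, not the code actually used: it forks only when a node has at least three children and otherwise recurses sequentially. The function name get_child_orgs-adaptive, the namespace URI, the variable $extractor:min-fork and the sample organization elements are all assumptions made for illustration.

```xquery
(: Hypothetical namespace URI; the thread does not show the real one. :)
declare namespace extractor = "urn:example:extractor";

(: Assumed threshold: fork only at nodes with at least this many children. :)
declare variable $extractor:min-fork := 3;

declare function extractor:get_child_orgs-adaptive($orgs, $org) {
  for $org_id in $org/@id
  (: Bind all children of the current node at once. :)
  let $c_orgs := $orgs[parent/@id = $org_id]
  where $c_orgs
  return
    if (count($c_orgs) >= $extractor:min-fork) then
      (: Enough children: one function item per child, evaluated in parallel. :)
      xquery:fork-join(
        for $c_org in $c_orgs
        return function() {
          $c_org, extractor:get_child_orgs-adaptive($orgs, $c_org)
        }
      )
    else
      (: Few children: plain recursion avoids the fork-join overhead. :)
      for $c_org in $c_orgs
      return ($c_org, extractor:get_child_orgs-adaptive($orgs, $c_org))
};

(: Made-up sample data: each organization names its parent in a child element. :)
let $orgs := (
  <organization id="1"/>,
  <organization id="2"><parent id="1"/></organization>,
  <organization id="3"><parent id="1"/></organization>,
  <organization id="4"><parent id="1"/></organization>,
  <organization id="5"><parent id="2"/></organization>
)
return extractor:get_child_orgs-adaptive($orgs, $orgs[@id = '1'])
```

The value three is only what was observed on this data set; the break-even point will likely differ with other data and hardware.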
One last question, do you have an expected timeline for when 8.5.2 will be released?
Again, thanks for your help.
Cheers, -carl
On Jul 16, 2016, at 2:57 AM, Christian Grün christian.gruen@gmail.com wrote:
Hi Carl,
Thanks again for your observation! Only now, I noticed that xquery:fork-join did weird things when specifying an empty sequence as argument.
This has been fixed [1]; with the latest snapshot [2], you can get rid of the "where $c_orgs" clause.
I tried your original fork query, and it now terminates (it should generate the same result as the unparallelized query after removing the trace() within the function, or replacing it with prof:dump).
Hope this helps, Christian
[1] https://github.com/BaseXdb/basex/commit/f7d8744e2760ed1531b48a9c9de92f3694e6...
[2] http://files.basex.org/releases/latest
On Sat, Jul 16, 2016 at 8:38 AM, Christian Grün christian.gruen@gmail.com wrote:
It seems that $c_orgs as defined in line 03 will be a single element.
Exactly. Maybe it’s sufficient to rewrite line 3 as follows:
let $c_orgs := $orgs[parent/@id = $org_id] where $c_orgs
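Applied to the whole function, that rewrite might look like the sketch below. The namespace URI and the sample organization elements are assumptions made for illustration; only the function body follows the query discussed in this thread.

```xquery
(: Hypothetical namespace URI; the thread does not show the real one. :)
declare namespace extractor = "urn:example:extractor";

declare function extractor:get_child_orgs-forked($orgs, $org) {
  for $org_id in $org/@id
  (: "let" binds all children at once, so fork-join gets one function per child. :)
  let $c_orgs := $orgs[parent/@id = $org_id]
  (: Skip nodes without children, so fork-join is never called with an empty sequence. :)
  where $c_orgs
  return xquery:fork-join(
    for $c_org in $c_orgs
    return function() {
      $c_org, extractor:get_child_orgs-forked($orgs, $c_org)
    }
  )
};

(: Made-up sample data: each organization names its parent in a child element. :)
let $orgs := (
  <organization id="1"/>,
  <organization id="2"><parent id="1"/></organization>,
  <organization id="3"><parent id="1"/></organization>,
  <organization id="4"><parent id="2"/></organization>
)
return extractor:get_child_orgs-forked($orgs, $orgs[@id = '1'])
```

With "for" in place of "let", $c_orgs is bound to one child at a time, so every fork-join call receives a single function and nothing runs in parallel.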
On Sat, Jul 16, 2016 at 1:16 AM, Carl Leitner litlfred@ibiblio.org wrote:
Hmm. Looking at your query:
01: declare function extractor:get_child_orgs-forked($orgs,$org) {
02:   for $org_id in $org/@id
03:   for $c_orgs in $orgs[parent/@id = $org_id]
04:   return xquery:fork-join(
05:     for $c_org in $c_orgs
06:     return function() {
07:       $c_org, extractor:get_child_orgs-forked($orgs, $c_org)
08:     }
09:   )
10: };
This means that when you get to line 05 we are looping over a single element and the enclosing fork-join will only be joining a single function/thread. Am I misreading that?
Cheers, -carl
On Jul 15, 2016, at 5:32 PM, Christian Grün christian.gruen@gmail.com wrote:
Hi Carl,
Running your version of the query does not exhaust memory as mine had; however, I don’t see CPU usage rising more than slightly above that of a single available processor.
Your initial fork query was creating an endless loop. In the current one, I assume that only one function will be created for each xquery:fork-join call. Maybe you need to spend some more time on the question of how your code could actually be forked in a recursive way at all?
What you probably want is an xquery:fork() function (without join), but we haven’t added such a function so far because it would be much more difficult to eventually join the results and find a good order.
Cheers Christian