Hi All,

I have hierarchical information encoded in an XML document that looks something like:

<organization entityID="1"/>
<organization entityID="2">
  <parent entityID="1"/>
</organization>
<organization entityID="3">
  <parent entityID="2"/>
</organization>
<organization entityID="4">
  <parent entityID="1"/>
</organization>
There are around 80,000 entries like this, and I regularly need to extract sub-hierarchies (see the commented-out version of the function below).
The commented-out version runs in about a minute using a modest amount of memory.
Hoping to take advantage of this commit, https://github.com/BaseXdb/basex/commit/ac86bbbc3cd1f71461ce94d803cab46f21e7eae7, I modified the function; the uncommented version tries to use xquery:fork-join in the 8.5.2 beta (the July 12th snapshot).
The parallelized one chews up the 3 GB of available memory and unceremoniously throws exceptions (Exception in thread "qtp198198276-19"), with the occasional:

java.lang.OutOfMemoryError: GC overhead limit exceeded
  at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.addConditionWaiter(AbstractQueuedSynchronizer.java:1857)
  at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2073)
Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread "qtp198198276-19"
java.lang.OutOfMemoryError: GC overhead limit exceeded
Exception in thread "qtp198198276-14" java.lang.OutOfMemoryError: GC overhead limit exceeded

It runs for tens of minutes (perhaps more - I always kill the process).
Any ideas on what I can do to improve the situation?
Thanks in advance.
Cheers, -carl
declare function csd_bl:get_child_orgs($orgs, $org) {
  let $org_id := $org/@entityID
  return
    if (functx:all-whitespace($org_id)) then ()
    else
      let $c_orgs := $orgs[./parent[@entityID = $org_id]]
      let $t0 := trace($org_id, "creating func for ")
      let $t1 := trace(count($c_orgs), " func checks children: ")
      let $c_org_funcs :=
        for $c_org in $c_orgs
        return function() {
          ( trace($org_id, "executing child func for "),
            $c_org,
            csd_bl:get_child_orgs($orgs, $c_org) )
        }
      return xquery:fork-join($c_org_funcs)

  (:
  let $c_orgs :=
    if (functx:all-whitespace($org_id)) then ()
    else $orgs[./parent[@entityID = $org_id]]
  return
    for $c_org in $c_orgs
    let $t0 := trace($org_id, "processing children for ")
    return ($c_org, csd_bl:get_child_orgs($orgs, $c_org))
  :)
};
Hi Carl,
The parallelized one chews up the 3 GB of available memory and unceremoniously throws exceptions (Exception in thread "qtp198198276-19"), with the occasional:
My assumption is that you are creating a huge number of functions to be evaluated in parallel; have you already counted them?
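As a rough way to check, something like the following counts how many closures would be created for one start node (just a sketch, untested against your real data; it uses the element and attribute names from your sample above):

declare function local:count-funcs($orgs, $org) {
  (: your function creates one closure per child, recursively,
     so the total number of closures equals the number of descendants :)
  let $children := $orgs[parent/@entityID = $org/@entityID]
  return count($children) + sum(
    for $c in $children
    return local:count-funcs($orgs, $c)
  )
};

let $orgs :=
  <root>
    <organization entityID="1"/>
    <organization entityID="2"><parent entityID="1"/></organization>
    <organization entityID="3"><parent entityID="2"/></organization>
    <organization entityID="4"><parent entityID="1"/></organization>
  </root>/organization
return local:count-funcs($orgs, $orgs[@entityID = '1'])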
Cheers Christian
In this case it is in the neighborhood of 200, which isn’t too big. In another case, it would be on the order of 17,000 total. The functions are not all created at once - only as the query walks the hierarchy, whose depth is at most 8.
If there are other ideas as to how to optimize or parallelize this type of query, I would be happy to hear them.
Cheers, -carl
I guess I simply have too little information on your data and query. Do you think there’s any chance you could generate a self-contained example?
I wrote a little example query. It does something completely different from yours, but it generally shows that the parallelized evaluation of 10000 functions is no problem (on my machine, with 4 cores, the following query takes around 800 ms):
let $xml :=
  <xml>
    <organization entityID="1"/>
    <organization entityID="2">
      <parent entityID="1"/>
    </organization>
    <organization entityID="3">
      <parent entityID="2"/>
    </organization>
    <organization entityID="4">
      <parent entityID="1"/>
    </organization>
  </xml>
let $f := function() {
  for $i in 1 to 100
  return count($xml/*/*/@*/../..)
}
return sum(
  xquery:fork-join((1 to 10000) ! $f)
)
Hi, Here is as small an example as I can get: https://github.com/litlfred/extractor
The source data set is organizations.xml.
There is a module that should be loaded, extractor.xqm, and then two scripts: extract_hieararchy.xq and extract_hieararchy-forked.xq
I appreciate your help.
Cheers, -carl
If I first load the organizations.xml into the database, it takes 25 seconds to run (both before and after I run optimize). If I run the extraction directly against the organizations.xml file on disk, it only takes 7 seconds.
Is that to be expected?
Cheers, -carl
Hi Carl,
I finally had a look at your query. The parallelized variant of your query was not 100% equivalent to the first one. The following version should do the job:
declare function extractor:get_child_orgs-forked($orgs, $org) {
  for $org_id in $org/@id
  for $c_orgs in $orgs[parent/@id = $org_id]
  return xquery:fork-join(
    for $c_org in $c_orgs
    return function() {
      $c_org, extractor:get_child_orgs-forked($orgs, $c_org)
    }
  )
};
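Just for completeness, a hypothetical invocation could look like the following; the module namespace URI here is only a placeholder (use whatever extractor.xqm actually declares), and the id is taken from your repository example:

import module namespace extractor = "extractor" at "extractor.xqm";

let $orgs := doc('organizations.xml')//organization
let $root := $orgs[@id = 'urn:uuid:a0c7c9cb-cdc4-4d24-b644-04dfcd45f9ea']
return extractor:get_child_orgs-forked($orgs, $root)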
If I first load the organizations.xml into the database, it takes 25 seconds to run (both before and after I run optimize). If I run the extraction directly against the organizations.xml file on disk, it only takes 7 seconds.
Is that to be expected?
Yes it is. The reason is that access to a database will always be a bit slower than memory access. You can explicitly convert database nodes to main-memory fragments by using the update keyword:
db:open('organization') update {}
…but that’s only advisable for smaller fragments and for ones that are accessed frequently.
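For example (a sketch only, following the update {} idiom above; 'organization' is the database name used above), you can bind the main-memory copy once and navigate on that copy afterwards:

(: copy the database content to a main-memory fragment and bind it;
   all subsequent navigation then happens in memory :)
let $orgs := (db:open('organization') update {})//organization
return count($orgs)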
Cheers Christian
Hi Christian,
Thanks for your help.
Running your version of the query does not exhaust memory as mine had; however, I don’t see the CPU usage going above slightly more than one available processor. Apart from an initial one-second spike, it runs at around 101% on a two-CPU machine - so it is not parallelized. If you change the node we are extracting (which will extract 17,000 nodes) in extract_hierarchy-forked.xq, you should be able to see this:
let $node := <organization id="urn:uuid:a0c7c9cb-cdc4-4d24-b644-04dfcd45f9ea"/>
Any ideas?
Cheers, -carl
Hi Carl,
Running your version of the query does not exhaust memory as mine had; however, I don’t see the CPU usage going above slightly more than one available processor.
Your initial fork query was creating an endless loop. In the current one, I assume that only one function will be created for each xquery:fork-join call. Maybe you need to spend some more time on the question of how your code could actually be forked in a recursive way at all?
What you probably want is an xquery:fork() function (without join), but we haven’t added such a function so far because it would be much more difficult to eventually join the results and preserve a sensible order.
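Just to illustrate one possible direction (a sketch only, not necessarily the right decomposition for your data): fork once over the direct children of the start node and walk each branch sequentially inside its function, so that fork-join actually receives several functions at once:

declare function local:walk($orgs, $org) {
  (: plain sequential recursion within one branch :)
  for $c in $orgs[parent/@id = $org/@id]
  return ($c, local:walk($orgs, $c))
};

declare function local:extract-forked($orgs, $root) {
  (: fork only here, once, over all direct children of the start node :)
  xquery:fork-join(
    for $c in $orgs[parent/@id = $root/@id]
    return function() { $c, local:walk($orgs, $c) }
  )
};

let $orgs :=
  <root>
    <organization id="1"/>
    <organization id="2"><parent id="1"/></organization>
    <organization id="3"><parent id="2"/></organization>
    <organization id="4"><parent id="1"/></organization>
  </root>/organization
return local:extract-forked($orgs, $orgs[@id = '1'])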
Cheers Christian
Hmm. Looking at your query:

01: declare function extractor:get_child_orgs-forked($orgs,$org) {
02:   for $org_id in $org/@id
03:   for $c_orgs in $orgs[parent/@id = $org_id]
04:   return xquery:fork-join(
05:     for $c_org in $c_orgs
06:     return function() {
07:       $c_org, extractor:get_child_orgs-forked($orgs, $c_org)
08:     }
09:   )
10: };
It seems that $c_orgs as defined in line 03 will be a single element. This means that when you get to line 05 we are looping over a single element and the enclosing fork-join will only be joining a single function/thread. Am I misreading that?
Cheers, -carl
It seems that $c_orgs as defined in line 03 will be a single element.
Exactly. Maybe it’s sufficient to rewrite line 3 as follows:
let $c_orgs := $orgs[parent/@id = $org_id]
where $c_orgs
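In context, the whole function would then read as follows (same names as above, intended as the body in extractor.xqm):

declare function extractor:get_child_orgs-forked($orgs, $org) {
  for $org_id in $org/@id
  let $c_orgs := $orgs[parent/@id = $org_id]
  where $c_orgs
  return xquery:fork-join(
    for $c_org in $c_orgs
    return function() {
      $c_org, extractor:get_child_orgs-forked($orgs, $c_org)
    }
  )
};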
Hi Carl,
Thanks again for your observation! Only now did I notice that xquery:fork-join did weird things when an empty sequence was specified as its argument.
This has been fixed [1]; with the latest snapshot [2], you can get rid of the "where $c_orgs" clause.
I tried your original fork query, and it now terminates (it should generate the same result as the unparallelized query after removing the trace() within the function, or replacing it with prof:dump).
Hope this helps, Christian
[1] https://github.com/BaseXdb/basex/commit/f7d8744e2760ed1531b48a9c9de92f3694e6...
[2] http://files.basex.org/releases/latest
Hi Christian, Thanks for your help and hints. Things are definitely working now.
Also thanks for the hint about loading the document into memory with update {}. Without that, the fork-join version had significantly degraded performance compared to the non-forked one.
On a further optimization note, I found that the overhead of a fork-join was only worthwhile in cases where the node we are extracting at has at least three children.
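In case it is useful to anyone else, the shape of what I ended up with is roughly the following (a simplified sketch, not the exact code in the repository; the cutoff of three is just the empirical value mentioned above):

declare function local:get_child_orgs($orgs, $org) {
  let $children := $orgs[parent/@id = $org/@id]
  return
    if (count($children) ge 3) then
      (: enough branches: the fork-join overhead pays off :)
      xquery:fork-join(
        for $c in $children
        return function() { $c, local:get_child_orgs($orgs, $c) }
      )
    else
      (: few branches: plain sequential recursion is cheaper :)
      for $c in $children
      return ($c, local:get_child_orgs($orgs, $c))
};

(: illustrative invocation; organizations.xml and the id follow the repository example :)
let $orgs := doc('organizations.xml')//organization
return local:get_child_orgs($orgs, $orgs[@id = 'urn:uuid:a0c7c9cb-cdc4-4d24-b644-04dfcd45f9ea'])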
One last question: do you have an expected timeline for when 8.5.2 will be released?
Again, thanks for your help.
Cheers, -carl
One last question: do you have an expected timeline for when 8.5.2 will be released?
…the release should be around the end of July.
Hi Christian,
Raising this general thread of conversation again: I seem to be running into some weird issues with recursion and fork-join(). I am getting some non-deterministic behavior running the hierarchy extractor function discussed earlier in this thread. There are no errors of note, but I think maybe some of the threads are silently failing.
Unfortunately I don’t have an SSCCE yet, as it is a bit difficult to reproduce. I am wondering if you might have any suggestions on how to debug this and if you are logging any information in case a thread dies. We are running 8.5.2.
Cheers, -carl
Hi Carl,
I am wondering if you might have any suggestions on how to debug this and if you are logging any information in case a thread dies.
You could play around with trace() or prof:variables(), or activate debugging via -d or SET DEBUG ON.
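For example, a minimal check that the forked functions are actually executed could look like this (a self-contained sketch; the trace output may be interleaved because the functions run in parallel):

xquery:fork-join(
  for $i in 1 to 4
  return function() { trace($i, 'running branch ') }
)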
We are running 8.5.2.
In BaseX 8.5.3, we have further improved thread-safety for fork-join queries. Could you give it a try?
Hope this helps Christian
Hi, I have encountered another issue. It occurs with the July 16 snapshot, when you do a fork-join where the forked functions contain a variable reference to another function. You can find an example here: https://github.com/litlfred/extractor/blob/master/fork-function-var.xq Note that this type of fork-join was working with the async module in 8.4.
Here are some of the error messages I was getting (they would change intermittently):

https://gist.github.com/litlfred/93411a892ddb4a5b5efa72130cd94d30
https://gist.github.com/litlfred/93dc73e7b0f050583da7bb79b5e31f5a
https://gist.github.com/litlfred/c546722e78cd4217be943e24eb868df0
Thanks in advance for your help.
Cheers, -carl
Hi Carl,
I have encountered another issue. It occurs with the July 16 snapshot, when you do a fork-join where the forked functions contain a variable reference to another function.
For some reason, it seems to work on my system. However, the id in two of the stack traces (010a30f) indicates to me that you were trying an older snapshot than the one from July 16 (the empty-sequence fix was only introduced with f7d8744 [1]). Could you once again check the latest version [2]?
Thanks in advance Christian
[1] https://github.com/BaseXdb/basex/commits/master
[2] http://files.basex.org/releases/latest/
Strange. I tried a clean install and it worked fine. I must have had some corruption somewhere, but I’m not sure where. Thanks! Cheers, -carl