I’m running out of memory (with 1.5 GB allocated) when querying a fairly flat XML database of approximately 450 MB for duplicate node values.

Can anyone suggest a more memory-efficient way to frame this query than iterating over distinct-values(), as I do below? I’m hoping there are some BaseX tips and tricks that help here.

(: report each distinct pii value that occurs more than once :)
for $val in distinct-values(/dataset/item/pii)
let $cnt := count(/dataset/item/pii[. = $val])
where $cnt > 1
return <duplicate>{ $val }</duplicate>

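For comparison, here is a single-pass variant using the XQuery 3.0 `group by` clause. It is a sketch that assumes the installed BaseX version supports XQuery 3.0 grouping, and its peak memory use on this 450 MB database is untested. Instead of re-scanning /dataset/item/pii once for every distinct value, it visits each pii element once and groups the elements by their string value.

```xquery
(: Sketch only: assumes XQuery 3.0 grouping support in the BaseX version in use.
   Each pii element is visited once; after "group by", $pii is bound to all
   pii elements that share the same string value $val. :)
for $pii in /dataset/item/pii
group by $val := string($pii)
where count($pii) > 1
return <duplicate>{ $val }</duplicate>
```

It produces the same kind of output as the query above (one duplicate element per repeated value); in both versions the order of results is implementation-dependent.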

Thanks in advance,

Constantine


Elsevier B.V. Registered Office: Radarweg 29, 1043 NX Amsterdam, The Netherlands, Registration No. 33156677, Registered in The Netherlands.