The question was about set operations in BaseX. I had to calculate the set difference of two sequences of atomic values. XQuery has the operator except, but it works only on nodes. I guess the usual code for this uses distinct-values:
declare function local:difference-pred($a, $b) {
  distinct-values($a[not(. = $b)])
};
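The reason except does not fit here is that it compares nodes by identity, not by value. A small illustration, using made-up <r>/<v> sample elements:

```xquery
(: except works on node identity; the <r>/<v> elements are hypothetical sample data :)
let $doc := <r><v>1</v><v>2</v><v>3</v></r>
let $a := $doc/v
let $b := $doc/v[. = '2']
return (
  (: the very same node is removed, leaving <v>1</v> and <v>3</v> :)
  $a except $b,
  (: a freshly built <v>2</v> is a different node, so nothing is removed: 3 :)
  count($a except <v>2</v>)
)
```

Writing (1, 2, 3) except 2 instead is rejected with a type error, since both operands of except must be node sequences.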
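I used difference-pred on big sets (thousands of values), and it was slow, presumably because the predicate . = $b compares each item of $a with every item of $b. Then I tried maps, using the BaseX map module: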
declare function local:difference-map($a, $b) {
  let $m1 := map:new(for $i in $a return map:entry($i, true()))
  let $m2 := map:new(for $i in $b return map:entry($i, false()))
  (: keys present in both maps take the value from $m2, i.e. false() :)
  let $m3 := map:new(($m1, $m2))
  (: keys still mapped to true() occur only in $a :)
  return for $i in map:keys($m3)
         return if ($m3($i)) then $i else ()
};
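The two-map version depends on map:new letting a later entry replace an earlier one with the same key, so that every value of $b ends up mapped to false(). A quick check of that assumption, for the BaseX 7.x map module:

```xquery
(: assumes BaseX 7.x map:new, where the last entry for a duplicate key wins :)
let $m := map:new((map:entry('x', true()), map:entry('x', false())))
return $m('x')   (: false() if the later entry wins :)
```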
Then we found another solution, with only one map:
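declare function local:difference-map-2($a, $b) {
  (: lookup table with every value of $b as a key :)
  let $m2 := map:new(for $i in $b return map:entry($i, true()))
  (: keep the items of $a that are not keys of $m2 :)
  return for $i in $a
         return if ($m2($i)) then () else $i
};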
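map:new belongs to the BaseX 7 map module. In XQuery 3.1 the standard function is map:merge, which by default keeps the first entry for a duplicate key, so the two-map trick would not carry over unchanged. A one-map sketch in XQuery 3.1 terms, assuming a 3.1 processor; the name local:difference-merge is hypothetical, and unlike difference-map-2 it also removes duplicates from $a, matching difference-pred:

```xquery
xquery version "3.1";

(: hypothetical name; assumes an XQuery 3.1 processor with map:merge and map:contains :)
declare function local:difference-merge($a, $b) {
  (: lookup table holding every value of $b as a key :)
  let $seen := map:merge(for $i in $b return map:entry($i, true()))
  (: keep each distinct value of $a that is not a key in the table :)
  return distinct-values($a)[not(map:contains($seen, .))]
};

local:difference-merge(('a', 'b', 'c', 'a'), ('b', 'x'))
```

This returns 'a' and 'c'; the order of distinct-values results is implementation-dependent.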
I tried the three functions on the same sequences:
(: $a: about 99% of 1..100000 as strings; $b: about 55% of $a :)
let $a := for $i in (1 to 100000)
          return if (random:double() < 0.01) then () else string($i)
let $b := for $i in $a
          return if (random:double() < 0.45) then () else $i
return (
  count($a),
  count($b),
  count(prof:time(local:difference-pred($a, $b), true(), 'pred ')),
  count(prof:time(local:difference-map($a, $b), true(), 'map ')),
  count(prof:time(local:difference-map-2($a, $b), true(), 'map2 '))
)
This gives the following times (BaseX 7.6 on openSUSE, in a VMware virtual machine on a PC):
pred: 80468.68 ms
map:    261.23 ms
map2:   253.77 ms
That is, map2 is about 317 times faster than the distinct-values version (80468.68 / 253.77 ≈ 317). I did not measure memory usage. Also, with sequences of different sizes, each of the functions can turn out to be the fastest!
--
Arto Viitanen
Finland