Dear JohnLeM,
thanks for your mail. As you already noted, XQuery is a functional language, and this is the reason why XQuery maps are not exactly comparable to maps and sets, as they are used in imperative languages:
All maps in XQuery are persistent (immutable): once a map has been generated, it is not possible to change its contents. Instead, each insertion or deletion creates a new map [1]. This sounds like a huge memory killer, but it’s not as bad as you might guess. Various efficient solutions exist for persistent maps, such as the hash array mapped trie that has been implemented in BaseX [2]. It only creates copies of those parts of the data structure that are to be changed. The following query creates 100,000 maps with a single entry each, and one large map containing all the entries; on my system, it requires 200 ms to run:
map:size(map:new( for $i in 1 to 100000 return map { $i := true() } ))
In short: persistent maps may not be as efficient as mutable maps, but they are usually not the bottleneck in XQuery applications, because deleted entries (or obsolete maps) will automatically be discarded by the garbage collector as soon as they are no longer in the query scope. If you want to enforce this, you can put your map operations into FLWOR expressions or user-declared functions.
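To illustrate what persistence means in practice, here is a minimal sketch that only uses map functions already mentioned in this mail (map:new, map:entry, map:remove, map:size); the variable names are arbitrary. Removing a key yields a new map, and the original map is left untouched:

```xquery
(: $a holds two entries; map:new merges a sequence of maps into one :)
let $a := map:new((map:entry(1, true()), map:entry(2, true())))
(: map:remove does not modify $a, it returns a new map without key 1 :)
let $b := map:remove($a, 1)
return (map:size($a), map:size($b))
```

The expected result is the sequence 2, 1: $a still has both entries, and only $b lacks the removed key. Once $a is no longer referenced in the query scope, it can be discarded by the garbage collector.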
Back to your original questions:
- This module must expose a method "hashset:clear($hashset)" to de-allocate
memory dynamically. The map:module provides the function map:remove, and I could remove all elements. [...] It does not deallocate memory, leading to poor overall performances.
It may be interesting to do some testing in order to find out what the actual bottleneck in your query is. How many entries is your hash set supposed to contain?
- must expose a method "hashset:add($hashset,$element)" to add memory
dynamically. Through map:module, the only possibility that I see is to wrap it as map:new($hashset, map:entry($element,$dummyboolean)).
Right, using true() is probably the best choice (booleans are only instantiated once in memory).
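A hash set wrapper along these lines could be sketched as follows. This is only an assumption of how such a module might look, not an existing library: the hashset namespace URI and the function names hashset:empty and hashset:contains are made up for illustration. Note that the maps to be merged are passed to map:new as a single sequence, as in the query further above:

```xquery
(: hypothetical namespace, chosen for this example only :)
declare namespace hashset = "urn:example:hashset";

(: an empty set is simply an empty map :)
declare function hashset:empty() as map(*) {
  map:new(())
};

(: returns a NEW map containing the old entries plus $element;
   true() serves as the dummy value :)
declare function hashset:add(
  $set as map(*),
  $element as xs:anyAtomicType
) as map(*) {
  map:new(($set, map:entry($element, true())))
};

(: membership test via the key lookup of the map module :)
declare function hashset:contains(
  $set as map(*),
  $element as xs:anyAtomicType
) as xs:boolean {
  map:contains($set, $element)
};

let $s := hashset:add(hashset:add(hashset:empty(), "a"), "b")
return (map:size($s), hashset:contains($s, "a"), hashset:contains($s, "c"))
```

With this sketch, the query should return 2, true, false. As maps are immutable, there is no in-place "clear": a cleared set would simply be a fresh hashset:empty(), and the old map can be garbage-collected as soon as it leaves the query scope.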
- first : its dynamic memory management (the in-memory printfoot of my
XQUERY executables are usually tremendous).
This can in fact be a problem, and is mostly due to various decisions taken in the specification, and the complexity of XML nodes in general.
- second : it lacks powerful libraries, as complete math modules.
What kind of functions would you be interested in?
Christian
[1] http://en.wikipedia.org/wiki/Persistent_data_structure
[2] http://en.wikipedia.org/wiki/Hash_array_mapped_trie