Hi Jean-Marc,

I’m not sure how your actual data looks like, but you could have a look the attached query, which uses XQuery maps, and which allowe
d us to
inser
t
10 million strings in about 40 seconds
.
Most 
presumably, it’s 
not as memory efficient as a Java map, but my guess is that most memory is actually spent by the XQuery representation of string items, and not necessarily the map itself.

Hope this helps,
Christian
___________________________
___________________________

declare variable $my-big-DataBase := doc('db')/root/*;

declare function local:do-some-stuff-with-set($map) {
  count(map:keys($map))
};

declare function local:init() {
  local:do-some-stuff-with-set(local:tree-traversal($my-big-DataBase, map:new()))
};

declare function local:tree-traversal($elements, $map) {
  fold-left($elements, $map,
    function($map2, $element) {
      let $map3 := map:new(($map2, { local:get-some-stuff-within($element) : true() }))
      return local:tree-traversal($element/element(), $map3)
    }
  )
};

declare function local:get-some-stuff-within($element) {
  $element/@name
};

local:init()
___________________________
___________________________

On Tue, Nov 12, 2013 at 5:48 PM, jean-marc Mercier <jeanmarc.mercier@gmail.com> wrote:
> Dear Christian,
>
> Thanks for your answer.
>>It may be interesting to do some testing in order to find out what’s
>>the actual bottleneck in your query. How man entries is your hash set
>>supposed to contain?
>
> The HashSet contains 10M strings. That's not a big deal. The problems comes
> from filling the HashSet.
> Well, let us be more technacal. I am using hashset for tree traversal
> operations over a DataBase of 10M nodes, 1Go. This HashSet collects some
> informations during the traversal, mainly a string at each node. Actually,
> my code binds JAVA with something like
>
> import module namespace set = "java.util.HashSet";
>
> declare function init(){set:clear(), tree_traversal($my_big_DataBase),
> do_some_stuff_with_hash_set()}
> declare function tree_traversal($elements)
> {
>     for $element in $elements return (
>            set:add(get_some_stuff_within($element) ),
>            tree_traversal($element/element())
>      )
> }
> and the whole traversal takes a minute.
>
> I don't know how to achieve this with pure XQUERY. The only thing that I
> tried is something like  :
>
> declare function init(){let $my_hash_set := tree_traversal($my_big_DataBase,
> map:new()) return do_some_stuff_with($my_hash_set) }
>
> declare function tree_traversal($elements, $hash_set) as function(*) (return
> a HashSet)
> {
>      for $element in $elements return (
>            tree_traversal($nodes/element(), $hash_set),
>            map:entry( get_some_stuff_within($element),boolean)
>      )
> }
> that performed very badly, compared to the first version (JAVA binding). I
> explained this thinking that the first version is O(N) operations, and the
> second is O(N^2) operations due to in-memory copy.
>
>>> - second : it lacks powerful libraries, as complete math modules.
>
>> What kind of functions would you be interested in?
>
> I need a stochastic modulus, as
> http://www.boost.org/doc/libs/1_55_0/libs/math/doc/html/dist.html
> also a linear Algebra modulus as UBLAS
> http://www.boost.org/doc/libs/1_54_0/libs/numeric/ublas/doc/index.htm
>
> I guess that I could find JAVA equivalent and bind them to BaseX. Such
> functionalities are available directly as XQUERY modules ?
>
> Cheers,
>
> Jean-Marc
>
>
>
>
> 2013/11/12 Christian Grün <christian.gruen@gmail.com>
>>
>> Dear JohnLeM,
>>
>> thanks for your mail. As you already noted, XQuery is a functional
>> language, and this is the reason why XQuery maps are not exactly
>> comparable to maps and sets, as they are used in imperative languages:
>>
>> All maps in XQuery are persisent (immutable): Once a map has been
>> generated, it is not possible to change its contents. Instead, a new
>> map is to be created by each insertion or deletion [1]. This sounds
>> like a huge memory killer, but it’s not as bad as you might guess.
>> Various efficient solutions exist for persistent map, such as the
>> mapped trie that has been implemented in BaseX [2]. It will only
>> create copies of parts of the data structure that are to be changed.
>> The following query is an example for a query which creates 100.000
>> map with a single entry, and a large map containing all the entries;
>> on my system, it requires 200 ms to run:
>>
>>   map:size(map:new(
>>     for $i in 1 to 100000
>>    return map { $i := true() }
>>   ))
>>
>> In short: persistent maps may not be as efficient as mutable maps, but
>> they are usually not the bottleneck in XQuery applications, because
>> deleted entries (or obsolete maps) will automatically be discarded by
>> the garbage collector as soon as they are not in the query scope
>> anymore. If you want to enforce this, you can put your map operations
>> into FLWOR expresions or user-declared functions.
>>
>> Back to your original questions:
>>
>> > 2) This module must expose a method "hashset:clear($hashset)" to
>> > de-allocate
>> > memory dynamically. The map:module provides the function map:remove, and
>> > I
>> > could remove all elements. [...]  It does not deallocate memory, leading
>> > to poor
>> > overall performances.
>>
>> It may be interesting to do some testing in order to find out what’s
>> the actual bottleneck in your query. How man entries is your hash set
>> supposed to contain?
>>
>> > 3) must expose a method "hashset:add($hashset,$element)" to add memory
>> > dynamically. Through map:module, the only possibility that I see is to
>> > wrap
>> > it as map:new($hashset, map:entry($element,$dummyboolean)).
>>
>> Right, using true() is probably the best choice (booleans are only
>> instantiated once in memory).
>>
>> > - first : its dynamic memory management (the in-memory printfoot of my
>> > XQUERY executables are usually tremendous).
>>
>> This can in fact be a problem, and is mostly due to various decisions
>> taken in the specification, and the complexity of XML nodes in
>> general.
>>
>> > - second : it lacks powerful libraries, as complete math modules.
>>
>> What kind of functions would you be interested in?
>>
>> Christian
>>
>> [1] http://en.wikipedia.org/wiki/Persistent_data_structure
>> [2] http://en.wikipedia.org/wiki/Hash_array_mapped_trie
>
>