Dear Christian,Thanks for your answer.>It may be interesting to do some testing in order to find out what’s>the actual bottleneck in your query. How man entries is your hash set
>supposed to contain?The HashSet contains 10M strings. That's not a big deal. The problems comes from filling the HashSet.Well, let us be more technacal. I am using hashset for tree traversal operations over a DataBase of 10M nodes, 1Go. This HashSet collects some informations during the traversal, mainly a string at each node. Actually, my code binds JAVA with something likeimport module namespace set = "java.util.HashSet";declare function init(){set:clear(), tree_traversal($my_big_DataBase), do_some_stuff_with_hash_set()}declare function tree_traversal($elements){for $element in $elements return (set:add(get_some_stuff_within($element) ),tree_traversal($element/element()))}and the whole traversal takes a minute.I don't know how to achieve this with pure XQUERY. The only thing that I tried is something like :declare function init(){let $my_hash_set := tree_traversal($my_big_DataBase, map:new()) return do_some_stuff_with($my_hash_set) }declare function tree_traversal($elements, $hash_set) as function(*) (return a HashSet)
{for $element in $elements return (tree_traversal($nodes/element(), $hash_set),map:entry( get_some_stuff_within($element),boolean))}that performed very badly, compared to the first version (JAVA binding). I explained this thinking that the first version is O(N) operations, and the second is O(N^2) operations due to in-memory copy.>> - second : it lacks powerful libraries, as complete math modules.> What kind of functions would you be interested in?
I need a stochastic modulus, as http://www.boost.org/doc/libs/1_55_0/libs/math/doc/html/dist.htmlalso a linear Algebra modulus as UBLAS http://www.boost.org/doc/libs/1_54_0/libs/numeric/ublas/doc/index.htmI guess that I could find JAVA equivalent and bind them to BaseX. Such functionalities are available directly as XQUERY modules ?Cheers,Jean-Marc2013/11/12 Christian Grün <christian.gruen@gmail.com>
Dear JohnLeM,
thanks for your mail. As you already noted, XQuery is a functional
language, and this is the reason why XQuery maps are not exactly
comparable to maps and sets, as they are used in imperative languages:
All maps in XQuery are persisent (immutable): Once a map has been
generated, it is not possible to change its contents. Instead, a new
map is to be created by each insertion or deletion [1]. This sounds
like a huge memory killer, but it’s not as bad as you might guess.
Various efficient solutions exist for persistent map, such as the
mapped trie that has been implemented in BaseX [2]. It will only
create copies of parts of the data structure that are to be changed.
The following query is an example for a query which creates 100.000
map with a single entry, and a large map containing all the entries;
on my system, it requires 200 ms to run:
map:size(map:new(
for $i in 1 to 100000
return map { $i := true() }
))
In short: persistent maps may not be as efficient as mutable maps, but
they are usually not the bottleneck in XQuery applications, because
deleted entries (or obsolete maps) will automatically be discarded by
the garbage collector as soon as they are not in the query scope
anymore. If you want to enforce this, you can put your map operations
into FLWOR expresions or user-declared functions.
Back to your original questions:
> could remove all elements. [...] It does not deallocate memory, leading to poor
> 2) This module must expose a method "hashset:clear($hashset)" to de-allocate
> memory dynamically. The map:module provides the function map:remove, and I
> overall performances.
It may be interesting to do some testing in order to find out what’s
the actual bottleneck in your query. How man entries is your hash set
supposed to contain?
Right, using true() is probably the best choice (booleans are only
> 3) must expose a method "hashset:add($hashset,$element)" to add memory
> dynamically. Through map:module, the only possibility that I see is to wrap
> it as map:new($hashset, map:entry($element,$dummyboolean)).
instantiated once in memory).
This can in fact be a problem, and is mostly due to various decisions
> - first : its dynamic memory management (the in-memory printfoot of my
> XQUERY executables are usually tremendous).
taken in the specification, and the complexity of XML nodes in
general.
What kind of functions would you be interested in?
> - second : it lacks powerful libraries, as complete math modules.
Christian
[1] http://en.wikipedia.org/wiki/Persistent_data_structure
[2] http://en.wikipedia.org/wiki/Hash_array_mapped_trie
_______________________________________________
BaseX-Talk mailing list
BaseX-Talk@mailman.uni-konstanz.de
https://mailman.uni-konstanz.de/mailman/listinfo/basex-talk