Hi Chuck,
Martin already suggested that map construction via map:merge is preferable and faster (in my personal experience, there are only a few cases in which map:put is the better choice).
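For reference, here is a minimal sketch of what a map:merge-based version could look like. It assumes the same database name and path as in your query, and it replaces the recursion with a group by clause, so it is an illustration of the idea rather than a drop-in equivalent of your function:

```xquery
(: Sketch only: counts element names with map:merge instead of
   recursive map:put calls. The database name and the path are
   taken from the original query and are assumptions here. :)
let $elems := collection('pure_20190402')/result/items/*
let $elem_names_map := map:merge(
  (: one map entry per distinct element name :)
  for $elem in $elems
  group by $name := name($elem)
  (: after grouping, $elem holds all elements with that name :)
  return map:entry($name, count($elem))
)
return json:serialize($elem_names_map, map { 'format': 'xquery' })
```

Since each name occurs only once after grouping, map:merge never has to resolve duplicate keys, and no intermediate maps are created per element.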
Your query was an interesting one, though. In various cases, we drop type information at runtime, as it can be expensive to decorate all newly generated sequences with the correct type. As a result, the type of your function arguments is verified every time the function is called, and this takes additional time.
But as it’s always advisable to declare types, and as this is not the first time this issue has come up, I gave it some more thought, and I have found a good way to generally improve typing at runtime! You can already be sure that your query will benefit from the upcoming optimizations, i.e., with BaseX 9.2.
Due to this, and due to some other minor optimizations that are still in progress, we decided to delay the release until the beginning of next week.
Cheers,
Christian
On Thu, Apr 11, 2019 at 12:10 AM Chuck Bearden cfbearden@gmail.com wrote:
BaseX is a great tool for analyzing & characterizing large amounts of XML data. I have used it both at work and on personal projects. I hope the following observation is useful.
When I define a function that recurses over a sequence of elements in order to build a map of element name counts, I find that when I specify the type of the element sequence as 'element()*', the function runs so slowly that I give up after 5 minutes or so. But when I specify the type as 'item()*', it finishes in 40 seconds or less. Here's an example:
-----begin code snippet-----
declare namespace local="w00fw00f";
declare function local:count($elems as element()*, $elem_counts as map(*)) as map(*) {
  let $elem := head($elems),
      $elem_name := $elem/name(),
      $elems_new := tail($elems),
      $elem_name_count :=
        if (map:contains($elem_counts, $elem_name))
        then map:get($elem_counts, $elem_name) + 1
        else 1,
      $elem_counts_new := map:put($elem_counts, $elem_name, $elem_name_count)
  return
    if (count($elems_new) = 0)
    then $elem_counts_new
    else local:count($elems_new, $elem_counts_new)
};

let $coll := collection('pure_20190402'),
    $elems := $coll/result/items/*,
    $elem_names_map := local:count($elems, map {})
return json:serialize($elem_names_map, map {'format' : 'xquery'})
-----end code snippet-----
In the function declaration, changing "$elems as element()*" to "$elems as item()*" makes the difference in performance. Replacing the JSON serialization with a standard XML one does not change the performance. I am running BaseX 9.1.2 under Ubuntu 16.04.6.
All the best, Chuck Bearden