BaseX is a great tool for analyzing and characterizing large amounts of
XML data. I have used it both at work and on personal projects. I hope
the following observation is useful.

When I define a function that recurses over a sequence of elements to
build a map of element-name counts, the declared type of the sequence
parameter makes a large difference in run time. With the type declared
as 'element()*', the function runs so slowly that I give up after about
5 minutes. With the type declared as 'item()*', it finishes in 40
seconds or less.

Here's an example:
-----begin code snippet-----
declare namespace local="w00fw00f";

(: Recursively tally element names into a map of name -> count. :)
declare function local:count($elems as element()*, $elem_counts as map(*))
    as map(*) {
  let $elem := head($elems),
      $elem_name := $elem/name(),
      $elems_new := tail($elems),
      (: Increment the existing count for this name, or start at 1. :)
      $elem_name_count := if (map:contains($elem_counts, $elem_name))
                          then map:get($elem_counts, $elem_name) + 1
                          else 1,
      $elem_counts_new := map:put($elem_counts, $elem_name, $elem_name_count)
  return if (count($elems_new) = 0)
         then $elem_counts_new
         else local:count($elems_new, $elem_counts_new)
};

let $coll := collection('pure_20190402'),
    $elems := $coll/result/items/*,
    $elem_names_map := local:count($elems, map {})
return json:serialize($elem_names_map, map {'format' : 'xquery'})
-----end code snippet-----
In the function declaration, changing "$elems as element()*" to
"$elems as item()*" is the only edit needed to get the faster run time.
Replacing the JSON serialization with a standard XML one does not change
the performance. I am running BaseX 9.1.2 under Ubuntu 16.04.6.
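
For reference, here is the 'item()*' variant as a self-contained
sketch. The function body is unchanged; only the type of $elems
differs. The synthetic input (3000 empty elements named a, b and c) is
an assumption that stands in for the 'pure_20190402' collection, and
this sketch has not been timed, so it may or may not show the same
slowdown when the type is switched back to 'element()*'.

-----begin code snippet-----
declare namespace local="w00fw00f";

(: Same function as above, with $elems typed as item()*. :)
declare function local:count($elems as item()*, $elem_counts as map(*))
    as map(*) {
  let $elem := head($elems),
      $elem_name := $elem/name(),
      $elems_new := tail($elems),
      $elem_name_count := if (map:contains($elem_counts, $elem_name))
                          then map:get($elem_counts, $elem_name) + 1
                          else 1,
      $elem_counts_new := map:put($elem_counts, $elem_name, $elem_name_count)
  return if (count($elems_new) = 0)
         then $elem_counts_new
         else local:count($elems_new, $elem_counts_new)
};

(: Assumption: synthetic elements replace collection('pure_20190402'). :)
let $names := ('a', 'b', 'c'),
    $elems := for $i in 1 to 3000
              return element { $names[$i mod 3 + 1] } { }
return local:count($elems, map {})
-----end code snippet-----

Raising the upper bound of the range should make any timing difference
between the two type declarations easier to see.
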
All the best,
Chuck Bearden