On 30.03.2020 at 23:16, Ben Engbers wrote:
Hi,
In text mining, the 'idf' (inverse document frequency) of a term is defined as idf(term) = ln(number of documents / number of documents containing the term). I am working on a function that should return this idf.
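Written out, with N the number of documents and df(t) the number of documents that contain term t (notation introduced here only for illustration), the definition is:

```latex
\mathrm{idf}(t) = \ln\frac{N}{\mathrm{df}(t)}
```

For example, in a hypothetical collection of 100 documents where a term occurs in 25 of them, idf = ln(100/25) = ln 4 ≈ 1.386. A term that occurs in every document gets ln 1 = 0.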
This function:
declare function local:wordFreq_idf($nodes as node()*) as array(*) {
  let $count := count($nodes)
  let $text := for $node in $nodes
               return $node/text() => tokenize() => distinct-values()
  let $idf := $text => tidyTM:wordCount_arr()
  return $idf
};
returns:
["probleem", 703]
["opgelost.", 248]
["dictu", 235]
["opgelost", 217]
["medewerker", 193]
...
So does the function, as it works now, return a sequence of arrays? That doesn't seem to match the declared return type "as array(*)", which allows exactly one array.
What does tidyTM:wordCount_arr() return, a single array (of atomic items)?
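For comparison, here is a minimal sketch that computes the idf directly instead of going through tidyTM:wordCount_arr() (whose definition isn't shown), and wraps all [term, idf] pairs in one array so that the result does match "as array(*)". The function name local:idf and the sample <doc> elements are hypothetical. It assumes XQuery 3.1 (single-argument tokenize(), math:log() as the natural logarithm) and uses string($node) instead of $node/text() so that all descendant text of a document is included.

```xquery
xquery version "3.1";

(: Hypothetical sketch: idf(t) = ln(N / df(t)) for every term,
   returned as ONE array whose members are [term, idf] pairs. :)
declare function local:idf($nodes as node()*) as array(*) {
  (: N = number of documents :)
  let $n := count($nodes)
  (: distinct terms per document, so each term is listed once per document containing it :)
  let $terms := for $node in $nodes
                return distinct-values(tokenize(string($node)))
  return array {
    for $term in $terms
    (: after grouping, $term holds one entry per document containing $key :)
    group by $key := $term
    let $df := count($term)
    order by $df descending, $key
    return [$key, math:log($n div $df)]
  }
};

local:idf((<doc>probleem opgelost</doc>, <doc>probleem dictu</doc>))
```

In the sample call, "probleem" occurs in both documents, so its idf is ln(2/2) = 0, while "dictu" and "opgelost" each get ln(2/1) ≈ 0.693. If lookups by term matter more than keeping the original signature, a map(xs:string, xs:double) would be a natural alternative to the array of pairs.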