Hi Ben - I'm on mobile, please excuse any typos.
Maybe `return array { $idf }` is closer?
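A minimal sketch of that idea, assuming XQuery 3.1. The `$count` and `$pairs` values below are hypothetical stand-ins, copied from the numbers in your mail, for `count($nodes)` and the output of `tidyTM:wordCount_arr()`. `array:for-each` expects one array, so the sequence of arrays is wrapped with `array { ... }` first. Also, `$idf[2]` is a positional predicate rather than an array lookup; `$pair(2)` (or `$pair?2`) fetches the second member.

```xquery
xquery version "3.1";

declare namespace array = "http://www.w3.org/2005/xpath-functions/array";
declare namespace math  = "http://www.w3.org/2005/xpath-functions/math";

(: Hypothetical stand-ins for count($nodes) and the word-count result :)
let $count := 1780
let $pairs := (["probleem", 703], ["opgelost.", 248])

(: Wrap the sequence of arrays in a single array, then append
   ln($count div frequency) to each inner array :)
return array:for-each(
  array { $pairs },
  function($pair) {
    array:append($pair, math:log($count div $pair(2)))
  }
)
```

For "probleem" this should append roughly 0.929, the value you expected. The result is a single array of arrays, which also fits the declared `as array(*)` return type.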
Untested, apologies! Best, Bridger
On Mon, Mar 30, 2020, 5:16 PM Ben Engbers <Ben.Engbers@be-logical.nl> wrote:
Hi,
In text mining, the 'idf' or inverse document frequency is defined as idf(term) = ln(number of documents / number of documents containing the term). I am working on a function that should return this idf.
This function:
declare function local:wordFreq_idf($nodes as node()*) as array(*) {
  let $count := count($nodes)
  let $text :=
    for $node in $nodes
    return $node/text() => tokenize() => distinct-values()
  let $idf := $text => tidyTM:wordCount_arr()
  return $idf
};
returns:
["probleem", 703]
["opgelost.", 248]
["dictu", 235]
["opgelost", 217]
["medewerker", 193]
...
For "probleem", the idf should be calculated as ln($count/703). Since there are 1780 nodes, this would result in 0.929011751. I tried to extend the 'let $idf' line with:
  => array:for-each(function($idf) { array:append($idf, math:log($count div $idf[2])) })
which should result in ["probleem", 703, 0.929011751],
but no matter what I do, every time I get this error: [XPTY0004] Cannot promote (array(xs:anyAtomicType))+ to array(*): ([ "probleem", 703 ], [ "opgelost.", 248 ], ...).
Is it possible to apply array:for-each on an array of arrays?
Ben