Hi Ben - I'm on mobile, please excuse any typos.
Maybe
`return array { $idf }`
is closer?
Untested, apologies!
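To spell that out a bit: the error message suggests that tidyTM:wordCount_arr() returns a sequence of arrays rather than one array of arrays, so array:for-each, which expects a single array as its first argument, rejects it. A minimal sketch, assuming that is the case, is to loop over the sequence and wrap the result in array { }. The $counts sequence below is only a stand-in for the output of your function, and the math and array prefixes are declared explicitly in case your processor does not predeclare them. Note also that $entry(2) looks up the second member of an array, whereas a predicate like $idf[2] filters a sequence by position.

```xquery
declare namespace math  = "http://www.w3.org/2005/xpath-functions/math";
declare namespace array = "http://www.w3.org/2005/xpath-functions/array";

(: Hypothetical stand-in for the output of tidyTM:wordCount_arr():
   a sequence of two-member arrays, not a single array. :)
let $count  := 1780
let $counts := (["probleem", 703], ["opgelost.", 248])

(: Append ln($count / frequency) to each entry, then wrap the
   resulting sequence so the whole value is one array(*). :)
return array {
  for $entry in $counts
  return array:append($entry, math:log($count div $entry(2)))
}
```

With $count = 1780, the value appended to the "probleem" entry should come out close to the 0.929 you computed by hand.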
Best,
Bridger
Hi,
In text mining, the 'idf' or inverse document frequency is defined as
idf(term) = ln(ndocuments / ndocuments containing term). I am working on a
function that should return this idf.
This function:
declare function local:wordFreq_idf($nodes as node()*) as array(*) {
  let $count := count($nodes)
  let $text := for $node in $nodes
               return $node/text() => tokenize() => distinct-values()
  let $idf := $text => tidyTM:wordCount_arr()
  return $idf
};
returns:
["probleem", 703]
["opgelost.", 248]
["dictu", 235]
["opgelost", 217]
["medewerker", 193]
...
For "probleem", the idf should be calculated as ln($count div 703). Since
there are 1780 nodes, this would result in ln(1780/703) = 0.929011751.
I tried to extend the 'let $idf' line with:
=> array:for-each(function($idf) {array:append($idf,
math:log($count div $idf[2]) )})
which should result in ["probleem", 703, 0.929011751]
but no matter what I do, I get this error every time:
[XPTY0004] Cannot promote (array(xs:anyAtomicType))+ to array(*): ([
"probleem", 703 ], [ "opgelost.", 248 ], ...).
Is it possible to apply array:for-each on an array of arrays?
Ben