On 27-02-2020 at 19:19, Christian Grün wrote:
It’s difficult to understand what’s going on here. Could you please provide us with self-contained queries without the R wrapper code?
Version 1:
import module namespace functx = 'http://www.functx.com';

(: Extract the text :)
let $txt := collection('IncidentRemarks/Incidents')/csv/record/INC_RM/text()
(: Convert to lower-case and tokenize :)
let $INC_RM := tokenize(lower-case(string-join($txt)), '(\\s|[,.!:;]|[n][b][s][p][;])+')
(: Read Stopwords :)
let $Stoppers := doc('TextMining/Stopwoorden.txt')/text/line/text()
(: Remove Stopwords :)
let $Stop := functx:value-except($INC_RM, $Stoppers)
return $Stop
My R code first executes this as an XQuery query and then calculates the length of the returned list (= 5842).
Version 2:
import module namespace functx = 'http://www.functx.com';

let $txt := collection('IncidentRemarks/Incidents')/csv/record/INC_RM/text()
let $INC_RM := tokenize(lower-case(string-join($txt)), '(\\s|[,.!:;]|[n][b][s][p][;])+')
let $Stoppers := doc('TextMining/Stopwoorden.txt')/text/line/text()
let $Stop := functx:value-except($INC_RM, $Stoppers)
return count($Stop)
This version returns the length of the sequence directly (it counts 5843 words).
The double backslash ('\\') in the regular expression is intentional (R-specific escaping). With a single backslash ('\') the query can be executed in the BaseX GUI.
Does this help?
Ben