While experimenting (I am trying to speed up the queries), I compared
the results of these two queries:

> It’s difficult to understand what’s going on here. Could you please
> provide us self-contained queries without the R wrapper code?

# Version 1: return the full token sequence and count it in R
Word_Inc_Rm_Stop_txt <- "import module namespace functx = 'http://www.functx.com';
  let $txt := collection('IncidentRemarks/Incidents')/csv/record/INC_RM/text()
  let $INC_RM := tokenize(lower-case(string-join($txt)), '(\\s|[,.!:;]|[n][b][s][p][;])+')
  let $Stoppers := doc('TextMining/Stopwoorden.txt')/text/line/text()
  let $Stop := functx:value-except($INC_RM, $Stoppers)
  return $Stop"
Word_Inc_Rm_Stop <- Session$Execute(as.character(glue("xquery {Word_Inc_Rm_Stop_txt}")))$result[[1]]
Word_Inc_Rm_Stop_Count <- length(Word_Inc_Rm_Stop)

# Version 2: let BaseX compute the count
Word_Inc_Rm_Stop_txt_2 <- "import module namespace functx = 'http://www.functx.com';
  let $txt := collection('IncidentRemarks/Incidents')/csv/record/INC_RM/text()
  let $INC_RM := tokenize(lower-case(string-join($txt)), '(\\s|[,.!:;]|[n][b][s][p][;])+')
  let $Stoppers := doc('TextMining/Stopwoorden.txt')/text/line/text()
  let $Stop := functx:value-except($INC_RM, $Stoppers)
  return count($Stop)"
Word_Inc_Rm_Stop_Count_2 <- Session$Execute(as.character(glue("xquery {Word_Inc_Rm_Stop_txt_2}")))$result[[1]]
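For completeness, this is the first query without the R wrapper, as it
should arrive in BaseX (the doubled backslash in \\s is only R string
escaping, so BaseX sees \s); version 2 differs only in its return clause:

import module namespace functx = 'http://www.functx.com';
let $txt := collection('IncidentRemarks/Incidents')/csv/record/INC_RM/text()
let $INC_RM := tokenize(lower-case(string-join($txt)), '(\s|[,.!:;]|[n][b][s][p][;])+')
let $Stoppers := doc('TextMining/Stopwoorden.txt')/text/line/text()
let $Stop := functx:value-except($INC_RM, $Stoppers)
return $Stop  (: version 2: return count($Stop) :)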

These are the processing times:

Version 1:
> print(proc.time() - ptm)
   user  system elapsed
  2.903   0.022   3.160
Version 2:
> print(proc.time() - ptm)
   user  system elapsed
  0.041   0.004   1.089

I guess it makes sense to put effort into speeding up my code. But what
bothers me is the following.

The first query returns the full vector, and R computes its length;
the result is 5842.
The second query returns the count as computed by BaseX; this result is
5843. The GUI also returns 5843.

I copied the output of
  ..
  return $Stop

to a new LibreOffice document. Its word count is 5842.
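
Could the extra item simply be an empty or whitespace-only token? If I
read functx:value-except correctly, it returns distinct values, so a
single '' token (for example from a leading separator in the joined
text) would survive in the sequence; LibreOffice would not count it as
a word, and the R side might drop it. I have not verified this, but a
check along these lines (just a sketch, reusing the paths from above)
should show it:

import module namespace functx = 'http://www.functx.com';
let $txt := collection('IncidentRemarks/Incidents')/csv/record/INC_RM/text()
let $INC_RM := tokenize(lower-case(string-join($txt)), '(\s|[,.!:;]|[n][b][s][p][;])+')
let $Stoppers := doc('TextMining/Stopwoorden.txt')/text/line/text()
let $Stop := functx:value-except($INC_RM, $Stoppers)
return (
  count($Stop),                           (: total items, 5843 in the GUI :)
  count($Stop[. = '']),                   (: items that are the empty string :)
  count($Stop[normalize-space(.) = ''])   (: items containing only whitespace :)
)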

Who is right?

Cheers,
Ben