Re: [basex-talk] Should it be possible to declare a function in the client?

27 Feb 2020


I also note that when I try to mock up something similar with one of my texts, tokenize seems to give me a zero-length string at the start.
It’s there in the first line of the output window of basexgui, but it’s easy to miss that it is significant in this context:
(tokenize(string-join(collection('BOV')[ends-with( db:path(.), '.tei' )][1]/TEI/text/body/p//text() ), '\W+' ) => distinct-values())[not(. = ( "of", "the", "in", "and", "at","by","to" ))]
August
16
17
2015
Members
Board
Visitors
University
Virginia
met
Retreat
Open
Executive
Session
Forum
…
But it becomes visible if I apply string-length to the sequence:
(tokenize(string-join(collection('BOV')[ends-with( db:path(.), '.tei' )][1]/TEI/text/body/p//text() ), '\W+' ) => distinct-values())[not(. = ( "of", "the", "in", "and", "at","by","to" ))]  ! string-length(.)
0
6
2
2
4
7
5
8
10
8
3
7
4
9
7
…
I wonder if that’s the semantic difference here.
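A minimal sketch of that effect, using a made-up sample string rather than the BOV text: when the input starts with a separator, tokenize returns a zero-length string as its first item, and filtering on string-length drops it. The variable names and the sample text are only for illustration.

```xquery
(: Hypothetical sample; the leading space stands in for whitespace before the first word. :)
let $text := ' August 16 17 2015 Members'
(: A separator at the start of the input yields a zero-length first token. :)
let $tokens := tokenize($text, '\W+')
(: Keep only the tokens that actually contain characters. :)
let $words := $tokens[string-length(.) > 0]
return (count($tokens), count($words))
```

With this input the two counts should be 6 and 5, so they differ by exactly the one empty token.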
— Steve M.
...
On Feb 27, 2020, at 3:43 PM, Majewski, Steven Dennis (sdm7g) sdm7g@virginia.edu wrote:
So, if the counts are different depending on who is counting (R or BaseX),
the first question is: who is correct?
(And the second question is probably: what do you mean by correct? The semantics of XQuery sequences and of whatever destination R datatype is being counted may differ slightly. I don’t know R that well, but XQuery sequences and arrays, for example, have rather different semantics.)
— Steve M.
...
On Feb 27, 2020, at 2:48 PM, Ben Engbers Ben.Engbers@Be-Logical.nl wrote:
On 27-02-2020 at 19:19, Christian Grün wrote:
...
It’s difficult to understand what’s going on here. Could you please
provide us self-contained queries without the R wrapper code?
Version 1:
import module namespace functx = 'http://www.functx.com';
(: Extract the text :)
let $txt := collection('IncidentRemarks/Incidents')/csv/record/INC_RM/text()
(: Convert to lower-case and tokenize :)
let $INC_RM := tokenize(lower-case(string-join($txt)),
'(\s|[,.!:;]|[n][b][s][p][;])+')
(: Read Stopwords :)
let $Stoppers := doc('TextMining/Stopwoorden.txt')/text/line/text()
(: Remove Stopwords :)
let $Stop :=  functx:value-except($INC_RM, $Stoppers)
return $Stop
My R code first executes this as an XQuery query and then calculates the length
of the returned list (= 5842).
Version 2:
import module namespace functx = 'http://www.functx.com';
let $txt := collection('IncidentRemarks/Incidents')/csv/record/INC_RM/text()
let $INC_RM := tokenize(lower-case(string-join($txt)),
'(\s|[,.!:;]|[n][b][s][p][;])+')
let $Stoppers := doc('TextMining/Stopwoorden.txt')/text/line/text()
let $Stop :=  functx:value-except($INC_RM, $Stoppers)
return count($Stop)
Returns the length of the sequence (counts 5843 words).
The doubled '\\' in the regular expression is intentional (R-specific). With a
single '\' the query can be executed in BaseXGUI.
Does this help?
Ben
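One way to check whether an empty token accounts for the difference of one between the two versions is to count the result with and without zero-length strings. The sketch below is self-contained: it uses made-up remarks and stopwords instead of the IncidentRemarks and TextMining databases, and it spells out the filter that functx:value-except is assumed to perform (the distinct values of the first sequence that do not occur in the second) rather than importing the module. That assumption is worth checking against the functx source.

```xquery
(: Hypothetical data; the real queries read these from the databases. :)
let $txt := (' Pomp defect, ', 'pomp vervangen. ')
let $stoppers := ('de', 'het', 'een')
(: Same idea as the original pattern, without the nbsp alternative. :)
let $tokens := tokenize(lower-case(string-join($txt)), '(\s|[,.!:;])+')
(: Assumed equivalent of functx:value-except($tokens, $stoppers). :)
let $stop := distinct-values($tokens[not(. = $stoppers)])
return (
  count($stop),                       (: includes a zero-length string, if present :)
  count($stop[string-length(.) > 0])  (: real words only :)
)
```

With this input the counts should be 4 and 3. If the same comparison on the real data gives 5843 and 5842, the extra item on the BaseX side is an empty string that the R side does not count.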
