I was wondering about nbsp as well. Maybe you don’t need it at all, but we’d need to have a look at your files.
Could you additionally provide us with minimized versions of your Incidents and Stopwoorden.txt XML documents? They should have the same structure, but contain only a few lines of content.
On Fri, Feb 28, 2020 at 11:45 AM Ben Engbers Ben.Engbers@be-logical.nl wrote:
On 27-02-2020 at 22:03, Majewski, Steven Dennis (sdm7g) wrote:
Also: is ‘( )’ what you want as part of your regex to also catch the ampersand? I’m just guessing your intent here. You could also try ‘(\W| )+’, i.e. non-word characters, but I’m kind of assuming that it handles non-normalized Unicode accented characters correctly and reads them as word chars and not delimiters. That would be, of course, the right thing, but I’d probably test it first.
— Steve.
I just copied the regular expression from this page "https://en.wikibooks.org/wiki/XQuery/Tag_Cloud" (using regex always gives me headaches ;-( ). But even after removing the "|[n][b][s][p][;]" from the regex, basexgui still returns 5843.
Ben
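
A minimal sketch for comparing the three tokenizer patterns mentioned in this thread. The sample string is made up for illustration and is not taken from the Incidents document, and the `(\s|[,.!:;]|...)+` pattern is an assumption about what the copied wikibooks expression looked like. The `[n][b][s][p][;]` alternative only matches the literal five characters `nbsp;`, so removing it can only change the count if the text actually contains that literal sequence. That may be why the result stays the same.

```xquery
(: Hypothetical sample text: accented letters, punctuation, and one real
   no-break space (U+00A0) between "de" and "deur". :)
let $text := "Caf&#xE9; open, caf&#xE9; dicht: de&#xA0;deur"

(: Assumed pattern with the literal "nbsp;" alternative. :)
let $with-nbsp    := tokenize($text, '(\s|[,.!:;]|[n][b][s][p][;])+')

(: The same pattern with the "nbsp;" alternative removed. :)
let $without-nbsp := tokenize($text, '(\s|[,.!:;])+')

(: Splitting on runs of non-word characters instead. :)
let $non-word     := tokenize($text, '\W+')

return (
  count($with-nbsp),
  count($without-nbsp),
  count($non-word),
  string-join($non-word, '|')
)
```

In XQuery regular expressions `\s` covers only space, tab, newline and carriage return, so a real no-break space is not a delimiter for the first two patterns, while `\W` should treat it as one. Whether accented and non-normalized characters end up inside the tokens is best checked on a few lines of the real data, as suggested above.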