Hi Kristian,
- the regular expression "(.){3}" doesn't match the same as "(...)".
Shouldn't they be equal?
They look similar indeed, but are not equivalent. In the first expression, the group itself is repeated three times: all three characters are part of the overall match, but the subordinate match group only holds a single character. "(.{3})" is probably what you are looking for.
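To see the difference directly, here is a minimal sketch that runs both patterns against the sample string 'abc' (the sample string and the <result> element are only for illustration) and prints the overall match next to the content of group 1:

```xquery
(: Sketch only: compare what group 1 captures for the two patterns. :)
for $regex in ('(.){3}', '(.{3})')
(: analyze-string() wraps each hit in an fn:match element;
   captured groups appear as nested fn:group elements. :)
let $match := analyze-string('abc', $regex)/fn:match
return <result regex="{ $regex }"
               match="{ string($match) }"
               group="{ string($match/fn:group[@nr = 1]) }"/>
```

Both patterns should report the same overall match, while group 1 should cover all three characters only for the second pattern.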
- a very annoying whitespace is placed next to the newline of out:nl(). It
is placed before out:nl() if it is called at the beginning of an element, or it is placed after the newline if out:nl() is called at the end of an element.
This is not related to out:nl(), but to the way XQuery node construction works (“The individual strings resulting from the previous step are merged into a single string by concatenating them with a single space character between each pair.” [1]). Simply use string-join() for concatenating the results, as you do anyway:
  let $regex := '(.{3})|[\W]'
  let $text := "this one... is the first."
  return <s>{
    string-join(
      (
        out:nl(),
        analyze-string($text, $regex)//text()[not(.=" ")],
        out:nl()
      ),
      out:nl()
    )
  }</s>
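The space insertion quoted above can also be reproduced without out:nl(). A minimal sketch, with the element names <a> and <b> and the sample strings chosen only for illustration:

```xquery
(: Several atomic values in an enclosed expression are merged into one
   text node, separated by single spaces: <a>x y z</a> :)
<a>{ ('x', 'y', 'z') }</a>,
(: Joining the values first passes a single string to the constructor,
   so no separator is added: <b>xyz</b> :)
<b>{ string-join(('x', 'y', 'z'), '') }</b>
```

The same happens when one of the values is the newline string returned by out:nl(), which is where the extra space comes from.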
Best, Christian