Hello --
So as part of building tests, I'm regularizing the text contents of some Word documents into single strings. (Which makes it relatively easy to make sure no words have gotten lost or changed order when compared to other stages of the process.)
Regularization is a tactful way to put this particular atrocity:
let $stringTidy as function(xs:string+) as xs:string :=
  function($in as xs:string+) as xs:string {
    $in => string-join(' ')
        => replace(xquery:eval($menuMatch), '')
        => replace('\n', ' ')
        => replace('\t', ' ')
        => replace('\r', ' ')
        => replace('\p{Zs}', ' ')
        => replace(' +', ' ')
        => replace(' ([,.;:])', '$1')
        => replace('^ ', '')
        => replace(' $', '')
  }
$menuMatch gets stripped out of the Word text because it's added by processing, rather than being present in the source file that generates the other half of the comparison. (It's currently U+1405, ᐅ, though I devoutly hope this doesn't matter!) It gets read from an XSL source document, which I've included in minimal form, along with some sample data and a minimal-ish query.
If I use $menuMatch in the replace, it doesn't work, in the sense that the ᐅ character is NOT removed from the string. If I xquery:eval() it, as here, the replace does work to remove the ᐅ from the string. I don't expect to need xquery:eval to use a variable as the second argument of replace(). Am I wrong? Has the pile of arrow operators exceeded the bounds of reason?
Thanks! Graydon
Hi Graydon,
> $menuMatch gets stripped out of the Word because it's added by processing, rather than being present in the source file which generates the other half of the compare. (It's currently U+1405, ᐅ, though I devoutly hope this doesn't matter!)
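You’ll need to remove the single quotes from $menuMatch: the select attribute holds the XPath string literal 'ᐅ', quotes included, so replace() looks for the three characters 'ᐅ' rather than ᐅ alone. (That is also why xquery:eval() helps: it evaluates the literal to the bare ᐅ.) Apart from that, $stringTidy can possibly be simplified: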
let $menuMatch := doc($varDevsDocSource)
  /descendant::xsl:variable[@name eq 'menucascadeSeparator']
  /@select => replace("'", '')
let $stringTidy := function($in) { $in => string-join(' ') => replace($menuMatch, '') => normalize-space() => replace(' ([,.;:])', '$1') }
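A minimal, self-contained sketch of the difference, assuming BaseX (for xquery:eval()). The inline document is a hypothetical stand-in for doc($varDevsDocSource), and the sample string is made up: the raw @select value is the literal 'ᐅ' with its quotes, so used as a regex it only matches a quoted ᐅ.

```xquery
declare namespace xsl = "http://www.w3.org/1999/XSL/Transform";

(: Hypothetical stand-in for doc($varDevsDocSource): one xsl:variable whose
   select attribute holds the XPath string literal 'ᐅ', quotes included :)
let $xslDoc := document {
  <xsl:stylesheet version="2.0">
    <xsl:variable name="menucascadeSeparator" select="'ᐅ'"/>
  </xsl:stylesheet>
}
let $select := string($xslDoc/descendant::xsl:variable[@name eq 'menucascadeSeparator']/@select)
let $menuMatch := replace($select, "'", '')   (: strip the quotes, leaving ᐅ :)
let $sample := 'File ᐅ Save As'                (: hypothetical sample text :)
return (
  replace($sample, $select, ''),               (: pattern is 'ᐅ' with quotes: no match, text unchanged :)
  replace($sample, xquery:eval($select), ''),  (: eval turns the literal into ᐅ: removed :)
  replace($sample, $menuMatch, '')             (: quotes stripped up front: removed :)
)
```

The first result still contains ᐅ; the other two both come out as "File  Save As" (two spaces, which a later normalize-space() step collapses).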
Cheers, Christian
On 2/6/21, Graydon Saunders graydonish@gmail.com wrote:
On Sat, Feb 06, 2021 at 11:09:55PM +0100, Christian Grün scripsit:
> You’ll need to remove the single quotes from $menuMatch (it yields 'ᐅ'). Apart from that, $stringTidy can possibly be simplified:
ARGH.
Thank you!
> let $stringTidy := function($in) { $in => string-join(' ') => replace($menuMatch, '') => normalize-space() => replace(' ([,.;:])', '$1') }
Yes. When I started it wasn't clear if I wanted all of normalize-space() or not, and then it just grew.
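One caveat on the normalize-space() simplification: fn:normalize-space() only treats space, tab, CR, and LF as whitespace, so a no-break space (U+00A0) or any other \p{Zs} character in the Word text would survive it. A sketch that keeps the original \p{Zs} step ahead of normalize-space(), with $menuMatch hard-coded to ᐅ and a made-up input, both for illustration only:

```xquery
(: Hypothetical: $menuMatch hard-coded here instead of read from the XSL source :)
let $menuMatch := 'ᐅ'
let $stringTidy := function($in as xs:string+) as xs:string {
  $in
  => string-join(' ')
  => replace($menuMatch, '')
  => replace('\p{Zs}', ' ')      (: map NBSP, en space, etc. to a plain space :)
  => normalize-space()           (: collapse runs of space/tab/CR/LF and trim the ends :)
  => replace(' ([,.;:])', '$1')  (: drop a space before punctuation :)
}
(: &#160; is a no-break space, which normalize-space() alone would not touch :)
return $stringTidy(('Choose File&#160;ᐅ Save As ,', '  then   OK .'))
```

This returns "Choose File Save As, then OK."; without the \p{Zs} line, the no-break space after "File" would remain in the output.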
Much appreciated!
-- Graydon
basex-talk@mailman.uni-konstanz.de