hello --
So I want to subtract one list of sentences from another list of sentences. (As a way of checking "what did processing do to this document?"; we know it added some words, and if we added words to the sentence it was at the start of the sentence, but none of the original words should be changed or missing or reordered relative to the document.)
I have tested that I have the sequences of strings which are the list of sentences before I use them to call:
declare function local:subtractSentences($before as xs:string*,$after as xs:string*) as xs:string* {
let $thisBefore as xs:string? := head($before) let $thisAfter as xs:string? := head($after)
return if (not(exists($thisBefore)) or not(exists($thisAfter))) then $after (: we're done because we're out of strings, but if we have any after left we want it :) else if (ends-with($thisAfter,$thisBefore)) then (replace($thisAfter,$thisBefore,''),local:subtractSentences(tail($before),tail($after))) else ($thisAfter,local:subtractSentences($before,tail($after))) };
In ~150 ms, I get
Error: Stack Overflow: Try tail recursion?
so I have written the function wrong. What's the more useful pattern for this case?
Thanks! Graydon
On Fri, 2021-06-04 at 12:35 -0400, Graydon Saunders wrote:
declare function local:subtractSentences($before as xs:string*,$after as xs:string*) as xs:string* {
let $thisBefore as xs:string? := head($before) let $thisAfter as xs:string? := head($after)
return if (not(exists($thisBefore)) or not(exists($thisAfter))) then $after (: we're done because we're out of strings, but if we have any after left we want it :) else if (ends-with($thisAfter,$thisBefore)) then (replace($thisAfter,$thisBefore,''),local:subtractSentences(tail($bef ore),tail($after))) else ($thisAfter,local:subtractSentences($before,tail($after))) };
I'd be tempted to do this by constructing a regular expression on the fly, mapping whitespace in the input to \s+ in the expression and quoting \ elsewhere, and use a single replace().
Your problem is likely (ididn't check the BaseX query plan) that your final sequence constructor precludes tail optimization.
($thisAfter,local:subtractSentences($before,tail($after))
either use one of the fold-left or fold-right functions, or pass $thisAfter as an argument to your function and return it at the end, perhaps.
Liam
Hi Graydon,
On 2021-06-04 at 18:35, Graydon Saunders wrote:
In ~150 ms, I get
Error: Stack Overflow: Try tail recursion?
so I have written the function wrong. What's the more useful pattern for this case?
As the helpful error message suggests, the recursion in your algorithm your algorithm is so deep (one level per item in the `$before` and `$after` lists) that it runs out of stack space. Since BaseX implements "tail-call optimization", you can avoid that problem by rewriting your function so that all recursive calls are in a "tail position" within the function body, as roughly described on Wikipedia [1]. In BaseX tail calls can be inside (at least) nested layers of the `then` and `else` branches of `if`, the results of a `switch` or `typeswitch` and in the `return` of a FLWOR expression in which each clause is statically known to produce either 0 or 1 result/binding.
In your case you can gather all results in an additional, accumulating parameter of the function and return everything at once at the end:
declare function local:subtractSentences( $before as xs:string*, $after as xs:string*, $out as xs:string* ) as xs:string* { let $thisBefore as xs:string? := head($before) let $thisAfter as xs:string? := head($after) return if (empty($thisBefore) or empty($thisAfter)) then ( ($out, $after) ) else ( let $new-output := if(not(ends-with($thisAfter, $thisBefore))) then $thisAfter else replace($thisAfter, $thisBefore, '') return local:subtractSentences( tail($before), tail($after), ($out, $new-output) ) ) };
Here the only recursive call is in tail position inside the function body because the only expressions surrounding it are two FLWOR expressions with only `let` and `return` clauses and an `else`. It therefore does not consume unlimited stack space any more.
You can also check compilation output in the "Info" panel of the BaseX GUI to see if a function call was identified as a tail call. I see the following line:
- mark as tail call: local:subtractSentences(tail($before_0), tail($after_1), ($out_2, (if(ends-with($thisAfter_4, $thisB...
Hope that helps!
Cheers, Leo
Hello Leo --
On Sat, Jun 05, 2021 at 12:35:33AM +0200, Leonard Wörteler scripsit: [snip]
In BaseX tail calls can be inside (at least) nested layers of the `then` and `else` branches of `if`, the results of a `switch` or `typeswitch` and in the `return` of a FLWOR expression in which each clause is statically known to produce either 0 or 1 result/binding.
[snip]
Hope that helps!
It does help, thank you!
It needs a tweak to do the thing -- $after always sheds the head of the sequence, $before does only when there's a match. Otherwise if the processing adds a sentence, the alignment for an appropriate compare is lost for every subsequent attempted match.
And the devil crept up behind me and whispered "compactness", so I wound up with:
declare function local:subtractSentences( $before as xs:string*, $after as xs:string*, $out as xs:string* ) as xs:string* { let $thisBefore as xs:string? := head($before) let $thisAfter as xs:string? := head($after)
let $matches as xs:boolean := ends-with($thisAfter,$thisBefore)
return if (empty($thisBefore) or empty($thisAfter)) then ($out, $after) else local:subtractSentences( if ($matches) then tail($before) else $before, tail($after), ($out,if (not($matches)) then $thisAfter else replace($thisAfter,$thisBefore,'','q')) ) };
This is about twice as fast -- twenty-odd milliseconds instead of thereabouts of fifty -- compared to the messing-with-indexes version.
Not going to try to decide which I think is more clearly expressive at this hour.
Thank you! Graydon
basex-talk@mailman.uni-konstanz.de