Hi,
Currently when I need to batch a database for processing I do something like this:
for $record at $p in $recordSet where ($p ge $start) and ($p le $end) return $record/process-record(.)
I can imagine why this is more efficient in BaseX than using an XPath filter expression, as in
$recordSet[position() ge $start and position() le $end]/process-record(.)
(which is what I might prefer to do in other environments).
But is the 'where' clause the best I can do? XQuery 3.0 has tumbling windows, but I am not iterating over subsequences; I am selecting one (arbitrary) subsequence from a sequence. What's my best approach to this in BaseX?
Thanks, Wendell
Wendell Piez | http://www.wendellpiez.com
XML | XSLT | electronic publishing
Eat Your Vegetables
_____oo_________o_o___ooooo____ooooooo_^
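To illustrate the distinction drawn in the question, here is a minimal sketch of an XQuery 3.0 tumbling window. The sample sequence `1 to 10` and the batch size `$size` are assumptions made for illustration, not part of the original post. A window clause like this iterates over every consecutive batch, which is more than is needed when only one subsequence is wanted.

```xquery
(: Assumed sample data: the integers 1 to 10 stand in for the records. :)
let $size := 3
(: A new window starts at positions 1, 4, 7, 10; each window ends
   just before the next one starts. :)
for tumbling window $w in (1 to 10)
  start at $s when ($s - 1) mod $size eq 0
return <batch>{ $w }</batch>
```

With these assumed values the query returns four `batch` elements holding 1 2 3, 4 5 6, 7 8 9 and 10.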
Hi Wendell,
One way to check which is fastest is the script below (substitute an appropriate sequence for $seq). From the implementation side I cannot tell which one will win, though fn:subsequence() did well in the few runs on my machine.
Hope this helps, Arve
------
let $seq := (),  (: replace with the sequence to test :)
    $cnt := count($seq),
    $start := random:integer($cnt),
    $end := random:integer($cnt)
return (
  (: testing for loop :)
  prof:dump("for ... "),
  prof:time(
    for $x at $p in $seq
    where $start le $p and $p le $end
    return $x
  ),

  (: testing subsequence; the third argument is a length, not an end position :)
  prof:dump("fn:subsequence() "),
  prof:time(
    fn:subsequence($seq, $start, $end - $start + 1)
  ),

  (: testing subsequence with [...] :)
  prof:dump("[...]"),
  prof:time(
    $seq[$start le position() and position() le $end]
  )
)
----
On 13.08.13 at 16:58, Wendell Piez wrote:
[...] But is the 'where' clause the best I can do? [...] What's my best approach to this in BaseX?
_______________________________________________
BaseX-Talk mailing list
BaseX-Talk@mailman.uni-konstanz.de
https://mailman.uni-konstanz.de/mailman/listinfo/basex-talk
Hi Wendell,
for $record at $p in $recordSet where ($p ge $start) and ($p le $end) return $record/process-record(.)
I would usually write this as:
$recordSet[position() = $start to $end]/process-record(.)
…but since the query optimizer rewrites some of these alternatives to the same internal representation, the choice may not really matter.
Did you encounter some performance bottlenecks? Christian
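As a quick sanity check on the alternatives discussed so far, the sketch below compares the four formulations on a small sequence. The sample data (`1 to 20`) and the bounds 5 and 8 are assumptions chosen for illustration. It uses only standard XQuery functions and says nothing about relative speed, only that the results agree.

```xquery
(: Assumed sample data: the integers 1 to 20 stand in for the record set. :)
let $seq   := 1 to 20
let $start := 5
let $end   := 8
let $loop   := for $x at $p in $seq
               where $p ge $start and $p le $end
               return $x
(: subsequence() takes a start position and a length. :)
let $subseq := subsequence($seq, $start, $end - $start + 1)
let $filter := $seq[position() ge $start and position() le $end]
let $range  := $seq[position() = $start to $end]
return (
  deep-equal($loop, $subseq),
  deep-equal($loop, $filter),
  deep-equal($loop, $range),
  $range
)
```

For these assumed values the result is three times `true` followed by the items 5, 6, 7 and 8.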
Arve and Christian,
Thanks for reminding me of both the subsequence() function and the range operator "$a to $b".
(It's terrible how what is obvious to the brain at one time may be completely forgotten at another time. Maybe the brain in question is getting old and wearing out.)
I will follow your advice and try some experiments, adding these options and watching for the rewrites. As for bottlenecks: yes, there are performance bottlenecks, but they appear only in the rather large datasets my customer is running. In general I want the code I offer them to be as efficient as possible (avoiding known impediments) even before performance testing at scale.
Thanks, Wendell
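For the batching use case, one possible shape is a small helper built on subsequence(). This is a sketch under stated assumptions: the function name `local:batch`, the batch size, and the generated `record` elements are all hypothetical and stand in for the real record set and processing step.

```xquery
(: Hypothetical helper: returns batch number $n (1-based) of $size items. :)
declare function local:batch(
  $items as item()*,
  $n     as xs:integer,
  $size  as xs:integer
) as item()* {
  subsequence($items, ($n - 1) * $size + 1, $size)
};

(: Assumed sample data in place of the real record set. :)
let $recordSet := for $i in 1 to 10 return <record id="{$i}"/>
for $n in 1 to 4
return <batch n="{$n}">{ local:batch($recordSet, $n, 3) }</batch>
```

Expressing the batch as a start position plus a length avoids the off-by-one risk of passing an end position as the third argument.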
On Tue, Aug 13, 2013 at 12:42 PM, Christian Grün christian.gruen@gmail.com wrote:
[...] Did you encounter some performance bottlenecks? Christian