Hello,
I was wondering if there are ways to measure how much time is spent in parts of a query/program, to identify inefficient functions or expressions? Something like the total time spent in each function would be quite useful.
I saw the Profiling Module, but it requires you to change the code, and it also won't aggregate results when you run it in a loop. It's still doable if you add labels and then parse stderr to aggregate the results yourself, but it's quite cumbersome.
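For concreteness, here is roughly what the current approach looks like (the label 'square' is arbitrary, and each evaluation prints its own line to stderr, so totals have to be computed by hand):

```xquery
(: each iteration prints a line like "square: 0.01 ms" to stderr;
   getting a total requires capturing and parsing that output yourself :)
for $i in 1 to 1000
return prof:time($i * $i, false(), 'square')
```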
Some thoughts as a side note: A nice way to avoid having to change the code to do profiling (or other debugging like breakpoints) would be to change or add hooks into the AST of the query. That way you could for example add a prof:time() around all the function bodies, with the function name as label.
And to help interpret the results of debugging functions like prof:time or trace, it would be nice to have a way to format them and output them as a stream of items (or even multiple streams). Then you could write XQuery scripts that consume the streams and aggregate the information in a human-readable way.
Best regards, Iwan
Dear Iwan,
I see you have already found the Profiling Module. It is in fact currently the only way to profile your code within XQuery. Another alternative is to do some Java profiling (e.g. via -Xrunhprof), but this will in general only give you a rough idea of which XQuery expressions are expensive.
Thanks for your suggestions on how to improve the processing of debugging information. We could indeed think about adding a profiling flag that summarizes the evaluation time of each function by wrapping its body in something like prof:time. My immediate thought (among others) would be how to profile functions that get rewritten and inlined. An example:
declare function local:inc($n) { 1 + $n };
for $i in (1, 2)
let $j := local:inc($i)
return $j
…will be statically rewritten to…
for $i in (1, 2) return (1 + $i)
We could probably either suppress function inlining if code is profiled, or discard profiling of inlined functions. Both approaches have their advantages and drawbacks.
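As an aside, a per-function escape hatch along these lines may already exist: if I recall correctly, the %basex:inline annotation with an argument of 0 disables inlining for a single function, so a profiling flag could conceivably attach it automatically. A sketch (assuming that annotation behaves as described):

```xquery
(: %basex:inline(0) should prevent this function from being inlined,
   so a wrapping prof:time would survive the static rewrite :)
declare %basex:inline(0) function local:inc($n) { 1 + $n };
for $i in (1, 2)
return prof:time(local:inc($i), false(), 'local:inc')
```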
I didn’t quite get the concept of streaming the results; could you give us a small example of how this could look, or describe it in more detail?
Thanks & cheers, Christian
Thanks, I will have to try the Java profiling, it could already help.
I would think it would be easier to wrap the function bodies before the static rewrite, but in any case it will always interfere with the optimizations, so it's quite hard to know what to do...
For the streaming, let's say you've added your prof:time around function bodies. Right now it outputs to stderr in a format like "Tag: 0.12 ms". If you could instead process the result of prof:time like a normal XQuery function, you could send it wherever you want (stderr, a file, a database) and in whatever format you want. Another program could then read that output and transform it.
An example with a made up option for prof:time:
declare function local:process-prof-time($result, $time, $tag) {
  file:write('/tmp/profile.txt',
    <profile>
      <timestamp>{prof:current-ns()}</timestamp>
      <tag>{$tag}</tag>
      <time>{$time}</time>
    </profile>
  )
};

for $i in 1 to 10000
return prof:time($i * $i, false(), "Tag",
  (: A callback to process the result of prof:time :)
  local:process-prof-time
)
Which would make it easy to transform the profiling by reading /tmp/profile.txt, and you could also stream it to avoid creating huge output.
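To sketch the consuming side: assuming the callback above appends well-formed <profile> fragments to /tmp/profile.txt (the file path and element names come from the made-up example), an aggregating script could look something like this:

```xquery
(: summarize the profile records per tag: call count and total time :)
for $p in parse-xml-fragment(file:read-text('/tmp/profile.txt'))/profile
group by $tag := string($p/tag)
order by sum($p/time) descending
return <summary tag="{$tag}" calls="{count($p)}" total="{sum($p/time)}"/>
```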
In this example I use a callback directly in the code but you could imagine other ways to implement it. Maybe some special debug module which can implement the callbacks.
Cheers, Iwan
On Friday, 19 May 2017 at 23:15 +0200, Christian Grün wrote:
Hi Iwan,
Thanks for your suggestions.
One way to return both the query result and the profiling result via XQuery might look like this:
declare function local:profile(
  $code as function(*)
) as map(*) {
  let $ns := prof:current-ns()
  return map {
    'VALUE': $code(),
    'MS': (prof:current-ns() - $ns) idiv 1000000
  }
};

let $code := function() { (1 to 100000000)[. = 1] }
let $result := local:profile($code)
return (
  "Result: " || $result?VALUE,
  "Time (ms): " || $result?MS
)
Here is another solution, which takes advantage of the BaseX Performance class [1]:
(: import BaseX performance utility class :)
import module namespace perf = 'java:org.basex.util.Performance';

(: initialize counter :)
prof:void(perf:time()),
(: do the expensive work :)
(1 to 100000000)[. = 1],
(: show spent time :)
"Time (ms): " || (perf:time() div 1000000)
Neither code example offers quite what you are looking for, but maybe they are helpful and serve as inspiration.
Best, Christian
[1] https://github.com/BaseXdb/basex/blob/master/basex-core/src/main/java/org/ba...
On Fri, May 19, 2017 at 11:49 PM, Iwan Briquemont tracnar@gmail.com wrote: