Hello, we are experiencing a weird performance issue with function calls. In many of the XQueries we use, we found that calling functions slows down execution, and sometimes the query even hangs "forever". I tried to drill down to the problem and I think I have isolated an odd behavior. Before presenting the example code, let me summarize: inside a for loop we call a function a thousand times, passing some parameters. I declared 3 functions, 2 of which take 1 parameter, while the last one takes 2 parameters. For simplicity's sake the parameter "objectId" is never used, but that doesn't affect the experiment. When I call either of the 1-parameter functions, execution takes less than 2 seconds, but when I call the 2-parameter function it runs endlessly. It seems that the combination of those two parameters is what causes the performance issue. I hope you can enlighten me. Regards William
Here is an example:
declare namespace xbpr = "http://www.bpeng.com/";

declare variable $amlDocName as xs:string := "aml";
declare variable $cxnDefsDocName as xs:string := "cxnDefsDocName";

declare function xbpr:testFast1($objectId as xs:string?) {
  let $docName := $cxnDefsDocName
  let $conn_typeof_target := doc($docName)/CxnDefs/CxnDef[@type="RdfsType"]/@target
  return $conn_typeof_target
};

declare function xbpr:testFast2($docName as xs:string?) {
  let $conn_typeof_target := doc($docName)/CxnDefs/CxnDef[@type="RdfsType"]/@target
  return $conn_typeof_target
};

declare function xbpr:testSlow($docName as xs:string?, $objectId as xs:string?) {
  let $conn_typeof_target := doc($docName)/CxnDefs/CxnDef[@type="RdfsType"]/@target
  return $conn_typeof_target
};

let $instances := doc($amlDocName)/AML/Group//ObjDef
for $instance in $instances
let $objectId := $instance/@ObjDef.ID
let $v := xbpr:testFast1($objectId)
let $v := xbpr:testFast2($cxnDefsDocName)
let $v := xbpr:testSlow($cxnDefsDocName, $objectId)
return $v
William,
thanks for your e-mail. I haven't managed to reproduce the behavior yet, so it would be great if you could provide us/me with a fully working example (e.g., including the sample documents).
Best, Christian
On Tue, Jan 11, 2011 at 3:34 PM, William Sandri wsandri@bpeng.com wrote:
BaseX-Talk mailing list BaseX-Talk@mailman.uni-konstanz.de https://mailman.uni-konstanz.de/mailman/listinfo/basex-talk
Hi William,
thanks for providing me with the sample documents. Unfortunately, I cannot give you a general answer on how to avoid the slowdown of your query, as it appears to be the result of numerous individual optimizations (FLWOR rewritings, static bindings, function inlining, etc.) that are performed in the compilation step. You might check out the Query Info in the GUI, or use -V on the command line, to get some more insight into the query compilation process.
In any case, I have added your use case to our internal bug tracker, and I might have a closer look at this phenomenon after the next official release.
Hope this helps, Christian
Thank you Christian, I already took a look at the query info, but I cannot make sense of what is going on in the compilation process. I hope you'll have time to work on this issue, because it is a huge problem for us. We are migrating our product from Berkeley Xml DB to BaseX because of its better overall performance, but issues like this one scare me a little.
Regards William
On 01/11/2011 08:45 PM, Christian Grün wrote:
Hi William,
On 12.01.2011 09:22, William Sandri wrote:
> We are migrating our product from Berkeley Xml DB to BaseX because of its better overall performance
Nice to hear!
> I already took a look at the query info, but I cannot make sense of what is going on in the compilation process.
I looked into your query and I think I figured out what's going on:
In BaseX there are several optimization rules for pre-evaluating constant subexpressions. The relevant ones are:
- Function calls are evaluated at compile time if all their arguments have already been evaluated
- FLWOR constructs are eliminated if they don't use iteration
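To make the two rules concrete, here is a small self-contained XQuery illustration (local:double is a hypothetical helper, not part of the original query). An optimizer following these rules may replace both expressions below by the constant 42 before the query runs; which rewrites BaseX actually performs should be visible in its Query Info.

```xquery
declare function local:double($x as xs:integer) as xs:integer {
  $x * 2
};

(: Rule 1: every argument is a constant, so the call can be evaluated
   once at compile time instead of at run time (like testFast2,
   which is only ever called with $cxnDefsDocName). :)
local:double(21),

(: Rule 2: this FLWOR only has a let clause, i.e. it does not iterate,
   so it can be removed by substituting the bound value into the return. :)
let $answer := local:double(21)
return $answer
```

Both expressions yield 42, so the query returns the sequence (42, 42). A call like xbpr:testSlow($cxnDefsDocName, $objectId) does not qualify for rule 1, because $objectId changes with every iteration of the for loop.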
This leads to optimization of the first two functions, but not the last one:
declare function xbpr:testFast1($objectId as xs:string?) {
  let $docName := $cxnDefsDocName
  let $conn_typeof_target := doc($docName)/CxnDefs/CxnDef[@type="RdfsType"]/@target
  return $conn_typeof_target
};
Here the argument is dynamic, but it isn't used in the function body, so the function body is pre-evaluated and calling the function is nearly free performance-wise.
The second function
declare function xbpr:testFast2($docName as xs:string?) {
  let $conn_typeof_target := doc($docName)/CxnDefs/CxnDef[@type="RdfsType"]/@target
  return $conn_typeof_target
};
is called only with constant arguments, so it can be pre-evaluated completely.
The slow function
declare function xbpr:testSlow($docName as xs:string?, $objectId as xs:string?) {
  let $conn_typeof_target := doc($docName)/CxnDefs/CxnDef[@type="RdfsType"]/@target
  return $conn_typeof_target
};
prevents both optimizations, as it
a) is called with the dynamic argument $objectId (which it ignores), and
b) depends on its arguments, so the body can't be evaluated in advance.
We'll refine the optimization of functions so that they are pre-evaluated if all arguments are either constant or ignored. Until then, you could simply remove unused arguments from your functions and/or separate constant subexpressions in other ways to help the optimizer.
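As a sketch of the second workaround (separating constant subexpressions), the document lookup from the example could be moved into a global variable, so that the function body only contains the part that actually depends on its argument. The variable $rdfsTypeDefs, the function xbpr:targetsFor and the @source filter are hypothetical and only illustrate a parameter that is really used; whether a given BaseX version pre-evaluates the global variable is an assumption worth checking in the Query Info.

```xquery
declare namespace xbpr = "http://www.bpeng.com/";

declare variable $amlDocName as xs:string := "aml";
declare variable $cxnDefsDocName as xs:string := "cxnDefsDocName";

(: Constant subexpression, separated from the function: it depends only on
   global constants, so it does not have to be recomputed per call. :)
declare variable $rdfsTypeDefs :=
  doc($cxnDefsDocName)/CxnDefs/CxnDef[@type = "RdfsType"];

(: Hypothetical replacement for xbpr:testSlow: the document name is no longer
   a parameter, and the remaining parameter is actually used in the body. :)
declare function xbpr:targetsFor($objectId as xs:string?) {
  $rdfsTypeDefs[@source = $objectId]/@target
};

for $instance in doc($amlDocName)/AML/Group//ObjDef
return xbpr:targetsFor($instance/@ObjDef.ID)
```

The attribute node $instance/@ObjDef.ID is atomized and converted to xs:string when it is passed to the function, as in the original query.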
I hope this helps, cheers Leo
Hello, unfortunately our queries are far from simple, and they require all the parameters we pass. What I provided is merely a trivial example of what we are facing. Anyway, regarding function testFast1, what if I use the dynamic argument I pass, like this:
declare function xbpr:testFast1($objectId as xs:string?) {
  let $docName := $cxnDefsDocName
  let $conn_typeof_target := doc($docName)/CxnDefs/CxnDef[@type="RdfsType" and @source=$objectId]/@target
  return $conn_typeof_target
};
I thought it would fall into the same situation as testSlow, but it does not; it still runs fast.
William
On 01/12/2011 10:43 AM, Leonard Wörteler wrote:
Hi,
On 12.01.2011 12:04, William Sandri wrote:
> Anyway, regarding function testFast1, what if I use the dynamic argument I pass, like this:
> declare function xbpr:testFast1($objectId as xs:string?) {
>   let $docName := $cxnDefsDocName
>   let $conn_typeof_target := doc($docName)/CxnDefs/CxnDef[@type="RdfsType" and @source=$objectId]/@target
>   return $conn_typeof_target
> };
Wow, you are really finding all the special cases ;-).
> I thought it would fall into the same situation as testSlow, but it does not; it still runs fast.
It does, in the sense that the function body isn't constant and thus can't be evaluated completely. It's another optimization that saves the day here.
As $cxnDefsDocName (and by implication $docName) is a compile-time constant, the doc(...) call can be evaluated at compile time. This makes all indexes of the database available to the optimizer, resulting in this definition:
xbpr:testFast1($objectId as xs:string?) {
  (
    IndexAccess("cxn", $objectId, ATTRIBUTE)/self::source/parent::CxnDef
    INTERSECT
    IndexAccess("cxn", "RdfsType", ATTRIBUTE)/self::type/parent::CxnDef
  )/.[parent::CxnDefs/parent::document-node()]/@target
};
That's obviously much faster than an iterative evaluation strategy.
If you're working on a limited and statically known set of documents, then it would probably be most efficient to bind all documents to static variables and then access them via those:
declare namespace xbpr = "http://www.bpeng.com/";

declare variable $amlDoc as node()* := doc("aml");
declare variable $cxnDefsDoc as node()* := doc("cxn");

(: ...the query... :)
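A minimal sketch of how such static bindings might be combined with a function that genuinely needs two parameters. The function xbpr:targets and its $type parameter are hypothetical; the database names "aml" and "cxn" are taken from the example above. Whether BaseX turns the predicate into index access, as in the testFast1 plan, depends on the version and on which indexes exist, so this is an assumption to verify in the Query Info.

```xquery
declare namespace xbpr = "http://www.bpeng.com/";

(: Documents bound once in the prolog. :)
declare variable $amlDoc as node()* := doc("aml");
declare variable $cxnDefsDoc as node()* := doc("cxn");

(: Hypothetical two-parameter function: both arguments are used, while the
   document comes from a static variable instead of a parameter, so the path
   is resolved against a database that is known at compile time. :)
declare function xbpr:targets($type as xs:string, $objectId as xs:string?) {
  $cxnDefsDoc/CxnDefs/CxnDef[@type = $type and @source = $objectId]/@target
};

for $instance in $amlDoc/AML/Group//ObjDef
return xbpr:targets("RdfsType", $instance/@ObjDef.ID)
```

If an ObjDef has no ObjDef.ID attribute, $objectId is the empty sequence, the comparison @source = $objectId is false, and the call returns nothing for that instance.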
Optimization in BaseX is pretty complex in general, and it often helps to experiment with different formulations of the same query to help the optimizer.
I hope this clarifies it a bit, feel free to ask if anything's still unclear.
Cheers Leo