Thanks Christian,
I changed the query from "transaction/* except (/transaction/traInfo)" to "transaction/*[name() ne 'traInfo']", as you suggested. The latter takes 13 secs to complete, while the former took 21 secs.
Loop:
  while((item = itr.next()) != null) {
    if(count >= start) System.out.println(/*item.serialize()*/);
    count++;
    if(count > end) break;
  }
Former query: "transaction/* except (/transaction/traInfo)"
Loop (executed with Java API) takes 18-19 seconds
Query info:
Timing:
- Parsing: 1.02 ms
- Compiling: 3.53 ms
- Evaluating: 21157.98 ms
- Printing: 144.41 ms
- Total Time: 21306.94 ms
Latter query (suggested by you): "transaction/*[name() ne 'traInfo']"
Loop (executed with Java API) takes 0.5 secs
Query info:
Timing:
- Parsing: 1.0 ms
- Compiling: 3.46 ms
- Evaluating: 15469.8 ms
- Printing: 56.87 ms
- Total Time: 15531.14 ms
The query returns around 94 lakh (9.4 million) items.
So I was wondering: apart from the query change, are there any BaseX tuning or configuration changes I should make to bring the time down further from 13 secs?
Meanwhile, I tried running the same query on a high-end machine (64 GB RAM, 8 cores, Linux). BaseX was started with -Xmx32g, but I didn't see any improvement in execution time.
I am also attaching the -Xrunhprof:cpu output: former query = java.hprof_former_query.txt, latter query = java.hprof_later_query.txt
I hope the above information is sufficient. Thanks
On Fri, Aug 22, 2014 at 5:01 PM, Christian Grün christian.gruen@gmail.com wrote:
It would be great if you can explain me why the query and loop took so much time earlier and now it gets completed quickly.
In your original query...
transaction/* except (/transaction/traInfo)
...the second path expression was evaluated for each result of the first expression. While it would theoretically be possible for the query processor to cache the results of the second expression, it is difficult in practice to decide when this is reasonable. Besides that, you have a potentially large number of sets that need to be compared every time, resulting in up to n*n comparisons, i.e. O(n²). The following query (which is probably the one you chose?) will always be linear:
transaction/*[name() ne 'traInfo']
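The difference in cost can be illustrated outside BaseX. The Java sketch below is only an analogy and not BaseX's actual evaluation strategy; the class name, method names and sample data are made up for illustration. `exceptStyle` rebuilds the exclusion set for every candidate node, mirroring the behaviour described above, while `predicateStyle` performs a single name test per node.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative analogy only; none of this is BaseX code.
class ExceptVsPredicate {
    // Number of name comparisons performed by the most recent call.
    static long comparisons;

    // names[i] is the element name of child node i; the index i acts as node identity.
    // Analogy for: transaction/* except (/transaction/traInfo)
    static List<Integer> exceptStyle(String[] names) {
        comparisons = 0;
        List<Integer> result = new ArrayList<>();
        for (int node = 0; node < names.length; node++) {
            // The second operand is re-evaluated for each candidate node.
            List<Integer> excluded = new ArrayList<>();
            for (int other = 0; other < names.length; other++) {
                comparisons++;
                if (names[other].equals("traInfo")) excluded.add(other);
            }
            if (!excluded.contains(node)) result.add(node);
        }
        return result;
    }

    // Analogy for: transaction/*[name() ne 'traInfo']
    static List<Integer> predicateStyle(String[] names) {
        comparisons = 0;
        List<Integer> result = new ArrayList<>();
        for (int node = 0; node < names.length; node++) {
            comparisons++; // exactly one name test per node
            if (!names[node].equals("traInfo")) result.add(node);
        }
        return result;
    }

    public static void main(String[] args) {
        String[] names = {"traInfo", "item", "item", "traInfo", "item"};
        System.out.println(exceptStyle(names) + " after " + comparisons + " comparisons");
        System.out.println(predicateStyle(names) + " after " + comparisons + " comparisons");
    }
}
```

Both methods select the same nodes, but for the five sample names the first one needs 25 name comparisons and the second one needs 5. With millions of items, that gap is what dominates the run time.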
Apart from this I have one concern that in my application XQueries will be provided by end users. So, every time I wouldn't be able to change or optimize the query.
It is hardly possible to limit queries of end users to those that are fast enough to be processed in a given time. As an example, the following query will take hours or even days to compute, even if it looks that simple:
(1 to 10000000000)[. = 0]
However, you can limit evaluation time and memory consumption, as described here:
http://docs.basex.org/wiki/XQuery_Module#xquery:eval
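If the queries are launched from Java, a coarse client-side time limit is a possible complement to the options above. The following sketch is generic Java and does not use xquery:eval or any BaseX API: `QueryTimeLimit`, `runWithTimeout` and the `Callable` are hypothetical stand-ins for whatever code actually runs a user query. Note that cancelling only helps if the running task reacts to thread interruption, and that this approach does nothing about memory consumption.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical helper; not part of BaseX.
class QueryTimeLimit {
    // Runs the task and returns its result, or null if it exceeds the time limit.
    static <T> T runWithTimeout(Callable<T> task, long millis) {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        Future<T> future = executor.submit(task);
        try {
            return future.get(millis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            future.cancel(true); // interrupts the worker thread
            return null;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            executor.shutdownNow(); // release the worker thread in every case
        }
    }

    public static void main(String[] args) {
        // Stand-ins for a fast and a slow user query.
        String fast = runWithTimeout(() -> "done", 1000);
        String slow = runWithTimeout(() -> { Thread.sleep(5000); return "done"; }, 100);
        System.out.println(fast + " / " + slow);
    }
}
```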
Does BaseX use any query optimizer or can you suggest me any external tool/lib for the same?
BaseX would be pretty much worthless without a query optimizer, so I don't quite get what you mean by that?
Christian