Thanks Christian,

I changed the query from "transaction/* except (/transaction/traInfo)" to "transaction/*[name() ne 'traInfo']" as you suggested. The latter takes about 13 secs to complete, while the former was taking 21 secs.

LOOP

while((item = itr.next()) != null) {
  // print only the items whose index lies in [start, end]
  if(count >= start)
    System.out.println(/*item.serialize()*/);
  count++;
  if(count > end)
    break;
}

Former Query : "transaction/* except (/transaction/traInfo)"

Loop (executed with the Java API) takes 18-19 seconds

QueryInfo :

Timing:

- Parsing: 1.02 ms

- Compiling: 3.53 ms

- Evaluating: 21157.98 ms

- Printing: 144.41 ms

- Total Time: 21306.94 ms

Latter Query (suggested by you) : "transaction/*[name() ne 'traInfo']"

Loop (executed with the Java API) takes 0.5 secs

Query Info :

Timing:

- Parsing: 1.0 ms

- Compiling: 3.46 ms

- Evaluating: 15469.8 ms

- Printing: 56.87 ms

- Total Time: 15531.14 ms

The query returns around 94 lakh (9.4 million) items.

So I was wondering: apart from changing the query, is there any BaseX tuning or configuration change I should make to bring the time down further from 13 secs?

Meanwhile, I tried running the same query on a high-end machine (64 GB RAM, 8 cores, Linux).

BaseX was started with -Xmx32g.

I didn't see any improvement in execution time.

I am also attaching the -Xrunhprof:cpu output for both queries:

former query = java.hprof_former_query.txt

latter query = java.hprof_later_query.txt

I hope the above information is sufficient.

Thanks

On Fri, Aug 22, 2014 at 5:01 PM, Christian Grün <christian.gruen@gmail.com> wrote:

> It would be great if you can explain me why the query and loop took so much
> time earlier and now it gets completed quickly.

In your original query...

transaction/* except (/transaction/traInfo)

...the second path expression was evaluated for each result of the
first expression. While it would theoretically be possible for the
query processor to cache the results of the second expression, it is
difficult in practice to decide when this is reasonable. Besides that,
you have a potentially large number of sets that need to be compared
every time, resulting in max. n*n comparisons (or O(n²)). The
following query (which is probably the one you chose?) will always be
linear:

transaction/*[name() ne 'traInfo']
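
The difference in cost can be illustrated with a toy model in plain Java. This is only a sketch and does not reflect how BaseX evaluates queries internally: the class ExceptVsFilter, the Node type and the check counter are hypothetical names made up for the illustration. The first method rebuilds the excluded set for every candidate (as described for the "except" query), the second applies a name test once per candidate (as in the predicate query).

```java
import java.util.ArrayList;
import java.util.List;

public class ExceptVsFilter {
    /** Toy element node (hypothetical); only the element name matters here. */
    static final class Node {
        final String name;
        Node(String name) { this.name = name; }
    }

    /** Number of name checks performed by the last call. */
    static long checks;

    /** Toy model of "A except B" where B is re-evaluated for every item of A. */
    static List<Node> naiveExcept(List<Node> children, String excluded) {
        List<Node> result = new ArrayList<>();
        for (Node a : children) {
            boolean inExcludedSet = false;
            // Re-scan all children to rebuild the excluded set: n checks per item.
            for (Node b : children) {
                checks++;
                if (b.name.equals(excluded) && b == a) inExcludedSet = true;
            }
            if (!inExcludedSet) result.add(a);
        }
        return result;
    }

    /** Toy model of "A[name() ne 'x']": one name check per item. */
    static List<Node> filter(List<Node> children, String excluded) {
        List<Node> result = new ArrayList<>();
        for (Node a : children) {
            checks++;
            if (!a.name.equals(excluded)) result.add(a);
        }
        return result;
    }

    /** Builds n sample children; every tenth one is named "traInfo". */
    static List<Node> sample(int n) {
        List<Node> children = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            children.add(new Node(i % 10 == 0 ? "traInfo" : "item"));
        }
        return children;
    }

    public static void main(String[] args) {
        List<Node> children = sample(2000);

        checks = 0;
        List<Node> viaExcept = naiveExcept(children, "traInfo");
        long quadratic = checks;

        checks = 0;
        List<Node> viaFilter = filter(children, "traInfo");
        long linear = checks;

        System.out.println("same result: " + viaExcept.equals(viaFilter));
        System.out.println("checks: " + quadratic + " (except) vs " + linear + " (filter)");
    }
}
```

Both methods return the same nodes, but the number of checks grows with n*n in the first case and with n in the second, which is why the gap widens quickly as the number of children increases.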

> Apart from this I have one concern that in my application XQueries will be
> provided by end users.
> So, every time I wouldn't be able to change or optimize the query.

It is hardly possible to limit queries of end users to those that are
fast enough to be processed in a given time. As an example, the
following query will take hours or even days to compute, even if it
looks that simple:

(1 to 10000000000)[. = 0]

However, you can limit evaluation time and memory consumption, as
described here:

http://docs.basex.org/wiki/XQuery_Module#xquery:eval
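
If the queries are started from a Java application, a time limit can also be enforced on the calling side. The following is a generic sketch using only java.util.concurrent; it is not the BaseX mechanism linked above. The class QueryTimeout and the method countZeros are hypothetical, and countZeros is merely a stand-in loop for the expensive query shown earlier. The approach assumes that the running task reacts to thread interruption.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class QueryTimeout {
    /** Runs the task in a worker thread and gives up after timeoutMillis. */
    static <T> T runWithTimeout(Callable<T> task, long timeoutMillis) throws Exception {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        Future<T> future = executor.submit(task);
        try {
            return future.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            future.cancel(true); // interrupts the worker thread
            throw e;
        } finally {
            executor.shutdownNow();
        }
    }

    /** Stand-in for (1 to 10000000000)[. = 0]: scans the range, counts zeros. */
    static long countZeros() throws InterruptedException {
        long matches = 0;
        for (long i = 1; i <= 10_000_000_000L; i++) {
            if (i == 0) matches++;
            // Check for cancellation every 2^20 iterations.
            if ((i & 0xFFFFF) == 0 && Thread.interrupted()) {
                throw new InterruptedException();
            }
        }
        return matches;
    }

    public static void main(String[] args) throws Exception {
        try {
            long zeros = runWithTimeout(QueryTimeout::countZeros, 200);
            System.out.println("zeros found: " + zeros);
        } catch (TimeoutException e) {
            System.out.println("query cancelled after timeout");
        }
    }
}
```

A guard like this only helps if the evaluated code checks for interruption; otherwise the worker thread keeps running after the caller has given up, so a limit enforced inside the query processor remains the more reliable option.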

> does BaseX use any query optimizer or can you suggest me any external tool
> / lib for the same ?

BaseX would be pretty much worthless without a query optimizer, so I
don't quite get what you mean by that?

Christian

Kunal Chauhan

mail4ck@gmail.com

[+918655517141]