I have a BaseX-backed website which stores all the pieces of the page in the DB. There is a significant amount of processing to be done. Pretty much any piece of the page can have effective or expiration dates (so that page parts can come and go on schedule). Some pages have dynamic elements, such as a Twitter feed. The page is built from objects which are wrapped in frames (for adding temporal and styling parameters), which are inserted into columns, which go into rows, which make up a page (which itself is wrapped in a template). So there is a lot of recursion.
I do the recursion using local functions.
It works well, but the page-building query is getting enormous, most of it being these functions that insert parts of the page, often recursively.
I am wondering what the best practice would be for this. I'm working in Scala, and I could break the process up into repeated passes, hitting the DB with a different query for each, but it seems to me that one big query will almost certainly be faster. But including all the local functions in the query makes it huge.
Is there some way to compile and load the functions once on startup, and then just run a small XQuery that calls functions which call functions, etc.?
Any ideas (or resources) for ways to optimize the query? Everything gets reused, so there is very little nesting -- mostly references are passed. For example:
<pages>
  <page>
    <id>1</id>
    <rows>
      <row id="2"/>
      <row id="3"/>
    </rows>
  </page>
</pages>
<rows>
  <row>
    <id>2</id>
    <columns>
      <column width="18">
        <frames>
          <frame id="4"/>
          <frame id="5"/>
        </frames>
      </column>
      <column width="6">
        <frames>
          <frame id="6"/>
        </frames>
      </column>
    </columns>
  </row>
</rows>
<frames>
  <frame>
    <id>4</id>
    <contents>
      <content id="7"/>
    </contents>
  </frame>
</frames>
<contents> <content type="TEXT"/> <id>7</id> <body> <p>Some text here.</p> </body> </content> <content type="WIDGET"/> <id>7</id> <!-- Widget parameters here --> </content> </contents>
Thanks! Chas.
Hi Chas,
using the following documents in the collection "Chas":
<pages> <page id="1"> <rows> <row ref="2"/> <row ref="3"/> </rows> </page> </pages>
<rows> <row id="2"> <columns> <column width="18"> <frames> <frame ref="4"/> <frame ref="5"/> </frames> </column> <column width="6"> <frames> <frame ref="6"/> </frames> </column> </columns> </row> <row id="3"> <columns> <column width="18"> <frames> <frame ref="4"/> <frame ref="5"/> </frames> </column> </columns> </row> </rows>
<frames> <frame id="4"> <contents> <content ref="7"/> </contents> </frame> <frame id="5"> <contents> <content ref="8"/> </contents> </frame> <frame id="6"> <contents> <content ref="7"/> <content ref="8"/> </contents> </frame> </frames>
and
<contents> <content type="TEXT" id="7"> <body> <p>Some text here.</p> </body> </content> <content type="WIDGET" id="8"> <!-- Widget parameters here --> </content> </contents>
the following query:
declare function local:getIt($n as node()) as node() {
  if ($n/@ref) then
    local:getIt(collection('Chas')//*[node-name() eq node-name($n)
                                      and @id eq $n/@ref])
  else
    element { node-name($n) } {
      $n/@*,
      for $child in $n/*
      return
        if ($child instance of element())
        then local:getIt($child)
        else $child
    }
};
local:getIt(<page ref="1"/>)
gives me this result:
<page id="1"> <rows> <row id="2"> <columns> <column width="18"> <frames> <frame id="4"> <contents> <content type="TEXT" id="7"> <body> <p/> </body> </content> </contents> </frame> <frame id="5"> <contents> <content type="WIDGET" id="8"/> </contents> </frame> </frames> </column> <column width="6"> <frames> <frame id="6"> <contents> <content type="TEXT" id="7"> <body> <p/> </body> </content> <content type="WIDGET" id="8"/> </contents> </frame> </frames> </column> </columns> </row> <row id="3"> <columns> <column width="18"> <frames> <frame id="4"> <contents> <content type="TEXT" id="7"> <body> <p/> </body> </content> </contents> </frame> <frame id="5"> <contents> <content type="WIDGET" id="8"/> </contents> </frame> </frames> </column> </columns> </row> </rows> </page>
Is that what you are looking for? (I've replaced the @id attributes with @ref attributes and the <id/> elements with @id attributes and fixed up the <content/> elements, but otherwise it's roughly the same.) It doesn't have all the bells and whistles yet and doesn't even check if an element can be found in the database, but these things can be added later.
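One caveat about the query above: the for clause iterates over $n/*, which selects element children only, so text nodes are silently dropped; that is why <p>Some text here.</p> comes back as an empty <p/> in the result. A variant that also copies text (and comment) nodes would iterate over node() instead; this is a sketch along the same lines, not tested beyond the sample data:

declare function local:getIt($n as node()) as node() {
  if ($n/@ref) then
    local:getIt(collection('Chas')//*[node-name() eq node-name($n)
                                      and @id eq $n/@ref])
  else
    element { node-name($n) } {
      $n/@*,
      (: node() instead of * keeps text and comment children :)
      for $child in $n/node()
      return
        if ($child instance of element())
        then local:getIt($child)
        else $child
    }
};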
Regards,
Huib Verweij.
Hey, Huib, this is great. One of the things I was asking was how others would do it. I'll compare this to my own function and see what I can learn.
The other question I had was whether it was possible to pre-load the embedded database with the functions, rather than having to load them all at the time the XQuery is run. Sort of the way functions or triggers are preloaded into an RDBMS. Do you happen to know if that's possible?
Thanks much for the function! I'll study it carefully. It looks a bit different from my own, so I'm sure I can learn from it.
Chas.
Hi Chas, (and hi Huib),
thanks for your input!
At the moment preloading & compiling queries ahead of time is not possible. The main reason for this is that, from our experience, queries are usually rather short and cheap to compile.
Providing an infrastructure that takes care of precompiled queries might present quite a challenge, yet it should not be impossible.
You two probably knew this already, but it might be of interest to the list as well: there's a very basic benchmark mechanism:
SET RUNS N
Now every query you issue is run N times, including parsing & compiling. You will get average duration values for each step if you enable the Query Info view.
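For example, in the console (using the "Chas" collection from earlier in this thread; the query itself is just an illustration):

SET RUNS 10
XQUERY count(collection('Chas')//frame)

The timings reported afterwards are averages over the ten runs.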
This is not representative, but on my machine the 40 KB functx library [1] takes an average of ~13 ms to parse and ~7 ms to compile, so the overhead introduced by reparsing/recompiling should be rather low in general.
In case you experience particular problems with specific queries, feel free to contact us. Often, tweaking the query a little so that possible index optimizations are recognized correctly by our compiler will speed up queries considerably.
I hope this helps, feel free to ask for more! Feedback is very welcome :-)
Kind regards,
Michael
On 12/29/2010 7:40 AM, Charles F. Munat wrote:
The other question I had was whether it was possible to pre-load the embedded database with the functions, rather than having to load them all at the time the XQuery is run. Sort of the way functions or triggers are preloaded into an RDBMS. Do you happen to know if that's possible?

[1] http://www.xqueryfunctions.com/xq/download.html
Hi Chas,
perhaps the RUN command can help you (if you don't use it already): the whole query is stored in a text file, so the size of the query shouldn't be a problem.
run path/to/xquery.xq
Kind regards, Andreas
Good to know. I'm not worried about performance that much; I'm sure I can optimize the queries once I get them all figured out. What would have made the ability to store functions nice is that many of them are reused in different queries, so currently I have to keep multiple copies. Then if I make a change, I have to remember to make it in all of them. Not a big deal, just a matter of efficiency. But I can work around this on my end by storing the functions in one spot and then just inserting them programmatically as necessary into the query.
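Something like this sketch in Scala is what I have in mind (the names are illustrative and the declarations are shortened, not from real code):

// Keep every XQuery function declaration in one place, and prepend the
// ones a given query needs before sending it to BaseX.
object XQueryFunctions {
  private val declarations: Map[String, String] = Map(
    "local:getIt" ->
      "declare function local:getIt($n as node()) as node() { $n };" // body shortened
  )

  // Prepend the named declarations to a query body.
  def assemble(body: String, names: String*): String =
    names.map(declarations).mkString("", "\n", "\n") + body
}

// Usage: only the small top-level expression changes per page.
val query = XQueryFunctions.assemble("local:getIt(<page ref='1'/>)", "local:getIt")

That way each function lives in exactly one place, and a change propagates to every query that uses it.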
When I get to optimizing, I will gladly take advantage of your offer. Huib has already given me some good ideas.
Chas.
Hi,

it seems like a client-side library manager would be an option, possibly with components like:

1. library storage (central? some sort of source control system, e.g. Mercurial?)
2. merger: takes the "live" XQuery and merges in any function from the library which seems appropriate
   1. real-time: having the libraries somewhere at hand and inserting them on the fly
      1. if you use REST access to BaseX, it could be implemented as a filter at the web server
      2. or, as I use the Python API, I would modify the library to intercept any XQuery command and enhance the XQuery content at that point
   2. preprocessing: building the final XQuery once, at the moment the source code is written, before it is run multiple times
Just some ideas.
Jan
I'm currently using 2.2. Works fine, but I'd like to chop the queries into reusable pieces. Will try that soon.
Chas.
A little while ago I asked something similar. Even when it's not a performance concern, it very quickly becomes a developer's concern when each query is preceded by a huge number of unused functions.
What I did was the following: functions are stored in <function> elements:
<function name='ar:A'>
  declare function ar:A ( $arg as xs:string ) as xs:string {
    let $result := collection()//variable[@name=$arg]/@value
    return (
      if ($result)
      then $result/string()
      else fn:concat('"', fn:concat($arg, ' not found"'))
    )
  }
</function>
<function name='ar:B'>
  <use name="ar:A"/>
  declare function ar:B ( $arg as xs:string ) as xs:string {
    ar:A($arg)
  }
</function>
and then when I call it:
<query>
  <use name="ar:B"/>
  ar:B('test')
</query>
And then I have an XSLT script that, at the time the XML file is read in, transforms the above by replacing each <use> element with the text node of the <function> element of the same name. This is just an unrealistically simple example, of course. It works for me because my queries are stored in an XML file; otherwise you may have to invent a similar method for your particular need.
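A minimal sketch of such a transform, assuming the <function>/<use> layout above (the stylesheet itself is illustrative, not my exact script):

<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <!-- Copy everything through unchanged by default. -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>
  <!-- Replace each <use> with the body of the <function> of the same
       name; a <use> nested inside that body expands recursively. -->
  <xsl:template match="use">
    <xsl:apply-templates
        select="//function[@name = current()/@name]/node()"/>
  </xsl:template>
  <!-- Don't copy the library definitions themselves into the output. -->
  <xsl:template match="function"/>
</xsl:stylesheet>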
You may also want to check out the improved error-reporting I posted a few threads back.
Mark Boon
(I hope GMail didn't mangle the formatting too much)
This is an interesting idea, if a bit complex. I'll try it out, but I can also simply create the various functions in the business layer, and then assemble queries as necessary.
Thanks!
Chas.
On 30 Dec 2010, at 04:28, Charles F. Munat wrote:
This is an interesting idea, if a bit complex.
Mark's idea does not seem too complex to me; I think it depends on the environment you're working in. I use a similar trick, storing the XQueries in XSLT templates and calling these when needed. For example:
<xsl:template name="auth"> declare function {...}; </xsl:template>
and then when compiling the query using XSLT:
<query>
  declare namespace ... ;
  <xsl:call-template name="auth"/>
</query>
and then I send it off to the XML database (eXist or BaseX) using HTTP. Using Cocoon this is very easy, and it gives me a lot of control and no maintenance costs (compared to managing stored queries, for instance). There are drawbacks, of course, not the least of which is that oXygen doesn't recognize my XQueries stored inside an XSLT stylesheet, so it doesn't do syntax highlighting etc.
As was said earlier, I wouldn't worry too much about creating large XQueries: the compiler is fast, and as time goes by and the dataset grows, most of the time will be spent executing the query. OTOH, sometimes every millisecond counts.
Huib.
This is nice. I like the changes you made: @ref makes more sense, and moving the id element to an attribute is still clear. Thanks! Very nice.
Chas.