Hi,
I found that with the following code the trace shows that, for the two functions (FN) created, both have had "ul" inlined. I expected the first to get "ul" and the second "li", which is also what xpath-matches() receives (X).
declare function local:select($selectors as item()*) as function(node()*) as node()* {
  let $fns :=
    for $selector in $selectors
    return
      if ($selector instance of xs:string)
      then trace(local:xpath-matches(trace($selector, 'X: ')), 'FN: ')
      else $selector
  return function($nodes) {
    fold-left($fns, $nodes, function($nodes, $fn) { $fn($nodes) })
  }
};
declare function local:xpath-matches($selector as xs:string) {
  function($node as node()*) as node()* {
    xquery:eval($selector, map { '': $node })
  }
};
local:select(('ul','li'))(<ul><li>item</li></ul>)
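The behavior can be isolated without xquery:eval. Here is a minimal sketch (hypothetical, not from the original report) of the closure capture that inlining must preserve: each returned function should close over its own value of $s, not the first one.

```xquery
(: minimal sketch: with correct closure capture, the two functions
   return different values; with the reported inlining bug, both
   would report the first value :)
let $fns :=
  for $s in ('ul', 'li')
  return function() { $s }
return ($fns[1](), $fns[2]())
(: expected: ul li :)
```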
--Marc
... here are the trace messages
- X: ul
- FN: function($node_17 as node()*) as node()* { ((: node()*, true :)
  xquery:eval("ul", { "":$node_17 })) }
- X: li
- FN: function($node_17 as node()*) as node()* { ((: node()*, true :)
  xquery:eval("ul", { "":$node_17 })) }
--Marc
Hi Marc,
This is what I get with the current snapshot:
- X: ul
- FN: function($node_17 as node()*) as node()* { ((: node()*, true :)
  let $selector_18 := "ul" return xquery:eval($selector_18, { "":$node_17 })) }
- X: li
- FN: function($node_17 as node()*) as node()* { ((: node()*, true :)
  let $selector_18 := "li" return xquery:eval($selector_18, { "":$node_17 })) }
Did you use one of the more recent snapshots? Christian
On Sun, Nov 16, 2014 at 3:43 PM, Marc van Grootel marc.van.grootel@gmail.com wrote:
Hi Christian,
Duh, you're right. I didn't restart basexgui after replacing it with your latest snapshot: the command line used the newer snapshot, while basexgui was still running the old one.
--Marc
On Sun, Nov 16, 2014 at 5:05 PM, Christian Grün christian.gruen@gmail.com wrote:
Hello,
I love using BaseX and the power it offers. Currently I am able to query ~60 GB of XML files in under 2.5 minutes, and I still have a few more optimizations to try. I also see this data growing to a couple of TB shortly.
I would love to see this kind of processing become near real time (within a minute). So my question is: are there any discussions around supporting distributed processing, clusters of nodes, etc.?
- Mansi
Hi Mansi,
it's nice to hear that you have been successfully scaling your database instances so far.
I love using BaseX and the power it offers. Currently I am able to query ~60 GB of XML files in under 2.5 minutes, and I still have a few more optimizations to try. I also see this data growing to a couple of TB shortly.
I would love to see this kind of processing become near real time (within a minute). So my question is: are there any discussions around supporting distributed processing, clusters of nodes, etc.?
Yes, distributed processing is a frequently discussed topic. One of our major questions is which challenge to solve first. As you surely know, there are many different NoSQL stores out there, and all of them tackle different problems. Up to now, we have spent most of our time on replication, but this would not give you better performance.
So I would be interested to hear what kind of distribution techniques you believe would give you better performance. Do you think that a map/reduce approach would be helpful, or do you simply have lots of data that somehow needs to be sent to a client as quickly as possible? In other words, how large are your result sets? Do you really need the complete results, or would you rather draw some conclusions from the scanned data?
Back to the current technology… Maybe you could do some Java profiling (using e.g. -Xrunhprof:cpu=samples) in order to find out what the current bottleneck is.
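A hypothetical invocation, assuming a standalone basex.jar and a query file query.xq (the HPROF agent, available up to Java 8, writes its report to java.hprof.txt):

```
# sample-based CPU profiling of a single query run
java -Xrunhprof:cpu=samples,depth=16 -cp basex.jar org.basex.BaseX query.xq
```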
Best, Christian
Sorry about the delay. I was busy preparing a presentation for my company on BaseX as our analytics solution. It was very well received. All thanks to you and everyone on this user list :)
Based on my use cases, I believe (again, I am no expert in this domain) a map/reduce approach would work better. The result set returned would contain at most a couple of thousand records, with some post-processing applied, compared to the TBs of data being queried. If the querying and processing steps could use processing power from a cluster of nodes, maybe we would get a significant performance gain? What are your thoughts? What other use cases have you come across?
- Mansi
On Mon, Nov 17, 2014 at 10:50 AM, Christian Grün christian.gruen@gmail.com wrote:
Hi Mansi,
The other day, I came across this work [1] [2] by Darin McBeath that may be of interest. It uses Apache Spark [3] with Saxon. In principle, it looks like one could build something similar using the BaseX JAR in place of Saxon.
/Andy
[1] https://github.com/elsevierlabs/spark-xml-utils
[2] http://mail-archives.apache.org/mod_mbox/spark-user/201408.mbox/%3C140793661...
[3] http://spark.apache.org/
On 20 November 2014 23:03, Mansi Sheth mansi.sheth@gmail.com wrote:
Hi Mansi,
I was busy preparing a presentation for my company on BaseX as our analytics solution. It was very well received.
Nice to hear!
[…] map/reduce […] If the querying and processing steps could use processing power from a cluster of nodes, maybe we would get a significant performance gain? What are your thoughts? What other use cases have you come across?
To answer that question, I invite you to have a look at the excellent master's thesis by Lukas Lewandowski [1].
Christian
[1] http://www.inf.uni-konstanz.de/gk/pubsys/publishedFiles/Lewandowski12.pdf
basex-talk@mailman.uni-konstanz.de