Re: [basex-talk] Distributed processing on roadmap ?

17 Nov 2014


Hi Mansi,
it's nice to hear that you have been successfully scaling your
database instances so far.
...
I love using BaseX and the powers of BaseX. Currently I am able to query ~60GB of XML files in under 2.5 mins. I still have a few more optimizations to try. I also see this data increasing to a couple of TB shortly.
I would love to see this kind of processing become almost real time (within a min). So my question is: are there any discussions around supporting distributed processing or clusters of nodes etc.?
Yes, distributed processing is a frequently discussed topic. One of
our major questions is what challenge to solve first. As you surely
know, there are so many different NoSQL stores out there, and all of
them tackle different problems. Up to now, we spent most time on
replication, but this would not give you better performance.
So I would be interested to hear what kind of distribution techniques
you believe would give you better performance. Do you think that a
map/reduce approach would be helpful, or do you simply have lots of
data that somehow needs to be sent to a client as quickly as possible?
In other words, how large are your results sets? Do you really need
the complete results, or would you rather like to draw some
conclusions from the scanned data?
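To make the map/reduce question a bit more concrete, here is a minimal
sketch in plain Java. It is only an illustration of the general idea and
not BaseX API: the class name, the method names and the in-memory
"shards" are all hypothetical stand-ins for database instances that
would each scan their own part of the data. Each shard returns just a
small partial result (a count), and only those partial results are
combined.

```java
import java.util.List;

public class MapReduceSketch {

    // Map step: count the matching records inside a single shard.
    // Hypothetical stand-in for a query that runs locally on one node.
    static long countMatches(List<String> shard, String needle) {
        return shard.stream()
                    .filter(record -> record.contains(needle))
                    .count();
    }

    // Reduce step: sum the partial counts of all shards.
    // parallelStream() merely imitates nodes working concurrently.
    static long totalMatches(List<List<String>> shards, String needle) {
        return shards.parallelStream()
                     .mapToLong(shard -> countMatches(shard, needle))
                     .sum();
    }

    public static void main(String[] args) {
        // Three tiny in-memory "shards" standing in for partitioned XML data.
        List<List<String>> shards = List.of(
            List.of("<a>x</a>", "<b>y</b>"),
            List.of("<a>z</a>"),
            List.of("<c/>", "<a>w</a>"));
        System.out.println(totalMatches(shards, "<a>")); // prints 3
    }
}
```

A setup like this only pays off if the partial results are small. If
every shard has to ship its complete result set to the client, the
transfer is likely to dominate and the distribution gains little.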
Back to the current technology… Maybe you could do some Java profiling
(using e.g. -Xrunhprof:cpu=samples) in order to find out what's the
current bottleneck.
Best,
Christian
