Hi Dirk,
Thanks for responding to my questions. Just to clarify: when you say upper file size limit, are you referring to the individual files? I only ask because I saw a DB limit of "Unlimited", so I was uncertain of the distinction, but thought it probably meant there is no hard limit on the overall DB size. In my case, the individual files themselves are fairly small, but my total DB size will grow to about 24 TB. Do you see any issues with this in terms of capacity and being able to query fairly quickly across the whole set, assuming of course my XQuery is tuned? If the 512 GB is the DB size limit, I would be curious to learn what dictates that limit, and how I could help.
In terms of scaling, it sounds like you are saying I can just go to a shared file system and have several BaseX instances pointing to that file system. Then, as requests come in, I would direct them to specific instances. Would this not be a problem for write updates? I.e., is there write locking that will prevent two threads from simultaneously updating a document with the same GUID (I am assuming there is a universal ID for each document)? Perhaps that is part of your current project?
Give me a couple of days and I will write you a detailed brief on my real-world use case. Thanks for all your advice and help!
Thanks,
Raj
________________________________
From: Dirk Kirsten dk@basex.org
To: Rajabrata Chaudhuri rajabrata@yahoo.com
Cc: "basex-talk@mailman.uni-konstanz.de" basex-talk@mailman.uni-konstanz.de
Sent: Thursday, March 28, 2013 2:27 AM
Subject: Re: [basex-talk] BaseX Capacity
Hello Raj,
thanks for your interest in BaseX.
You can see the current upper limits of BaseX at [1]. As you can see, the current upper file size limit is 512 GiB per database. However, you can always distribute your data across several databases, as databases in BaseX are a fairly lightweight concept and you can also access multiple databases within one XQuery expression. So, theoretically, you can store terabytes of data.
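As a rough illustration of the arithmetic behind splitting data across databases, here is a minimal sketch. It assumes the 512 GiB figure applies per database and that the data could be divided evenly, which real data rarely allows; the function name is hypothetical and not part of BaseX.

```python
GIB = 2**30   # bytes in one gibibyte
TB = 10**12   # bytes in one decimal terabyte


def databases_needed(total_bytes: int, limit_bytes: int = 512 * GIB) -> int:
    """Minimum number of databases if each one were filled up to the limit."""
    # Integer ceiling division avoids floating-point rounding.
    return -(-total_bytes // limit_bytes)


if __name__ == "__main__":
    print(databases_needed(24 * TB))
```

In practice one would leave headroom per database rather than fill each to the limit, so the real count would be higher than this lower bound.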
However, whether query execution against such a large database will be efficient is very difficult to tell. It heavily depends on the type of query you want to run, but personally I would not expect blazing performance. But again, this is very hard to tell.
Scaling out and replication are currently not supported by BaseX. You can always use some kind of distributed file system to physically distribute your data, but BaseX itself does not do this for you. You could also start several BaseX servers and store certain data on specific servers, but there will be no synchronization of any kind. However, we would love to change this, and this is actually my current project.
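The idea of storing certain data on specific servers could be handled on the client side along these lines. This is only a sketch under stated assumptions: the host names are hypothetical, the routing function is not a BaseX feature, and it provides no replication or synchronization.

```python
import hashlib

# Hypothetical server addresses; any stable list of instances would do.
SERVERS = ["basex-a:1984", "basex-b:1984", "basex-c:1984"]


def server_for(doc_id: str, servers=SERVERS) -> str:
    """Return the same server for the same document ID on every call."""
    # A cryptographic hash gives a stable result across processes,
    # unlike Python's built-in hash(), which is salted per run.
    digest = hashlib.sha256(doc_id.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(servers)
    return servers[index]


if __name__ == "__main__":
    print(server_for("example-document-id"))
```

Sending all reads and writes for a given ID to one instance means that no two instances update the same document. Note that with simple modulo routing, changing the number of servers reassigns most IDs, so existing data would have to be moved.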
I gave a short talk about our plans at our user meet-up at XML Prague. You can see the slides at [2] (hopefully the videos will be there as well soon). So, we are interested in scaling out and replication, and I am therefore also very interested in real-world use cases. I would appreciate it if you could tell me more about your specific requirements (either by private mail or on the mailing list), so that in the end we will have a solution that is usable in the real world.
Cheers, Dirk
[1] http://docs.basex.org/wiki/Statistics
[2] http://files.basex.org/xmlprague2013/
On Tue, Mar 26, 2013 at 9:22 PM, Rajabrata Chaudhuri rajabrata@yahoo.com wrote:
Hello,
First I'd like to thank you guys for all your great work on BaseX. I am fairly familiar with XML DBs and have done a significant amount of development on top of MarkLogic. I would like to ask some questions about capacity and scalability. I have reviewed the documentation and see that the biggest store is for SDMX at approximately 8000 GB. I am trying to better understand what this means and would appreciate your expert advice on my questions below:
1. Is the expectation that you can query against 8 TB of XML data efficiently?
2. My requirements will be to query across probably 24 TB of XML data. Do you guys feel this is possible?
3. What is the method to scale horizontally and vertically? I.e., would I be adding more servers, or starting more instances, etc.?
4. How does high availability work? I.e., can I have multiple active-active nodes, or should it be active-passive, etc.?
Any help anyone can render is greatly appreciated.
Thanks,
Raj
BaseX-Talk mailing list BaseX-Talk@mailman.uni-konstanz.de https://mailman.uni-konstanz.de/mailman/listinfo/basex-talk