Thanks Christian.

Re: size of data, I am hoping some days will be quieter than discussed below. But yes, it's going to be a lot of data.

I just created a single database from ~190 XML files totalling 8.5 GB, with indexes activated. Creating the database with basexgui took close to an hour, and running a simple XQuery took ~3 min. The database was created on an external USB 3.0 HDD. To scale, I will obviously be creating new databases across drives (and if this POC is successful, I will surely move to the cloud).

For the time being, any and all tips to optimize performance are welcome.

Maybe I will soon contribute to the statistics page :)

- Mansi

On Tue, Oct 7, 2014 at 5:35 AM, Christian Grün <christian.gruen@gmail.com> wrote:

Dear Mansi,

> 1. I have 1000s of XML files (each between 50 MB and 400 MB), and this is
> going to grow rapidly (~200 per day). So, my question is: how scalable is
> BaseX? Can I configure it to use data from my external HDD in my initial
> prototype?

So this means you want to add approx. 40 GB of XML files per day, right,
amounting to about 14 TB/year? That does sound like quite a lot. You can
have a look at our statistics page [1]; it gives you some insight into the
current limits of BaseX.
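
The estimate above can be reproduced with a few lines of XQuery. This is only a back-of-the-envelope sketch: the 200 MB average file size is an assumption (it is not stated in the thread, only the 50-400 MB range is), and decimal units are used throughout.

```xquery
(: Rough volume estimate. The average file size is an assumed value. :)
let $files-per-day := 200
let $avg-size-mb   := 200   (: assumption: somewhere inside the 50-400 MB range :)
let $gb-per-day    := $files-per-day * $avg-size-mb div 1000
let $tb-per-year   := $gb-per-day * 365 div 1000
return (
  'GB per day: '  || $gb-per-day,
  'TB per year: ' || $tb-per-year
)
```

With these inputs the query yields 40 GB per day and 14.6 TB per year; a different average file size shifts both figures proportionally.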

However, all limits are per single database. You can distribute your
data in multiple databases and address multiple databases with a
single XPath/XQuery request. For example, you could create a new
database every day and run a query over all these databases:

for $db in db:list()
return db:open($db)/path/to/your/data
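
If the daily databases follow a naming scheme, the same pattern can be narrowed to a subset of them, so that a query only touches the days it needs. The following is a minimal sketch, assuming databases are named like "feed-2014-10-07"; the "feed-" prefix and the date layout are hypothetical and not something BaseX prescribes.

```xquery
(: Assumption: one database per day, named "feed-YYYY-MM-DD" (hypothetical). :)
(: db:list() returns the database names as strings, so they can be filtered. :)
for $db in db:list()[starts-with(., 'feed-2014-10-')]
return db:open($db)/path/to/your/data
```

Filtering on the name avoids opening databases that cannot contain relevant data, which may matter when the data lives on a slow external drive.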

> 2. I plan to use XPath heavily for data retrieval. Does BaseX use any
> multi-processing or multi-threading to speed up search? Any concurrent
> processing?

Read-only requests will automatically be multithreaded. If a single
query leads to heavy I/O requests, it may be that single-threaded
processing will give you better results (because hard drives are often
not very good at reading data in parallel).

> 3. Can I do some post-processing on searched and retrieved data, like
> sorting, unique elements, etc.?

With XQuery (3.0), you can do virtually anything with your data. In
most of our data-driven scenarios, all data processing is completely
done in BaseX. Some plain examples can be found in our Wiki [2].
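
As a small illustration of the sorting and de-duplication asked about above, here is a sketch in standard XQuery. The inline `items` document and its element names are made up for the example; in practice the input would come from a database path instead.

```xquery
(: Hypothetical sample data; replace with a path into your own database. :)
let $data :=
  <items>
    <item><name>gamma</name></item>
    <item><name>alpha</name></item>
    <item><name>beta</name></item>
    <item><name>alpha</name></item>
  </items>
(: distinct-values() removes duplicates, "order by" sorts the result. :)
for $name in distinct-values($data/item/name)
order by $name
return $name
```

This returns the three distinct names in alphabetical order (alpha, beta, gamma). No BaseX-specific functions are used, so it should run in any XQuery processor.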

Hope this helps,
Christian

[1] http://docs.basex.org/wiki/Statistics
[2] http://docs.basex.org/wiki/XQuery_3.0

--
- Mansi