I am trying to distribute data across multiple databases. I can't distribute by day, since a single day's data could well exceed the capacity of a single BaseX database. Judging from the statistics page, the only other criterion I can distribute on is the number of nodes. But I can't find a way to get the number of nodes in a database programmatically. I am also not sure whether I can even find the number of nodes in the document that is about to be imported.

So, 

currentDocToImport = a.xml
NumberOfNodes(a.xml) = ??

NumberOfNodes("LastDB") = ??

Do you agree that this is a reasonable way to go? Can someone give me pointers on how to find the two values above? Any other thoughts are always welcome.
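
For what it's worth, here is an untested sketch of what the two lookups might look like in XQuery. It assumes that db:info() reports the node count in a <nodes> element (worth checking against the actual output on your BaseX version), and that counting the document node, elements, attributes, texts, comments and processing instructions of the parsed file roughly matches what BaseX stores; whitespace chopping on import can make the stored count smaller. Note that doc() parses the whole file, which may be expensive for a 400MB document:

  (: approximate node count of the file to be imported;
     'a.xml' is the example file named above :)
  count(doc('a.xml')/descendant-or-self::node()/(. | @*)),

  (: node count of an existing database; 'LastDB' is a placeholder name,
     and the <nodes> element name is an assumption :)
  xs:integer(db:info('LastDB')//nodes)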

- Mansi

On Tue, Oct 7, 2014 at 5:35 AM, Christian Grün <christian.gruen@gmail.com> wrote:
Dear Mansi,

> 1. I have 1000s of XML files (each between 50MB-400MB) and this is going to
> grow exponentially (~200 per day). So, my question is how scalable is
> BaseX? Can I configure it to use data from my external HDD, in my initial
> prototype?

So this means you want to add approx. 40 GB of XML files per day, right,
amounting to about 14 TB/year? This sounds like quite a lot indeed. You can
have a look at our statistics page [1]; it gives you some insight into the
current limits of BaseX.

However, all limits are per single database. You can distribute your
data in multiple databases and address multiple databases with a
single XPath/XQuery request. For example, you could create a new
database every day and run a query over all these databases:

  for $db in db:list()
  return db:open($db)/path/to/your/data
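
If the databases follow a naming scheme, the same loop can be restricted
to a subset of them. A small sketch, where the 'logs-2014-10' prefix is a
made-up naming convention and not something BaseX prescribes:

  (: only visit databases whose name starts with the assumed prefix :)
  for $db in db:list()[starts-with(., 'logs-2014-10')]
  return db:open($db)/path/to/your/data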

> 2. I plan to heavily use XPath for data retrieval. Does BaseX use any
> multi-processing or multi-threading to speed up search? Any concurrent
> processing?

Read-only requests will automatically be multithreaded. If a single
query leads to heavy I/O requests, it may be that single-threaded
processing will give you better results (because hard drives are often
not very good at reading data in parallel).

> 3. Can I do some post-processing on searched and retrieved data? Like
> sorting, unique elements etc.?

With XQuery (3.0), you can do virtually anything with your data. In
most of our data-driven scenarios, all data processing is completely
done in BaseX. Some plain examples can be found in our Wiki [2].
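
As a rough illustration of such post-processing, the following sketch
returns the distinct values of an attribute in sorted order. The database
name 'mydb' and the path //item/@id are placeholders, not taken from your
data:

  (: distinct attribute values from one database, sorted ascending :)
  for $id in distinct-values(db:open('mydb')//item/@id)
  order by $id
  return $id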

Hope this helps,
Christian

[1] http://docs.basex.org/wiki/Statistics
[2] http://docs.basex.org/wiki/XQuery_3.0


