...to get an even better impression: can you estimate how many distinct names you'll get in total?
I can do better than estimate :) After actually measuring, the data set we most often work with currently contains 258 content files with 3,524 unique element names. Obviously, this is an order of magnitude less than 2^15-1, so I must have been remembering the good old days when the limit really was 256 and we blew right past that. Practically, it also means that with the new unique element name limit I should be fine re-engineering to a single-database approach (yay!).
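For anyone who wants to repeat that kind of measurement, here is a minimal sketch in plain Java using the JDK's StAX parser. It is not how the figure above was obtained and it does not use BaseX itself; the class name `ElementNameCounter` and the `LIMIT` constant are hypothetical, and counting local names (ignoring prefixes) is an assumption about what "unique element names" means.

```java
import java.io.StringReader;
import java.util.Set;
import java.util.TreeSet;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

/** Hypothetical helper: collects distinct element names across several XML documents. */
final class ElementNameCounter {
    /** The 2^15-1 limit discussed in this thread. */
    static final int LIMIT = (1 << 15) - 1;

    /** Adds the local name of every element in the given XML text to the shared set. */
    static void collect(String xml, Set<String> names) {
        try {
            XMLStreamReader reader =
                XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
            try {
                while (reader.hasNext()) {
                    if (reader.next() == XMLStreamConstants.START_ELEMENT) {
                        names.add(reader.getLocalName());
                    }
                }
            } finally {
                reader.close();
            }
        } catch (XMLStreamException e) {
            // Rethrow unchecked so callers can loop over many files without boilerplate.
            throw new IllegalArgumentException("Malformed XML input", e);
        }
    }

    /** True if the collected names fit into a single database under the assumed limit. */
    static boolean fitsLimit(Set<String> names) {
        return names.size() <= LIMIT;
    }

    public static void main(String[] args) {
        Set<String> names = new TreeSet<>();
        collect("<a><b/><b/><c/></a>", names);
        collect("<a><d/></a>", names);
        System.out.println(names.size() + " distinct names, fits: " + fitsLimit(names));
    }
}
```

In practice you would call `collect` once per content file with the same set, then compare the set size against the limit.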
In general I agree with the direction of focusing on robust support for a single database. Ours was (and obviously no longer is) an edge case. As long as multiple documents can be stored in a single database and the limiting factors of that database exceed reasonable thresholds, I really don't see many use cases like ours in the future where multiple simultaneous database accesses would be required.
Thanks also for the insight into Context Data references - I've never really had that straight in my mind, but it makes sense now given the possibility to operate BaseX as a server with the potential for lots of different clients with lots of simultaneous database connections to them. That's totally outside our own use case though, so I'll just continue to ignore the multiple Data reference collection for now.
As always, thanks for the stellar feedback and support. I'm excited to integrate all the new stuff you've been adding in the last several months.
Dave
-----Original Message-----
From: Christian Grün [mailto:christian.gruen@gmail.com]
Sent: Tuesday, July 05, 2011 12:57 PM
To: Dave Glick
Cc: basex-talk@mailman.uni-konstanz.de
Subject: Re: [basex-talk] Database Limits
Our current usages consume between 200 and 300 input files, so we'll still hit the 2^15 unique name limit.
Thanks; to get an even better impression: can you estimate how many distinct names you'll get in total? I remember one other use case in which we had to handle around 25,000 names (in most other cases, there are <100 names, which is why we still have that limit).
The multiple databases approach works, so I'll continue using it. It's just challenging to maintain. The main problem is that the internal BaseX API is really designed around having one database open at a time - or at least it was.
Yes, it still is, and it might change at some point in the future, but probably not this year. Instead, we'd rather continue to extend the current limits to support TB-scale database instances (including more generous limits for element names).
In my recollection, while you can obviously query multiple databases using collections and other XQuery functions you've built in, the Context is designed to have one of them appear as "more important" than the others. However, my understanding was never all that good and/or this may have changed since I last looked at it in-depth.
It's pretty good indeed; the main database context was introduced to simplify database operations at the command level, whereas XQuery allows you to access an arbitrary number of databases within the scope of a single query. With BaseX 6.7.1 or 6.8, we'll introduce additional custom XQuery db:...() functions, which can then be used to perform batch operations on several databases.
Can you help me understand the current relationship between Context.datas, Context.data, and querying?
Context.datas is important in the client/server context: each client is allowed to open its own database, and the Context.datas object remembers how often each database has been referenced (pinned). In the standalone/embedded context, each database will be pinned at most once.
Hope this helps, Christian
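The pinning behaviour described for Context.datas can be pictured as a simple reference counter per database. The following is only a rough sketch of that idea, not the actual BaseX implementation; the class `PinRegistry` and its methods are hypothetical names chosen for illustration.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical pin counter; NOT the real Context.datas code, just the general idea. */
final class PinRegistry {
    /** Database name mapped to the number of clients currently referencing it. */
    private final Map<String, Integer> pins = new HashMap<>();

    /** Registers one more reference to the database and returns the new pin count. */
    synchronized int pin(String db) {
        return pins.merge(db, 1, Integer::sum);
    }

    /**
     * Drops one reference. Returns true if no client references the database
     * any more, meaning it could now be closed.
     */
    synchronized boolean unpin(String db) {
        Integer count = pins.get(db);
        if (count == null) {
            throw new IllegalStateException("Database is not pinned: " + db);
        }
        if (count == 1) {
            pins.remove(db);
            return true;
        }
        pins.put(db, count - 1);
        return false;
    }

    /** Current pin count; 0 if the database is not open. */
    synchronized int count(String db) {
        return pins.getOrDefault(db, 0);
    }
}
```

In a client/server setting, several clients opening the same database would each call `pin`, so the count can exceed one. In the standalone/embedded case described above, the count for any database would be at most one.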