Hi BaseX Team,
We are starting a new project for which we have selected BaseX. I would like to describe our requirements and ask you to suggest the best way to use BaseX.
We are building a web application in which we need to create XML databases per user: for each user who logs in, we need to create databases from his XML files in the backend. We do not want to use the client/server architecture of BaseX for this, because we want to avoid the extra time of sending an HTTP request to the BaseX server and waiting for the response; instead, we would like to create the databases, execute the queries, and drop the databases within the same web application. Please also note that we do not need to edit the data in these databases; we will only read from them by executing queries. The XML files are very large (they may grow to gigabytes), and the query output can be very large as well, so we must make sure that these huge volumes of output do not eat up all the heap memory, because the number of parallel users of the site is very high and we cannot afford to run out of memory. We would also need to configure the database paths, since with these volumes we cannot store the databases in the default locations.
Considering all the above points, please suggest the best way to use BaseX. Our web application is developed in Java. Although we have used BaseX in earlier projects, I feel that in some places we did not use it properly and ran into problems; the support we got from your team in solving them was immense. This time we want to proceed in a more systematic way, so please help us by suggesting the best approach.
Thanks & Regards, Sateesh.A
Dear Sateesh,
> We are building a web application in which we need to create XML databases per user: for each user who logs in, we need to create databases from his XML files in the backend.
This should be no problem. New databases can easily be created and dropped via the existing APIs or even XQuery.
> We do not want to use the client/server architecture of BaseX for this, because we want to avoid the extra time of sending an HTTP request to the BaseX server.
Have you found the client/server architecture to be a serious bottleneck, or is this a theoretical assumption? If you want to distribute your data anyway, it could make sense to do some manual sharding and experiment with multiple BaseX instances on different servers. If you believe that this makes no sense in your scenario, you can surely use BaseX in an embedded way.
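For illustration, here is a minimal sketch of such an embedded setup in Java, using the command classes of the BaseX API (org.basex.core.cmd); the database name, file path, and query are placeholders:

import org.basex.core.Context;
import org.basex.core.cmd.CreateDB;
import org.basex.core.cmd.DropDB;
import org.basex.core.cmd.XQuery;

public class EmbeddedExample {
  public static void main(String[] args) throws Exception {
    // One Context represents an embedded BaseX instance; no server is involved.
    Context context = new Context();
    try {
      // Create a per-user database from an XML file on disk.
      new CreateDB("user42", "/data/xml/user42.xml").execute(context);
      // Run a read-only query against that database.
      String result = new XQuery("count(db:open('user42')//*)").execute(context);
      System.out.println(result);
      // Drop the database again once it is no longer needed.
      new DropDB("user42").execute(context);
    } finally {
      context.close();
    }
  }
}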
> Also note that the XML files are very large; they may grow to gigabytes.
Please note that the creation of a database with gigabytes of data might take some seconds or even minutes. Do you want to create these databases every time a user logs in, or only once (the first time)?
> The query output can also be very large, and we must make sure that these huge volumes of output do not eat up all the heap memory.
If you use the APIs in the right way, all data will be streamed, so you can expect constant memory consumption (as long as you do not buffer the received results in your client).
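With the embedded API, this could look roughly as follows (a sketch; the query string and output stream are placeholders): results are serialized item by item, so the complete result never has to be held in memory:

import java.io.OutputStream;
import org.basex.core.Context;
import org.basex.io.serial.Serializer;
import org.basex.query.QueryProcessor;
import org.basex.query.iter.Iter;
import org.basex.query.value.item.Item;

public class StreamingExample {
  // Serialize query results directly to an output stream (e.g. the servlet
  // response) instead of collecting them in a String first.
  static void stream(Context context, String query, OutputStream out) throws Exception {
    try (QueryProcessor proc = new QueryProcessor(query, context)) {
      Serializer ser = proc.getSerializer(out);
      Iter iter = proc.iter();
      for (Item item; (item = iter.next()) != null;) {
        ser.serialize(item);
      }
      ser.close();
    }
  }
}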
> The number of parallel users of the site is very high.
What does "very high" mean? Do you expect 1000 parallel requests per day, hour, minute, second (ms, ns, ...)? Will you have thousands or millions of users?
> We cannot afford to run out of memory. We would also need to configure the database paths, since with these volumes we cannot store the databases in the default locations.
One approach to distribute database directories is to create symlinks to other drives. If a new database is created, it will always be added to the default folder, but you could create empty databases in advance, which are then populated as soon as new users are registered.
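In addition to symlinks, you can also point BaseX to a different database directory via the DBPATH option, either in the .basex configuration file or as a system property; a small sketch (the path is a placeholder):

public class DbPathExample {
  public static void main(String[] args) throws Exception {
    // BaseX options can be set as system properties prefixed with "org.basex.".
    // This has to happen before the first Context is created.
    System.setProperty("org.basex.DBPATH", "/mnt/bigdisk/basex/data");
    // Databases created with this context are then stored under the path above.
    org.basex.core.Context context = new org.basex.core.Context();
    context.close();
  }
}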
Hope this helps, Christian
Hi Christian,
Thank you so much for your response, Christian. Please find my answers to your questions inline in the mail below; please respond to them.
Thanks & Regards, Sateesh.A
________________________________________
From: Christian Grün christian.gruen@gmail.com
Sent: Wednesday, October 8, 2014 7:15 PM
To: Sateesh
Cc: basex-talk@mailman.uni-konstanz.de
Subject: Re: [basex-talk] Fw: Suggestion for a requirement
> Dear Sateesh,
>> We are building a web application in which we need to create XML databases per user: for each user who logs in, we need to create databases from his XML files in the backend.
> This should be no problem. New databases can easily be created and dropped via the existing APIs or even XQuery.
>> We do not want to use the client/server architecture of BaseX for this, because we want to avoid the extra time of sending an HTTP request to the BaseX server.
> Have you found the client/server architecture to be a serious bottleneck, or is this a theoretical assumption? If you want to distribute your data anyway, it could make sense to do some manual sharding and experiment with multiple BaseX instances on different servers. If you believe that this makes no sense in your scenario, you can surely use BaseX in an embedded way.
>> Also note that the XML files are very large; they may grow to gigabytes.
> Please note that the creation of a database with gigabytes of data might take some seconds or even minutes. Do you want to create these databases every time a user logs in, or only once (the first time)?
#Sateesh: This would be a one-time activity only, but one problem with this approach: suppose I have a 1 GB XML file; when I create a database for this XML, it results in approximately 2 GB of database, and for this I need to maintain terabytes of hard disk, which we are not able to defend with clients. Is there a way to keep the database size from growing that big without losing performance?
>> The query output can also be very large, and we must make sure that these huge volumes of output do not eat up all the heap memory.
> If you use the APIs in the right way, all data will be streamed, so you can expect constant memory consumption (as long as you do not buffer the received results in your client).
>> The number of parallel users of the site is very high.
> What does "very high" mean? Do you expect 1000 parallel requests per day, hour, minute, second (ms, ns, ...)? Will you have thousands or millions of users?
#Sateesh: Per minute we can expect 100 parallel hits; some of these queries might be complex, and some might return huge volumes of data in response.
>> We cannot afford to run out of memory. We would also need to configure the database paths, since with these volumes we cannot store the databases in the default locations.
> One approach to distribute database directories is to create symlinks to other drives. If a new database is created, it will always be added to the default folder, but you could create empty databases in advance, which are then populated as soon as new users are registered.
> Hope this helps, Christian
Hi, sorry to ask a basic question, but I'm a newbie who doesn't always understand all the prerequisites in the documentation. If I can't find help here, just tell me.
I tried to install a local instance of BaseX and use the web application features. I work on a Mac and installed BaseX with Homebrew. After changing the password of the admin user and creating a database, I launched the HTTP server and typed http://localhost:8984/ into my browser. The result is:
HTTP ERROR: 503
Problem accessing /webapp. Reason:
Service Unavailable
What am I missing? Is it even possible to use a web app on a local computer? Thanks in advance for your help...
EM
Hi Emmanuelle,
unfortunately I cannot tell you much about the Homebrew distribution of BaseX, but you'll probably get a response from our Mac users soon. Until then, maybe you can give the ZIP version a try [1] and report back to us if you are more successful?
Thanks, Christian
[1] http://basex.org/products/download/all-downloads/
EM,
Are you still facing that? I too installed using Homebrew, started the service using basexhttp, and made an HTTP request:
http://localhost:8984/rest/<db_name>?query=<query>
and everything worked fine.
Is your database on the local machine from which you started basexhttp? Initially, I had my database on an external HDD, and the service wasn't starting.
Hope this helps, - Mansi
> #Sateesh: This would be a one-time activity only, but one problem with this approach: suppose I have a 1 GB XML file; when I create a database for this XML, it results in approximately 2 GB of database, and for this I need to maintain terabytes of hard disk […]
This sounds like a general architectural challenge to me, which may not necessarily be linked to BaseX itself. What do you need all those user-specific databases for?
> which we are not able to defend with clients.
What do you mean by that?
> #Sateesh: Per minute we can expect 100 parallel hits; some of these queries might be complex, and some might return huge volumes of data in response.
Once again, this sounds more like a general issue. If the queries are really complex in terms of runtime (O(n²) or more?), and if the returned data is huge, you will always come across limits in terms of CPU power and bandwidth, no matter which system you are using. Maybe you could analyze whether you really need to return the complete amount of data, or whether the client can live with a much smaller result. Think of a world map: the contained amount of information is immense, but at a certain level you will only see parts of the data. The same applies to the TreeMap in BaseX, which only retrieves the parts of the database that are required for the actual visual representation. As you may know, XQuery is a very powerful language, which not only allows you to do aggregation, ordering, or filtering in a query; you can also use it to create completely new XML fragments that are composed of the contents of existing databases.
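As a rough sketch (the database and element names are made up), such a query could aggregate on the server side, so that only a small summary is returned instead of the raw documents:

import org.basex.core.Context;
import org.basex.core.cmd.XQuery;

public class SummaryExample {
  public static void main(String[] args) throws Exception {
    Context context = new Context();
    // Group and sum inside the database; only one small element per month
    // is serialized and sent back to the client.
    String query =
        "for $order in db:open('user42')//order "
      + "group by $month := $order/@month "
      + "order by $month "
      + "return <month id='{$month}' total='{sum($order/amount)}'/>";
    System.out.println(new XQuery(query).execute(context));
    context.close();
  }
}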
Hi Christian,
I could not reply to your mail directly. Here is my reply; please go through it and suggest the way forward.
>> #Sateesh: This would be a one-time activity only, but one problem with this approach: suppose I have a 1 GB XML file; when I create a database for this XML, it results in approximately 2 GB of database, and for this I need to maintain terabytes of hard disk […]
> This sounds like a general architectural challenge to me, which may not necessarily be linked to BaseX itself. What do you need all those user-specific databases for?
#Sateesh1: We create databases and use them to retrieve user information. As creating and deleting databases each time takes some time, we run an offline tool that creates those databases in advance, so that when a user checks his details online he gets the response faster. With this approach, since we have around 10k users and a new XML file is generated for each user every month, the BaseX databases created for those XML files require terabytes of storage. Is there any option in BaseX that keeps the database size equal to or smaller than the XML size? (Currently, if the XML is 10 MB, the database comes out at approximately 20 MB.)
Thanks & Regards, Sateesh.A
Hi Sateesh,
I couldn't follow all of your remarks in your last response, but if I get it right, your main concern seems to be that the total amount of terabytes required to store databases for all users is too large to be stored on a single machine.
Databases in BaseX are already pretty small compared to other native XML stores, and of course we try to keep their size as small as possible, so there is no way to shrink them to half of their size. However, if your queries do not take advantage of the index structures (text and attribute index), you can disable them before creating a database, or remove them after creation, and you will save some additional bytes.
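A small sketch of how this could look with the command API (the database name and path are placeholders; the same effect can be achieved with SET TEXTINDEX/ATTRINDEX in the console):

import org.basex.core.Context;
import org.basex.core.cmd.CreateDB;
import org.basex.core.cmd.Set;

public class NoIndexExample {
  public static void main(String[] args) throws Exception {
    Context context = new Context();
    // Switch off the text and attribute indexes before creating the database;
    // the database is then created without these optional index structures.
    new Set("TEXTINDEX", false).execute(context);
    new Set("ATTRINDEX", false).execute(context);
    new CreateDB("user42", "/data/xml/user42.xml").execute(context);
    context.close();
  }
}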
Of course, you can always think of removing databases from users that are not using the system anymore.
Best, Christian