Hi Jon,

I am currently working on querying distributed XML collections, so I have also tested the query performance of a local BaseX instance (without distribution). My collection consists of about 1,600,000 XML news articles, each 10 KB in size.
One of my queries searches for news articles that contain a specific keyword (using the full-text index). It takes about 100 ms to evaluate.

Happy new year.
Lukas


On 01.01.2012, at 22:53, Jon Morehouse wrote:

Hello Lukas,
Thanks for your response. I am simply looking at storing text in my XML articles. I would be recreating articles in XML, and all pictures would be hyperlinked from an external server. I would say each file would be no larger than about 20 kB.
I am also looking at simply recreating a localized search engine. In terms of scalability, do you think BaseX would be sufficient for scaling and searching through content? I am hoping my local database grows to hold hundreds of thousands, maybe millions, of articles, and I would like my users to be able to find any single article by searching with keywords.

Anyways, I appreciate your response. Have a good new year.




Jon Morehouse
Moeller High School Class of 2009
Pepperdine University 2009-2010
University of Southern California Class of 2013



On Jan 1, 2012, at 8:59 AM, Lukas Lewandowski wrote:

Hi Jon Morehouse,

You can use BaseX to create a database (a collection) of all your XML files. You can find some information here [1].
Additionally, you can create a full-text index to support fast keyword lookups within text nodes; see also [2].

Afterwards, you can access and query your database, e.g., using the REST API [3] or a client API [4,5].
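
As a rough illustration of the REST route, a query URL could be assembled as in the Python sketch below. The host and port, the database name `news`, the `article` element, and the search term are all assumptions made for the example, not details of an actual setup. The script only builds and prints the URL; it does not contact a server.

```python
from urllib.parse import quote, urlencode


def build_rest_query_url(base_url: str, database: str, xquery: str) -> str:
    """Build a GET URL that passes an XQuery expression in the `query` parameter."""
    return f"{base_url.rstrip('/')}/{quote(database)}?{urlencode({'query': xquery})}"


# Hypothetical values: adjust to the real server and document structure.
BASE_URL = "http://localhost:8984/rest"
XQUERY = "//article[.//text() contains text 'election']"

url = build_rest_query_url(BASE_URL, "news", XQUERY)
print(url)
```

The expression uses the XQuery Full Text `contains text` operator, so it can benefit from a full-text index if one has been created for the database.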

How large is a single XML file?

regards
Lukas



On Jan 1, 2012, at 9:12 AM, Jon Morehouse wrote:

I am new to BaseX and am excited to be moving forward with this product. Right now I am setting up a website where I want to be able to query across millions of XML files. For instance, if each file contains different keywords, I would like to query across all files and match them against a list of, say, my top 50 keywords, to find which files contain the most keywords, and contain them the most times. Would something like this be possible with BaseX? It seems like it would be a simple piece of XQuery called from PHP (the list of keywords comes from MySQL), but with each article being its own XML file, would it be possible to search across each and every database/file?
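
To make the ranking idea concrete, here is a minimal Python sketch of the logic described above: for each document, count how many of the keywords are present and how often they occur, then sort. It uses only the standard library and a few in-memory sample documents; the file names, element names, and keyword list are hypothetical stand-ins for the real articles and the MySQL keyword table. It is not BaseX code, and scanning every file like this would not scale to millions of articles, which is where a database with a full-text index comes in.

```python
import re
import xml.etree.ElementTree as ET
from collections import Counter


def keyword_hits(xml_text: str, keywords: list[str]) -> Counter:
    """Count how often each keyword occurs in the text content of one XML document."""
    root = ET.fromstring(xml_text)
    words = re.findall(r"\w+", " ".join(root.itertext()).lower())
    counts = Counter(words)
    # Keep only the keywords that actually occur in this document.
    return Counter({k: counts[k.lower()] for k in keywords if counts[k.lower()] > 0})


def rank_documents(docs: dict[str, str], keywords: list[str]) -> list[tuple[str, int, int]]:
    """Return (name, distinct keywords present, total occurrences), best match first."""
    rows = []
    for name, xml_text in docs.items():
        hits = keyword_hits(xml_text, keywords)
        rows.append((name, len(hits), sum(hits.values())))
    return sorted(rows, key=lambda r: (r[1], r[2]), reverse=True)


# Hypothetical sample data standing in for the article files and the keyword list.
docs = {
    "a.xml": "<article><title>Election results</title><body>The election was close.</body></article>",
    "b.xml": "<article><title>Weather</title><body>Rain expected tomorrow.</body></article>",
    "c.xml": "<article><title>Budget vote</title><body>Election budget debate.</body></article>",
}
keywords = ["election", "budget", "rain"]

ranking = rank_documents(docs, keywords)
for row in ranking:
    print(row)
```

Sorting first by the number of distinct keywords and then by total occurrences is one possible reading of "most keywords present, the most amount of times"; a different weighting may suit the site better.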





Jon Morehouse



_______________________________________________
BaseX-Talk mailing list
BaseX-Talk@mailman.uni-konstanz.de
https://mailman.uni-konstanz.de/mailman/listinfo/basex-talk