Hi Mansi,
- I am not 100% clear, if you are motivating me towards or against FULLTEXT
indexing :)
This is something you will have to answer yourself; it depends on the kind of queries you run and on whether you can store the attribute values as text nodes.
- Yes, I am dealing with GBs of XML files. I create new databases using
the Java API, via the CreateDB class. Should I be using MainOptions to set the AUTOOPTIMIZE and UPDINDEX options before each new DB creation? In the MainOptions class, I didn't find any auto optimize option, am I missing something? Since I am setting options through this method anyway, should I also set the FTINDEX or ATTRINDEX option (based on your response 1) before creating each DB?
As indicated, AUTOOPTIMIZE is not a viable choice for data instances of that size. UPDINDEX may be suitable, but before creating any index structures, I advise you to first do some testing with smaller instances. Only then will you know which index structures you need to speed up your queries. I hope our Wiki articles on index structures and the full-text feature are helpful in that regard.
Christian
On Sun, Jan 3, 2016 at 4:52 PM, Christian Grün <christian.gruen@gmail.com> wrote:
Hi Mansi,
- Most of my xqueries are of below nature
'/Archives/descendant::apiCalls[contains(@name,"com.sun")]/@name', where apiCalls could be 3-4 level under 'Archives'. Xqueries are accessed via REST
The existing index structures won’t allow you to look for arbitrary sub strings; see [1] for more information.
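To illustrate why a value index does not help with contains(), here is a minimal Java sketch. It assumes, as a conceptual model only, that the index behaves like a sorted map from complete attribute values to node ids; the class name, the ids and the sample values are hypothetical, and this is not how BaseX stores its index on disk.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Conceptual model of a value index; NOT BaseX's actual implementation. */
final class ValueIndexSketch {
  // complete attribute value -> ids of the nodes carrying exactly that value
  private final TreeMap<String, List<Integer>> index = new TreeMap<>();

  void add(String value, int nodeId) {
    index.computeIfAbsent(value, k -> new ArrayList<>()).add(nodeId);
  }

  /** Exact match: a single lookup in the sorted structure. */
  List<Integer> exact(String value) {
    return index.getOrDefault(value, new ArrayList<>());
  }

  /** Substring match: the sort order does not help, every key is inspected. */
  List<Integer> containing(String part) {
    List<Integer> hits = new ArrayList<>();
    for (Map.Entry<String, List<Integer>> entry : index.entrySet()) {
      if (entry.getKey().contains(part)) hits.addAll(entry.getValue());
    }
    return hits;
  }

  public static void main(String[] args) {
    ValueIndexSketch idx = new ValueIndexSketch();
    idx.add("com.sun.Foo", 1);
    idx.add("org.example.Bar", 2);
    idx.add("x.com.sun.Baz", 3);
    System.out.println(idx.exact("com.sun.Foo"));   // [1]
    System.out.println(idx.containing("com.sun"));  // [1, 3]
  }
}
```

The exact lookup touches one key, whereas the substring search degenerates into a scan over all distinct values, which is roughly what happens when contains() is evaluated without index support.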
You are right, the full-text index may be a possible way out. Prefix searches can be realized via the "using wildcards" option [2]:
//*[text() contains text "abc.*" using wildcards]
Please note that the query string will always be "tokenized": if you are looking for "com.sun", you will also get results like "COM SUN!".
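The effect of tokenization can be sketched in a few lines of Java. This is a simplified illustration under my own assumptions (lower-casing and splitting at every character that is neither a letter nor a digit); it is not the tokenizer BaseX uses, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

/** Simplified illustration of full-text tokenization; NOT BaseX's tokenizer. */
final class TokenizeSketch {
  /** Lower-cases the text and splits it at every non-letter, non-digit character. */
  static List<String> tokenize(String text) {
    List<String> tokens = new ArrayList<>();
    for (String part : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
      if (!part.isEmpty()) tokens.add(part);
    }
    return tokens;
  }

  /** True if the query tokens occur as a consecutive run within the text tokens. */
  static boolean matches(String text, String query) {
    List<String> textTokens = tokenize(text);
    List<String> queryTokens = tokenize(query);
    return !queryTokens.isEmpty()
        && Collections.indexOfSubList(textTokens, queryTokens) >= 0;
  }

  public static void main(String[] args) {
    System.out.println(tokenize("com.sun"));                  // [com, sun]
    System.out.println(matches("COM SUN!", "com.sun"));       // true
    System.out.println(matches("com.sunshine", "com.sun"));   // false
  }
}
```

Both "com.sun" and "COM SUN!" reduce to the same two tokens, which is why the second one shows up as a hit. In this sketch, whole tokens are compared, so "com.sunshine" is not matched unless a wildcard is used.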
- I have 1000s of documents, spanning over 100 XML DB, with total space
around 400 GB currently. Each query is taking roughly 30 mins, to run.
My concern is, at each DB update, I am using attribute indexing, but the info command on the BaseX prompt tells me otherwise. Am I misreading something? Is there a way to fix this once the DB is created? It takes me 48 hours to create the DBs from scratch... :)
If UPDINDEX and AUTOOPTIMIZE are false, you will need to call "OPTIMIZE" after your updates.
If you create a new database, you can set UPDINDEX and AUTOOPTIMIZE to true. However, AUTOOPTIMIZE will get incredibly slow if you are working with gigabytes of XML data.
Reading through the UPDINDEX and AUTOOPTIMIZE ALL commands tells me to open each DB and run these commands. Is that my option? Do we have an XQuery script somewhere which I can use to do this?
If your databases are called "db1" ... "db100", the following XQuery script will optimize all those databases:
for $i in 1 to 100 return db:optimize('db' || $i)
You can also create a command script [3] with XQuery:
<commands>{ for $i in 1 to 100 return ( <open name="{ 'db' || $i }"/>, <optimize/> ) }</commands>
You can store the result as a .bxs file and run it afterwards.
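Since you are working from Java anyway, the script file could also be generated there. The following is a minimal sketch that uses only the Java standard library; it writes the plain-text command syntax (OPEN, OPTIMIZE) rather than the XML form, and the file name "optimize-all.bxs" and the class name are assumptions made for illustration.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

/** Generates a command script that opens and optimizes db1 ... dbN. */
final class OptimizeScriptSketch {
  /** Builds one OPEN/OPTIMIZE pair per database, named prefix + running number. */
  static String buildScript(String prefix, int count) {
    StringBuilder script = new StringBuilder();
    for (int i = 1; i <= count; i++) {
      script.append("OPEN ").append(prefix).append(i).append('\n');
      script.append("OPTIMIZE\n");
    }
    return script.toString();
  }

  public static void main(String[] args) throws IOException {
    // Hypothetical target file; adjust the path and database names to your setup.
    Path target = Paths.get("optimize-all.bxs");
    Files.write(target, buildScript("db", 100).getBytes(StandardCharsets.UTF_8));
    System.out.println("Wrote " + target.toAbsolutePath());
  }
}
```

If your databases do not follow the "db1" ... "db100" naming scheme, you would replace the loop with an iteration over your actual database names.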
Before you create all index structures, you should probably run your queries on some smaller database instances and check out the "Query Info" panel in the GUI. It will tell you if an index is used or not.
Best, Christian
[1] http://docs.basex.org/wiki/Indexes#Value_Indexes
[2] http://docs.basex.org/wiki/Full-Text#Match_Options
[3] http://docs.basex.org/wiki/Commands#Command_Scripts
--
- Mansi