Hello,
A very happy new year to all of you!
I have some very basic questions about indexing.
1. Most of my XQueries are of the following form:
'/Archives/descendant::apiCalls[contains(@name,"com.sun")]/@name', where apiCalls can be 3-4 levels below 'Archives'. The queries are run via REST.
Based on this, I have been creating the attribute index after each update to the DB. Is that correct? Should I have been using the full-text index instead? Why?
2. I have thousands of documents, spread over 100 XML DBs, with a total size of around 400 GB currently. Each query takes roughly 30 minutes to run. That performance is to be expected, but I know I can do better with indexing. This is what I currently see when I look at one of the DBs:
open bi_output_3
Database 'bi_output_3' was opened in 38.22 ms.
info db
Database Properties
  Name: bi_output_3
  Size: 3938 MB
  Nodes: 16193129
  Documents: 35
  Binaries: 0
  Timestamp: 2016-01-03T13:40:40.000Z
Resource Properties
  Timestamp: 2016-01-03T13:40:40.776Z
  Encoding: UTF-8
  CHOP: true
Indexes
  Up-to-date: false
  TEXTINDEX: false
  ATTRINDEX: false
  FTINDEX: false
  LANGUAGE: English
  STEMMING: false
  CASESENS: false
  DIACRITICS: false
  STOPWORDS:
  UPDINDEX: false
  AUTOOPTIMIZE: false
  MAXCATS: 100
  MAXLEN: 96
When I looked at its footprint on disk:
ubuntu@<abc>/BaseXDB/bi_output_3$ ls -l
total 4032992
-rw-rw-r-- 1 ubuntu ubuntu 2209449064 Jan 1 17:00 atv.basex
-rw-rw-r-- 1 ubuntu ubuntu 4 Jan 1 16:35 atvl.basex
-rw-rw-r-- 1 ubuntu ubuntu 0 Jan 1 16:35 atvr.basex
-rw-rw-r-- 1 ubuntu ubuntu 6414 Jan 3 13:40 doc.basex
-rw-rw-r-- 1 ubuntu ubuntu 6 Jan 1 17:00 ftxx.basex
-rw-rw-r-- 1 ubuntu ubuntu 0 Jan 1 17:00 ftxy.basex
-rw-rw-r-- 1 ubuntu ubuntu 0 Jan 1 17:00 ftxz.basex
-rw-rw-r-- 1 ubuntu ubuntu 829 Jan 3 13:40 inf.basex
-rw-rw-r-- 1 ubuntu ubuntu 28 Jan 1 17:00 swl.basex
-rw-rw-r-- 1 ubuntu ubuntu 1916444672 Jan 3 13:40 tbl.basex
-rw-rw-r-- 1 ubuntu ubuntu 3796037 Jan 3 13:40 tbli.basex
-rw-rw-r-- 1 ubuntu ubuntu 45462 Jan 1 17:00 txt.basex
-rw-rw-r-- 1 ubuntu ubuntu 4 Jan 1 16:35 txtl.basex
-rw-rw-r-- 1 ubuntu ubuntu 0 Jan 1 16:35 txtr.basex
ubuntu@<abc>/BaseXDB/bi_output_3$ pwd
/veracode/msheth/BaseXDB/bi_output_3
ubuntu@<abc>/BaseXDB/bi_output_3$
My concern is that I create the attribute index at each DB update, but the info command at the BaseX prompt tells me otherwise (ATTRINDEX: false). Am I misreading something? Is there a way to fix this once the DB has been created? It takes me 48 hours to create the DBs from scratch... :)
Reading through the documentation on UPDINDEX and AUTOOPTIMIZE ALL, it seems I have to open each DB and run these commands. Is that my only option? Is there an XQuery script somewhere that I can use to do this?
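To illustrate the kind of script I have in mind, here is a minimal, untested sketch. It assumes that the db:list and db:optimize functions of the BaseX Database Module are available in my version, that db:optimize accepts an options map with an 'attrindex' key, and that a full optimize of every DB on the server is acceptable. I have not verified how long this would take on my data.

```xquery
(: Sketch only: rebuild the index structures of every database.
   Assumption: db:optimize($db, $all, $options) is supported and
   'attrindex' is a valid option name in the installed version. :)
for $name in db:list()
return db:optimize($name, true(), map { 'attrindex': true() })
```

Since all updates in a single query are applied together at the end, it may be safer to run this against one DB at a time rather than all 100 at once.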
Thanks, - Mansi