Hello,
A very happy new year to all of you!
I have some very basic questions about indexing.
1. Most of my XQueries look like this:
/Archives/descendant::apiCalls[contains(@name,"com.sun")]/@name
where apiCalls can be 3-4 levels below 'Archives'. The queries are sent via REST.
Based on this, I used attribute indexing after each update to the DB. Is that correct? Should I have been using full-text indexing instead? Why?
2. I have thousands of documents spread over 100 XML databases, currently about 400 GB in total. Each query takes roughly 30 minutes to run. That performance is acceptable, but I know I can do better with indexing.
Here is what I see when I look at one of the databases:
> open bi_output_3
Database 'bi_output_3' was opened in 38.22 ms.
> info db
Database Properties
Name: bi_output_3
Size: 3938 MB
Nodes: 16193129
Documents: 35
Binaries: 0
Timestamp: 2016-01-03T13:40:40.000Z
Resource Properties
Timestamp: 2016-01-03T13:40:40.776Z
Encoding: UTF-8
CHOP: true
Indexes
Up-to-date: false
TEXTINDEX: false
ATTRINDEX: false
FTINDEX: false
LANGUAGE: English
STEMMING: false
CASESENS: false
DIACRITICS: false
STOPWORDS:
UPDINDEX: false
AUTOOPTIMIZE: false
MAXCATS: 100
MAXLEN: 96
And here is its footprint on disk:
ubuntu@<abc>/BaseXDB/bi_output_3$ ls -l
total 4032992
-rw-rw-r-- 1 ubuntu ubuntu 2209449064 Jan 1 17:00 atv.basex
-rw-rw-r-- 1 ubuntu ubuntu 4 Jan 1 16:35 atvl.basex
-rw-rw-r-- 1 ubuntu ubuntu 0 Jan 1 16:35 atvr.basex
-rw-rw-r-- 1 ubuntu ubuntu 6414 Jan 3 13:40 doc.basex
-rw-rw-r-- 1 ubuntu ubuntu 6 Jan 1 17:00 ftxx.basex
-rw-rw-r-- 1 ubuntu ubuntu 0 Jan 1 17:00 ftxy.basex
-rw-rw-r-- 1 ubuntu ubuntu 0 Jan 1 17:00 ftxz.basex
-rw-rw-r-- 1 ubuntu ubuntu 829 Jan 3 13:40 inf.basex
-rw-rw-r-- 1 ubuntu ubuntu 28 Jan 1 17:00 swl.basex
-rw-rw-r-- 1 ubuntu ubuntu 1916444672 Jan 3 13:40 tbl.basex
-rw-rw-r-- 1 ubuntu ubuntu 3796037 Jan 3 13:40 tbli.basex
-rw-rw-r-- 1 ubuntu ubuntu 45462 Jan 1 17:00 txt.basex
-rw-rw-r-- 1 ubuntu ubuntu 4 Jan 1 16:35 txtl.basex
-rw-rw-r-- 1 ubuntu ubuntu 0 Jan 1 16:35 txtr.basex
ubuntu@<abc>/BaseXDB/bi_output_3$ pwd
/veracode/msheth/BaseXDB/bi_output_3
ubuntu@<abc>/BaseXDB/bi_output_3$
My concern is that I use attribute indexing at each DB update, but the INFO DB command at the BaseX prompt tells me otherwise (ATTRINDEX: false, Up-to-date: false). Am I misreading something? Is there a way to fix this once a database has been created? It takes me 48 hours to create the databases from scratch... :)
Reading through the documentation for UPDINDEX, AUTOOPTIMIZE and OPTIMIZE ALL, it seems I have to open each DB and run these commands. Is that my only option? Is there an XQuery script somewhere that I can use to do this?
Thanks,
- Mansi