I put together some code that we can use for joint testing. The following snippet creates a database (around 400 MB) with 1 million value attributes and around 300,000 distinct values:
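(: create database 'test' with a token index on the generated attributes :)
db:create(
  'test',
  <xml>{
    for $i in 1 to 1000000
    (: 256 random lowercase letters; the seed repeats every 300,000 items :)
    let $value := codepoints-to-string(
      random:seeded-integer($i mod 300000, 256, 26) ! (. + 97)
    )
    return <item value='{ $value }'/>
  }</xml>,
  'test.xml',
  map { 'maxlen': 256, 'tokenindex': true() }
)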
The following query picks a random entry from the token index and returns all attribute nodes that contain this value:
let $count := count(index:tokens('test'))
let $pos := (abs(random:integer()) mod $count) + 1
return db:token('test', index:tokens('test')[$pos])
The second query takes around 3 ms in my tests. Do you get similar performance?
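For comparison, here is a minimal sketch of the same lookup that evaluates index:tokens only once and passes the upper bound directly to random:integer. It assumes that random:integer($max) returns a value from 0 (inclusive) to $max (exclusive), as described in the Random Module documentation; the variable names $tokens and $token are only illustrative, and whether this changes the timing is untested.

```xquery
(: Sketch: read the token index once, pick one entry at random, look it up :)
let $tokens := index:tokens('test')
(: random:integer($max) is assumed to yield 0 .. $max - 1, hence the + 1 :)
let $pos := random:integer(count($tokens)) + 1
(: each index entry is an element; its string value is the token itself :)
let $token := string($tokens[$pos])
return db:token('test', $token)
```

If both variants are run against the same database, the results should be comparable apart from the randomly chosen token.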