I tried to create some code that we can use for joint testing. With the following snippet, you can create a database (sized around 400 MB) with 1 million value attributes and around 300.0000 distinct values:
db:create( 'test', <xml>{ for $i in 1 to 1000000 let $value := codepoints-to-string( random:seeded-integer($i mod 300000, 256, 26) ! (. + 97) ) return <item value='{ $value }'/> }</xml>, 'test.xml', map { 'maxlen': 256, 'tokenindex': true() } )
The following query chooses a random entry and returns all elements that contain the attribute with this value:
let $count := count(index:tokens('test')) let $pos := (abs(random:integer()) mod $count) + 1 return db:token('test', index:tokens('test')[$pos])
The second query takes around 3 ms in my tests. Do you get similar performance?