Hi Giuseppe,
If I compare values just using one indexed string (the one in @v), this is the fastest way (about one second on my machine).
It depends on the value you are looking for. If it only occurs once in your database, the lookup will be very fast (in database terms, this is called “high selectivity”).
If I compare against two distinct indexed values, their order matters, in that, if I understand correctly, the database uses the index only(?) for the first value.
Exactly. You can enforce index access by directly using db:text().
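A minimal sketch of what such an explicit lookup could look like, reusing the database and document names from the example further down. It assumes that f and p are siblings under a common parent and that each f contains a single text node; both are guesses based on the predicates in your question, not something I checked against your data:

```xquery
declare variable $txts := doc("tlg0001.tlg001.perseus-grc2.xml");

for $t in ($txts//t)[position() = 1 to 10]
(: explicit text index lookup; parent::f keeps only hits inside f elements :)
for $f in db:text("splitted-db", string($t))/parent::f
(: the second condition is only checked on the candidates from the index :)
where $f/../p = $t/@o
(: return the assumed common parent of f and p :)
return $f/..
```

Written this way, the index is always queried for the value you chose, regardless of how the optimizer would have ordered the two comparisons.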
I see that [p = $t/@o and f = $t] is much slower than [f = $t and p = $t/@o]. I calculated that on average f contains about 8 characters, while p always contains 9. However, the (Ancient Greek) characters in f are heavier (2 or 3 bytes each) than the (Latin) ones in p (1 byte each). Can this be the reason why [f = $t and p = $t/@o] is evaluated faster?
This doesn’t matter (as long as the string length does not exceed MAXLEN [1]). The critical question is how many index results you will get for a single lookup. See the following example:
  declare variable $txts := doc("tlg0001.tlg001.perseus-grc2.xml");
  for $t in ($txts//t)[position() = 1 to 10]
  return (
    "* " || $t/@o || ": " || count(db:text("splitted-db", $t/@o)),
    "* " || $t || ": " || count(db:text("splitted-db", $t))
  )
The first lookup (on $t/@o) will return many more hits than the second one (on $t), so starting with the comparison on f leaves far fewer candidates to check.
You can call the following function to get a complete list of all index entries:
index:texts('splitted-db')
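If you only want to see which values have the lowest selectivity, the result can be sorted by the number of occurrences. This is just a sketch: it relies on each returned entry being an element with a count attribute, and the limit of 20 is an arbitrary choice:

```xquery
(
  for $entry in index:texts("splitted-db")
  (: entries with the most occurrences come first :)
  order by xs:integer($entry/@count) descending
  return $entry/@count || ": " || $entry
)[position() = 1 to 20]
```

Values at the top of this list are the ones for which a single index lookup yields the most results.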
Cheers, Christian
basex-talk@mailman.uni-konstanz.de