Re: [basex-talk] Join operation and the database - BaseX-Talk - mailman.uni-konstanz.de

27 Jul 2017


      Hi Giuseppe,
...
> If I compare values just using one indexed string (the one in @v), this is the fastest way (about one second on my machine).
It depends on the value you are looking for. If it only occurs once in
your database, the lookup will be very fast (in database terms, such a
lookup has “high selectivity”).
...
> If I compare against two distinct indexed values, their order matters, in that, if I understand correctly, the database uses the index only(?) for the first value.
Exactly. You can enforce index access by directly using db:text().
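A minimal sketch of what such a rewrite could look like. It assumes that f and p are sibling child elements in "splitted-db" and that the text index exists for that database; the element names are taken from your predicates, the rest is illustrative:

```xquery
declare variable $txts := doc("tlg0001.tlg001.perseus-grc2.xml");

for $t in $txts//t
(: index lookup on the (presumably more selective) f value,
   then step up to the parent element and filter on p without the index :)
return db:text("splitted-db", string($t))/parent::f/parent::*[p = $t/@o]
```

The point is only that the db:text() call decides which value goes through the index; the remaining condition is checked on the few nodes that the lookup returns.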
...
> I see that [p = $t/@o and f = $t] is much slower than [f = $t and p = $t/@o]. I calculated that on average f contains about 8 characters, while p always contains 9. However, the (Ancient Greek) characters in f are heavier (2 or 3 bytes each) than the (Latin) ones in p (1 byte each). Can this be the reason why [f = $t and p = $t/@o] is evaluated faster?
This doesn’t matter (as long as the string length does not exceed
MAXLEN [1]). The critical question is how many index results you will
get for a single lookup. See the following example:
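  declare variable $txts := doc("tlg0001.tlg001.perseus-grc2.xml");
  (: for the first 10 <t> elements, count the index hits per lookup :)
  for $t in ($txts//t)[position() = 1 to 10]
  return (
    (: hits for the @o value :)
    "* " || $t/@o || ": " || count(db:text("splitted-db", $t/@o)),
    (: hits for the text of $t :)
    "* " || $t || ": " || count(db:text("splitted-db", $t))
  )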
The first lookup will return many more hits than the second one.
You can call the following function to get a complete list of all index entries:
index:texts('splitted-db')
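To spot the least selective values quickly, the result can be sorted by frequency. This is only a sketch: it assumes that each returned entry carries a count attribute with its number of occurrences, and the limit of 10 is arbitrary:

```xquery
(: the ten most frequent text index entries of the database;
   assumption: every entry element has a @count attribute :)
(
  for $entry in index:texts('splitted-db')
  order by xs:integer($entry/@count) descending
  return $entry
)[position() = 1 to 10]
```

Values near the top of this list are the ones for which an index lookup will return the most hits.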
Cheers,
Christian
[1] http://docs.basex.org/wiki/Options#MAXLEN