Hi Christian,
- Console: open tuscreen xquery //*:molecule[@*:id="11178"]
Wow, thank you. That gave a huge boost in performance:
for $i in //*:molecule[@*:id="298038"]/*:identifier[@*:convention="iupac:smile"] return <smile>{$i/@value}</smile>
Takes 80ms now.
But unfortunately it doesn't work for a query like:
let $ids := (1 to 10) for $i in //*:molecule[@*:id = $ids]/*:identifier[@*:convention="iupac:smile"] return <smile>{$i/@value}</smile>
Query executed in 3688.12 ms.
So it still takes >3500 ms
I noticed that the query execution plans are quite different:
First Query: Result: element { "smiles" } { for $i in IndexAccess("298038", ATV)/self::*:id/parent::*:molecule/*:identifier[@*:convention = "iupac:smile"] return element { "smile" } { $i/@value } }
Second Query: Result: for $i in IndexAccess("iupac:smile", ATV)/self::*:convention/parent::*:identifier[parent::*:molecule[@*:id = (1, 2, 3, 4, 5, 6, 7, 8, 9, 10)]] return element { "smile" } { $i/@value }
So the second plan doesn't make use of the attribute index for locating the id-attributes.
Sorry, I saw it too late… Anytime ;) The more mails go to the list, though, the better; that way the others stay up to date as well.
Then I'd better reply to the list ;)
Patrick
The query optimizer tries to find the cheapest index evaluation. As the costs for queries on variables cannot be estimated in advance, the optimizer will choose the index access for "iupac:smile". If you want to enforce index access on a certain expression, you could try to rewrite your query so that the other expressions cannot be optimized.
Next, the optimizer prefers single string values (although range queries are generally supported as well). In your query, it might help to first convert your ids to strings, as shown here:
let $ids := for $i in 1 to 10 return string($i) for $i in //*:molecule[@*:id = $ids]/*:identifier[@*:convention="iupac:smile"] return $i
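As a side note on why the conversion matters: in XQuery, comparing an untyped attribute value with integers is a numeric comparison, whereas comparing it with strings is a string comparison. The following minimal sketch illustrates the difference on a small inline document (the element and attribute names are made up for illustration, and whether a given BaseX version uses its attribute index for the second form is not shown here):

```xquery
(: hypothetical sample data; ids are untyped attribute values :)
let $doc :=
  <molecules>
    <molecule id="1"/>
    <molecule id="01"/>
    <molecule id="10"/>
  </molecules>
(: same conversion as in the query above :)
let $ids := for $n in 1 to 10 return string($n)
return (
  (: numeric comparison: "01" is cast to a number and equals 1, so all three match :)
  count($doc/molecule[@id = (1 to 10)]),
  (: string comparison: only the exact strings "1" and "10" match :)
  count($doc/molecule[@id = $ids])
)
(: result: 3, 2 :)
```

So the two forms are not strictly equivalent; the string version only matches ids that are stored in exactly that textual form.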
Hope this helps, Christian
Thanks! That was the solution!
"Query executed in 45.11 ms."
Patrick
basex-talk@mailman.uni-konstanz.de