Hi Christian,
On 13.11.2019 at 18:38, Christian Grün wrote:
Hi Omar,
I am not 100% sure what redundant expressions you saw in my code. Is this about using reverse() instead of having two for loops?
In your initial query, the path…
collection('_qdb-TEI-02__cache')//*[@order="none"]/_:d
…was evaluated four times. If you bind it to a variable, it will only be evaluated once. In addition, using child steps instead of // is faster, too (in many cases, BaseX will rewrite your path for you).
I always try to make the query optimizer's job as easy as possible, and that makes things fast most of the time. I think the statements were optimized to db:attribute(..., 'none'), so // was never actually used. This is what my current approach looks like as an optimized query:
let $ds_0 := db:attribute("_qdb-TEI-02__cache", "none")/self::order/parent::element()/_:d
let $sorted-ascending_1 :=
  for $d_2 in $ds_0
  order by data($d_2/@vutlsk) empty least
  return $d_2
let $sorted-ascending-archiv_3 :=
  for $d_4 in $ds_0
  order by data($d_4/@vutlsk-archiv) empty least
  return $d_4
return (
  db:replace("_qdb-TEI-02__cache", "ascending_cache.xml",
    element Q{https://www.oeaw.ac.at/acdh/tools/vle/util}dryed {
      (attribute order { ("ascending") },
       attribute ids { (string-join(subsequence($sorted-ascending_1, 1, 15000)/((@ID, @xml:id)), " ")) })
    }),
  db:replace("_qdb-TEI-02__cache", "descending_cache.xml",
    element Q{https://www.oeaw.ac.at/acdh/tools/vle/util}dryed {
      (attribute order { ("descending") },
       attribute ids { (string-join(subsequence(reverse($sorted-ascending_1), 1, 15000)/((@ID, @xml:id)), " ")) })
    }),
  db:replace("_qdb-TEI-02__cache", "ascending-archiv_cache.xml",
    element Q{https://www.oeaw.ac.at/acdh/tools/vle/util}dryed {
      (attribute order { ("ascending") },
       attribute label { ("archiv") },
       attribute ids { (string-join(subsequence($sorted-ascending-archiv_3, 1, 15000)/((@ID, @xml:id)), " ")) })
    }),
  db:replace("_qdb-TEI-02__cache", "descending-archiv_cache.xml",
    element Q{https://www.oeaw.ac.at/acdh/tools/vle/util}dryed {
      (attribute order { ("descending") },
       attribute label { ("archiv") },
       attribute ids { (string-join(subsequence(reverse($sorted-ascending-archiv_3), 1, 15000)/((@ID, @xml:id)), " ")) })
    })
)
It is interesting to hear that BaseX does not profit from // expressions. I think this is one thing your competing open source XML DB stresses in their docs: to always use as few steps in an XPath as possible.
I don't quite get how I would do incremental changes to the entries ordered by a key. I do an incremental update by just getting the updated pre values for the database that was changed. That is reasonably fast, even with incremental attribute index updates.
Just two ideas: You can store the data sets of your main database in a pre-sorted fashion. Incremental entries can be sorted on-the-fly in your query, and the results can then be merged with the sorted entries of the main database.
Document order matters to me, so I can't sort the main DB. At least not in this dataset.
Another approach is to store the references and the index keys in your index database. The incremental entries can be merged with the sorted index entries (by looking at the index keys, which are available in both data structures).
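The merge step suggested here could be sketched roughly as follows. This is only an illustration under assumptions: the <e key="…" id="…"/> elements and the function local:merge are hypothetical stand-ins for index entries that carry both the sort key and the reference, not structures from the actual cache database. The recursion is fine for small samples; for millions of entries an iterative approach would probably be needed.

```xquery
(: Hypothetical sketch: merge two sequences that are each already
   sorted by @key. Element name "e" and its attributes are made up. :)
declare function local:merge(
  $a as element(e)*,  (: pre-sorted entries from the index database :)
  $b as element(e)*   (: incremental entries, already sorted :)
) as element(e)* {
  if (empty($a)) then $b
  else if (empty($b)) then $a
  else if (string(head($a)/@key) le string(head($b)/@key))
  then (head($a), local:merge(tail($a), $b))
  else (head($b), local:merge($a, tail($b)))
};

let $indexed := (<e key="a" id="s1"/>, <e key="c" id="s3"/>, <e key="e" id="s5"/>)
let $incoming := (<e key="d" id="s4"/>, <e key="b" id="s2"/>)
(: only the small incremental set is sorted on the fly :)
let $incoming-sorted :=
  for $e in $incoming
  order by string($e/@key)
  return $e
(: "!" keeps the merged order; a "/" step would re-sort by document order :)
return string-join(local:merge($indexed, $incoming-sorted) ! string(@id), " ")
```

With the sample data above, this should yield the ids in key order: s1 s2 s3 s4 s5.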
I once tried to store the _:d tags sorted by key, ascending and descending. That makes 2.4 million x keys (perhaps x 2) tags in the database. Writing this of course took much longer, so a complete or initial index generation took up to five minutes. I think that is not worth it.
Efficiently merging by looking at the index keys is a problem because I join all the @xml:id values that identify an entry into that one long @ids attribute. So I lose the relation between the key and the id. I did that because this was the fastest way to write this data to the db. Everything else I tried was much slower. And tokenize(@ids) is remarkably fast, even if all 2.4 million ids are in there. Just writing out 2.4 million ids to the database is slow.
... ! db:open-pre(./@db_name, ./@pre)
In BaseX 9.3, it will be possible to supply integer sequences as second argument; this may speed up your query a little.
I'll give it a try.
But I have to say that a request like "get me all entries with ids starting with s800 sorted by some key" using this query
declare namespace _ = "https://www.oeaw.ac.at/acdh/tools/vle/util";
for $key in db:attribute("_qdb-TEI-02__cache", index:attributes("_qdb-TEI-02__cache", 's800'))[. instance of attribute(xml:id)]
order by $key/../@vutlsk ascending
where starts-with($key/../@xml:id, 's800')
return db:open-pre($key/../@db_name, $key/../@pre)
only takes 140 ms for about 3900 entries. Unfortunately, starts-with(@xml:id, 's800') is not automatically optimized in this way.
Best regards
Omar Siam