Hi,
I have a custom index that looks like this (one db, different files):
<_:dryed xmlns:_="https://www.oeaw.ac.at/acdh/tools/vle/util" db_name="z881_qdb-TEI-02n" order="none">
  <_:d pre="15627" db_name="z881_qdb-TEI-02n" xml:id="z881_qdbn-d16e2" vutlsk="tsįttr Ziter [Subst]" vutlsk-archiv="HK 881, z8810118.sch#1"/>
  <_:d pre="15673" db_name="z881_qdb-TEI-02n" xml:id="z881_qdbn-d16e21" vutlsk="tsįttr Ziter [Subst]" vutlsk-archiv="HK 881, z8810118.sch#1"/>
  ...
</_:dryed>
<_:dryed xmlns:_="https://www.oeaw.ac.at/acdh/tools/vle/util" db_name="f227_qdb-TEI-02n" order="none">
  <_:d pre="467" db_name="f227_qdb-TEI-02n" xml:id="f237_qdb-d1e29398" vutlsk="(aus)faren [Verb]" vutlsk-archiv="HK 327, f227#944.1 = fare0126.eck#1.1"/>
  <_:d pre="591" db_name="f227_qdb-TEI-02n" xml:id="f237_qdb-d1e29438" vutlsk="(aus)faren [Verb]" vutlsk-archiv="HK 327, f227#945.1 = fare0126.eck#2.1"/>
  ...
</_:dryed>
There are about 2.4 million _:d elements in this db.
I need to sort them alphabetically by the @vutlsk and @vutlsk-archiv attributes, in both ascending and descending order.
With the code I have now:
declare namespace _ = "https://www.oeaw.ac.at/acdh/tools/vle/util";
let $sorted-ascending := subsequence(
  for $d in collection('_qdb-TEI-02__cache')//*[@order="none"]/_:d
  order by $d/@vutlsk ascending
  return $d/(@ID, @xml:id)/data(), 1, 10000)
let $sorted-descending := subsequence(
  for $d in collection('_qdb-TEI-02__cache')//*[@order="none"]/_:d
  order by $d/@vutlsk descending
  return $d/(@ID, @xml:id)/data(), 1, 10000)
let $sorted-ascending-archiv := subsequence(
  for $d in collection('_qdb-TEI-02__cache')//*[@order="none"]/_:d
  order by $d/@vutlsk-archiv ascending
  return $d/(@ID, @xml:id)/data(), 1, 10000)
let $sorted-descending-archiv := subsequence(
  for $d in collection('_qdb-TEI-02__cache')//*[@order="none"]/_:d
  order by $d/@vutlsk-archiv descending
  return $d/(@ID, @xml:id)/data(), 1, 10000)
return (
  db:replace("_qdb-TEI-02__cache", 'ascending_cache.xml',
    <_:dryed order="ascending" ids="{string-join($sorted-ascending, ' ')}"/>),
  db:replace("_qdb-TEI-02__cache", 'descending_cache.xml',
    <_:dryed order="descending" ids="{string-join($sorted-descending, ' ')}"/>),
  db:replace("_qdb-TEI-02__cache", 'ascending-archiv_cache.xml',
    <_:dryed order="ascending" label="archiv" ids="{string-join($sorted-ascending-archiv, ' ')}"/>),
  db:replace("_qdb-TEI-02__cache", 'descending-archiv_cache.xml',
    <_:dryed order="descending" label="archiv" ids="{string-join($sorted-descending-archiv, ' ')}"/>)
)
This takes between 30 seconds and about a minute, depending on the subsequence length I choose.
I experimented with and without multithreading: running the sorts as multiple jobs or with fork-join makes it slower, not faster.
In the worst case, I need to rebuild this every time I save a change to one of the original DBs for which I maintain the index.
Any ideas how to speed this up?
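One variant that might be worth trying, sketched here and not benchmarked: sort each attribute only once and take the descending list from the reversed ascending sequence, which cuts the four full sorts of the 2.4 million elements down to two. To run on its own, the sketch inlines two of the _:d entries shown above as $ds; in the real query $ds would be the collection() path from the code above, and each of the four results would be written with db:replace as before. The names $n and local:ids are illustrative only. It assumes every _:d carries exactly one of @ID or @xml:id, and entries with equal keys may come out in a different relative order than with "order by ... descending".

```xquery
declare namespace _ = "https://www.oeaw.ac.at/acdh/tools/vle/util";

(: Untested sketch: two sorts instead of four. The descending lists are
   read off the reversed ascending ones. :)
declare variable $n := 10000;

(: Stand-in for collection('_qdb-TEI-02__cache')//*[@order="none"]/_:d;
   two sample entries from the index are inlined so this runs stand-alone. :)
declare variable $ds := (
  <_:d xml:id="z881_qdbn-d16e2" vutlsk="tsįttr Ziter [Subst]"
       vutlsk-archiv="HK 881, z8810118.sch#1"/>,
  <_:d xml:id="f237_qdb-d1e29398" vutlsk="(aus)faren [Verb]"
       vutlsk-archiv="HK 327, f227#944.1 = fare0126.eck#1.1"/>
);

(: Space-separated ids of the first $n elements of an already sorted
   sequence. Assumes each _:d has exactly one of @ID / @xml:id. :)
declare function local:ids($sorted as element()*) as xs:string {
  string-join(subsequence($sorted, 1, $n) ! data((@ID, @xml:id)), ' ')
};

(: Each key is sorted exactly once; the elements themselves are kept so
   the ascending sequence can be reversed without sorting again. :)
let $asc := for $d in $ds order by $d/@vutlsk ascending return $d
let $asc-archiv := for $d in $ds order by $d/@vutlsk-archiv ascending return $d
return (
  <_:dryed order="ascending" ids="{local:ids($asc)}"/>,
  <_:dryed order="descending" ids="{local:ids(reverse($asc))}"/>,
  <_:dryed order="ascending" label="archiv" ids="{local:ids($asc-archiv)}"/>,
  <_:dryed order="descending" label="archiv" ids="{local:ids(reverse($asc-archiv))}"/>
)
```

How much this saves depends on how much of the runtime goes into sorting rather than into scanning the elements; that split has not been measured here.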
Best regards
Omar Siam