Thanks to Graydon, Tamara, and Christian for responding!
I figured out a pretty fast way to exploit the infrastructure I had already built: the input files allocated across many databases, plus a single index database generated from those databases.
Here is a sample record from my index database:
<entry>
<dbname>pmed_updates_b</dbname>
<pmid>34239076</pmid>
<version>1</version>
<path>pubmed22n1145.xml</path>
<date_revised>2022-01-09</date_revised>
</entry>
As it happens, there are eight versions of this record, located in eight input files spread across seven of the component databases (two of the input files were allocated to the same database). Each of these instances has an entry in the index database.
My approach has four steps:
- retrieve all entries from the index database that have the desired PMID;
- convert the sequence of XML entries into a sequence of maps with the same data, ordering by filename descending, so that the most recent file is the first element of the sequence;
- take the first item/map of the sequence;
- look up all records with that PMID in the database named in that first item, call db:path() on each one, and compare the result with the filename given in the item; the record whose db:path() matches the item/map taken in step three is the most recent version of the record with that PMID.
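
The four steps above can be sketched outside the database as follows. This is only an illustration in Python, not the query I actually run: the index is a small XML string, the component databases are a plain dictionary, and the stored "path" value stands in for what db:path() would return. The first entry is the sample record shown earlier; the other entries, the second database name, the "body" values, and all function names are made up for the example.

```python
import xml.etree.ElementTree as ET

# Stand-in for the index database. Only the first entry comes from the
# sample above; the other three are invented for illustration.
INDEX_XML = """
<index>
  <entry><dbname>pmed_updates_b</dbname><pmid>34239076</pmid><version>1</version>
    <path>pubmed22n1145.xml</path><date_revised>2022-01-09</date_revised></entry>
  <entry><dbname>pmed_updates_b</dbname><pmid>34239076</pmid><version>1</version>
    <path>pubmed22n1100.xml</path><date_revised>2021-12-01</date_revised></entry>
  <entry><dbname>pmed_updates_a</dbname><pmid>34239076</pmid><version>1</version>
    <path>pubmed22n1120.xml</path><date_revised>2021-12-20</date_revised></entry>
  <entry><dbname>pmed_updates_a</dbname><pmid>11111111</pmid><version>1</version>
    <path>pubmed22n1120.xml</path><date_revised>2021-12-20</date_revised></entry>
</index>
"""

# Stand-in for the component databases: name -> list of records.
# Each record's "path" plays the role of db:path() on that record.
DATABASES = {
    "pmed_updates_a": [
        {"pmid": "34239076", "path": "pubmed22n1120.xml", "body": "middle copy"},
        {"pmid": "11111111", "path": "pubmed22n1120.xml", "body": "other record"},
    ],
    "pmed_updates_b": [
        {"pmid": "34239076", "path": "pubmed22n1100.xml", "body": "oldest copy"},
        {"pmid": "34239076", "path": "pubmed22n1145.xml", "body": "newest copy"},
    ],
}


def entries_for(index_root, pmid):
    """Step 1: all index entries with the desired PMID."""
    return [e for e in index_root.iter("entry") if e.findtext("pmid") == pmid]


def to_map(entry):
    """Turn one <entry> element into a dict holding the same data."""
    return {child.tag: child.text for child in entry}


def newest_entry(index_root, pmid):
    """Steps 2 and 3: maps ordered by filename descending, then the first one.

    Plain string ordering is assumed to be enough because the filenames
    share one prefix and a fixed-width number.
    """
    maps = [to_map(e) for e in entries_for(index_root, pmid)]
    maps.sort(key=lambda m: m["path"], reverse=True)
    return maps[0] if maps else None


def latest_record(index_root, databases, pmid):
    """Step 4: in the database named by the newest entry, keep the record
    whose path equals the filename in that entry."""
    newest = newest_entry(index_root, pmid)
    if newest is None:
        return None
    for record in databases.get(newest["dbname"], []):
        if record["pmid"] == pmid and record["path"] == newest["path"]:
            return record
    return None


index_root = ET.fromstring(INDEX_XML)
result = latest_record(index_root, DATABASES, "34239076")
print(result["path"], "-", result["body"])
```

Note that pmed_updates_b holds two records with the same PMID in this made-up data, which is exactly the case where the path comparison in step four is needed.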
Files are allocated by modulo to the different databases, so it is conceivable that a database will have more than one record with a given PMID, hence the necessity of comparing each record's path with the one given in the map from step three to determine which is the most recent.
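
To illustrate how two input files can end up in the same database, here is a small Python sketch of a modulo allocation. The details are assumptions for the example only: I am taking the number after the "n" in the filename as the key, using a hypothetical count of 8 databases, and returning a slot number rather than a database name; pubmed22n1137.xml is a made-up second filename.

```python
import re

DB_COUNT = 8  # hypothetical number of component databases


def file_number(filename):
    """Extract the trailing file number, e.g. 1145 from pubmed22n1145.xml."""
    match = re.search(r"n(\d+)\.xml$", filename)
    if match is None:
        raise ValueError(f"unexpected filename: {filename}")
    return int(match.group(1))


def slot_for(filename, db_count=DB_COUNT):
    """Assumed allocation rule: file number modulo the database count."""
    return file_number(filename) % db_count


# Two different input files that land in the same slot under this rule.
for name in ("pubmed22n1145.xml", "pubmed22n1137.xml"):
    print(name, "-> slot", slot_for(name))
```

Any two files whose numbers differ by a multiple of the database count share a slot, so a PMID revised in both files appears twice in one database.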