Hi Matt,

Since I started using BaseX, I've had great experiences with map performance. Your query is slow because the predicates $originalDB[@id = $id] and $updateDB[@id = $id] scan every item for each of the (up to ~2.6 million) distinct ids, so the total work grows quadratically. Would this version work better? It first builds a map over the items of each database, keyed on @id, and then looks each id up in those maps, so every lookup becomes a constant-time map access:

  let $originalDB := db:get('OriginalDB')/data/item
  let $updateDB := db:get('UpdateDB')/data/item

  (: build one id → item map per database, in a single pass each :)
  let $originalDB.map := map:merge(
    for $entry in $originalDB
    return map:entry($entry/@id, $entry)
  )
  let $updateDB.map := map:merge(
    for $entry in $updateDB
    return map:entry($entry/@id, $entry)
  )
  (: each map lookup below is constant-time, with no repeated scans :)
  for $id in ($originalDB.map => map:keys(), $updateDB.map => map:keys()) => distinct-values()
  return json:serialize(map {
    'id': xs:long($id),
    'original': $originalDB.map($id),
    'update': $updateDB.map($id)
  }, map { 'indent': 'no' })
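
If building the two maps uses too much memory, the attribute index is another option. When the full db:get(...) path appears directly in the predicate (rather than behind a variable), BaseX's optimizer can often rewrite [@id = $id] into an attribute-index lookup. Whether that rewrite actually fires depends on your version and index settings, so treat the following as a sketch and verify it against the optimized query plan (e.g. in the GUI's Info view):

  let $ids := distinct-values((
    db:get('OriginalDB')/data/item/@id,
    db:get('UpdateDB')/data/item/@id
  ))
  for $id in $ids
  return json:serialize(map {
    'id': xs:long($id),
    (: inlined paths give the optimizer a chance to use the attribute index :)
    'original': db:get('OriginalDB')/data/item[@id = $id],
    'update': db:get('UpdateDB')/data/item[@id = $id]
  }, map { 'indent': 'no' })

Either way, the map-based version needs only one pass over each database plus one pass over the distinct ids.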

Best,

Ron


On 3/06/2025 23:09, Matt Dziuban wrote:
Hi all,

I'm working on a query that reads from two databases -- OriginalDB and UpdateDB. The structure of each database is the same:

<data>
  <item id="1">...</item>
  <item id="2">...</item>
  ...
</data>

OriginalDB has 2,305,570 items and UpdateDB has 307,019.

The query collects all distinct item ids from the two databases, then for each id fetches the matching item (if there is one) from each database and returns the id and both nodes in a JSON object. Here's the query:

let $originalDB := db:get('OriginalDB')/data/item
let $updateDB := db:get('UpdateDB')/data/item
for $id in distinct-values(($originalDB/@id, $updateDB/@id))
let $original := $originalDB[@id = $id]
let $update := $updateDB[@id = $id]
return json:serialize(map {
  'id': xs:long($id),
  'original': $original,
  'update': $update
}, map { 'indent': 'no' })

In its current state this query is very slow -- it has returned only 35 JSON objects in ~30 minutes. How can I optimize it to best take advantage of each database's indexes?

Thanks in advance,
Matt