Hi Martin,

The queries you shared terminate quickly and don’t return anything. The one provided by Graydon does and completes significantly faster. I’m not sure what accounts for the difference.

Thanks,
Ron

On March 10, 2021 at 3:44:28 PM, Martin Honnen (martin.honnen@gmx.de) wrote:

On 10.03.2021 21:37, Ron Katriel wrote:
> Hi,
>
> I would appreciate your advice on optimizing a query against a large
> BaseX (9.2.4) database. It is loaded with data from the FDA’s Adverse
> Event Reporting System (FAERS). Currently this is just the 2020 dataset
> which comprises 12 documents stored as 308,870,597 nodes (6,565 MB).
>
> The queries below effectively - though not necessarily efficiently -
> implement a histogram. The first, which is applied to patient gender
> (sex), returns the results (3 items) in 52 seconds:
>
> 2 893694
> 1 583999
> 0 198
>
> The second does this for patient weight - rounded to the closest 10 lbs
> increment. It takes 580 seconds to place the data into 67 bins.
> Initially I tried running it on the rounded weights but aborted the run
> as it was a taking an inordinate amount of time (there are 217 distinct
> weights in the dataset).
>
> Is there a way to improve the performance of this type of query?
>
> Thanks,
> Ron
>
>
> (: 3 items - 52 sec :)
> let $safetyreport := db:open('FAERS')/ichicsr/safetyreport
> for $value in distinct-values($safetyreport/patient/patientsex)
> return concat($value, " ",
> count(index-of($safetyreport/patient/patientsex, $value)))

You could always try whether grouping performs better e.g.

for $ps in db:open('FAERS')/ichicsr/safetyreport/patientsex
group by $s := $ps
return $s || " " || count($ps)

>
> (: 67 items - 580 sec :)
> let $safetyreport := db:open('FAERS')/ichicsr/safetyreport
> for $value in distinct-values($safetyreport/patient/patientweight ! (.
> div 10.0) ! round(.) ! (. * 10))
> return concat($value, " ",
> count(index-of($safetyreport/patient/patientweight ! (. div 10.0) !
> round(.) ! (. * 10), $value)))



for $pw in db:open('FAERS')/ichicsr/safetyreport/patientweight
group by $w := $pw ! (. div 10.0) ! round(.) ! (. * 10))
return $w || " " || count($pw)



The information in this email and any attachments are intended solely for the recipient(s) to whom it is addressed, and may be confidential and/or privileged. Any unauthorized distribution or copying of this transmittal or its attachments is prohibited. If you are not a named recipient or have received this email in error: (i) you should not read, disclose, or copy it, (ii) please notify the sender of your receipt by reply email and delete this email and all attachments.