Hello,
I constructed the following XML file for another test of the software “BaseX 9.7”.
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<test_data>
  <info>
    <id>12</id>
    <topics>
      <topic>Demo1</topic>
      <topic>Demo2</topic>
    </topics>
  </info>
  <info>
    <id>23</id>
    <topics>
      <topic>Demo1</topic>
      <topic>Demo2</topic>
    </topics>
  </info>
  <info>
    <id>34</id>
    <topics>
      <topic>Test1</topic>
      <topic>Test2</topic>
      <topic>Test3</topic>
    </topics>
  </info>
  <info>
    <id>45</id>
    <topics>
      <topic>Test1</topic>
      <topic>Test2</topic>
      <topic>Test3</topic>
    </topics>
  </info>
  <info>
    <id>56</id>
    <topics>
      <topic>Test1</topic>
      <topic>Test2</topic>
      <topic>Test3</topic>
    </topics>
  </info>
  <info>
    <id>67</id>
    <topics>
      <topic>Probe1</topic>
    </topics>
  </info>
</test_data>
I then ran the following XQuery script against it.
declare option output:method "csv";
declare option output:csv "header=yes, separator=|";

for $x in //test_data/info
group by $topics := string-join($x/topics/topic/data(), "*")
let $incidence := count($topics)
order by $incidence descending
return
  <csv>
    <record>
      <topic_combination>{$topics}</topic_combination>
      <incidence>{$incidence}</incidence>
    </record>
  </csv>
Corresponding test result:
topic_combination|incidence
Demo1*Demo2|1
Test1*Test2*Test3|1
Probe1|1
For this kind of data analysis, I would like to see the numbers “2” and “3” at the end of the first two rows instead. I would appreciate further advice for this use case.
Regards, Markus
If the following result is the one you would expect …
topic_combination|incidence
Test1*Test2*Test3|3
Demo1*Demo2|2
Probe1|1
… it is sufficient to replace …
let $incidence := count($topics)
… by …
let $incidence := count($x)
The string-join call yields a single string, which becomes the grouping key $topics; thus, count($topics) always returns 1. After the 'group by' clause, each non-grouping variable that was declared before it is rebound to the sequence of its values across the whole group. This means that count($x) returns …
• 1 if it’s called before 'group by'
• the number of grouped items if it’s called after 'group by'
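Put together, the corrected script could look like the following sketch. It is the original query with only the count argument changed, and it assumes that the XML file from the question is opened as the query context. The comments are added here for illustration.

```xquery
declare option output:method "csv";
declare option output:csv "header=yes, separator=|";

for $x in //test_data/info
(: Before 'group by', $x is a single info element, so count($x) would be 1 here. :)
group by $topics := string-join($x/topics/topic/data(), "*")
(: After 'group by', $x holds all info elements that share the same key. :)
let $incidence := count($x)
order by $incidence descending
return
  <csv>
    <record>
      <topic_combination>{$topics}</topic_combination>
      <incidence>{$incidence}</incidence>
    </record>
  </csv>
```

Counting $x rather than $topics works because $topics is the grouping key itself, which remains a single string per group.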
basex-talk@mailman.uni-konstanz.de