Hello all, I'm using BaseX to cluster a set of millions of small XML fragments which look something like this:
<affiliation> <organization>Institut für Organische Chemie der Universität Heidelberg</organization> <country iso-code="DEU"/> </affiliation>
I need to cluster based on fragment similarity - so taking into account elements, attributes and text nodes.
If I use the entire XML fragment as a grouping key, something like this:
for $a at $c in db:open('DB')/item/*/affiliation group by $val := $a
... then will the grouping be equivalent to the functionality of the deep-equal function? First results seem to suggest this, but I want to make sure that grouping is not done on text node value alone or anything like that.
Incidentally, BaseX is simply unbelievably fast at executing this - a million fragments clustered and written out to another DB in 16 seconds on a laptop. My congratulations on an amazing product.
Regards, Constantine
________________________________
Elsevier B.V. Registered Office: Radarweg 29, 1043 NX Amsterdam, The Netherlands, Registration No. 33156677, Registered in The Netherlands.
Hi Constantine,
> Incidentally, BaseX is simply unbelievably fast at executing this – a million fragments clustered and written out to another DB in 16 seconds on a laptop. My congratulations on an amazing product.
Thanks!
> If I use the entire XML fragment as a grouping key, something like this: [...] … then will the grouping be equivalent to the functionality of the deep-equal function?
A grouping key is the atomized value of a grouping variable [1,2]. If this value is prone to be ambiguous, you can create an arbitrary other value, e.g. as follows:
group by $val := string-join($a/*, '; ')
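To illustrate the difference, here is a minimal sketch with made-up sample data (the two fragments and the name $frags are assumptions, not taken from your database). It compares deep-equal with the atomized value that group by actually uses:

```xquery
(: Two hypothetical fragments: same text content, different attribute value :)
let $frags := (
  <affiliation><organization>Org A</organization><country iso-code="DEU"/></affiliation>,
  <affiliation><organization>Org A</organization><country iso-code="FRA"/></affiliation>
)
return (
  (: false: deep-equal also compares the iso-code attributes :)
  deep-equal($frags[1], $frags[2]),
  (: true: both fragments atomize to "Org A" :)
  data($frags[1]) = data($frags[2]),
  (: 1: group by sees a single key, so both fragments share one group :)
  count(
    for $a in $frags
    group by $val := $a
    return $val
  )
)
```

So grouping on the node itself is not equivalent to deep-equal whenever fragments differ only in attributes.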
Cheers, Christian
[1] http://docs.basex.org/wiki/XQuery_3.0#group_by [2] http://www.w3.org/TR/xquery-30/#id-group-by
Hi Christian,
I should have read the spec more fully. So clearly I will need to create a custom grouping key, because atomization (fn:data) uses only the text content of the fragment and ignores attribute values:
let $val := <affiliation> <organization>Institut für Organische Chemie der Universität Heidelberg</organization> <country iso-code="DEU"/> </affiliation>
return data($val)
=> results in "Institut für Organische Chemie der Universität Heidelberg" as a grouping key.
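One possible custom key, as an untested sketch on made-up sample data, is the serialized fragment, which keeps element names, attributes and text. The name $frags, the cluster wrapper element and the choice of serialize() are my own assumptions. Note that a serialized string is only an approximation of deep-equal: differences in namespace prefixes or whitespace would produce different keys even for fragments that deep-equal treats as equal.

```xquery
(: Hypothetical sample data: fragments 1 and 3 are identical, fragment 2
   differs only in the iso-code attribute :)
let $frags := (
  <affiliation><organization>Org A</organization><country iso-code="DEU"/></affiliation>,
  <affiliation><organization>Org A</organization><country iso-code="FRA"/></affiliation>,
  <affiliation><organization>Org A</organization><country iso-code="DEU"/></affiliation>
)
for $a in $frags
(: The key is the serialized markup, so attribute values are part of it :)
group by $key := serialize($a)
(: After grouping, $a holds all fragments of the group :)
return <cluster size="{ count($a) }">{ $a[1] }</cluster>
```

With this key the DEU and FRA fragments should end up in separate clusters, instead of the single cluster that the atomized value gives.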
Thanks for pointing me in the right direction.
C.