Hi,
I'm using BaseX to compare two XML documents with a list of items <asset> identified by an element <assetid> and with a
Now I need to consolidate them in only one document with the value of the <category> element in both documents (one of them renamed to <prev-category>).
Attached are samples of these documents, and the query I'm using is this:
declare function local:compare( $actual, $prev ){
for $c in $actual/asset
return <asset>
{ $c/*}
<prev-category>
{ $prev/asset[assetid = $c/assetid]/category/text() }
</prev-category>
</asset>
};
declare variable $act external := "2017-07";
declare variable $prv external := "2016-07";
return local:compare(/portfolio[projectid=$act], /portfolio[projectid=$prv] )
The query takes 42 seconds when run over the files. If I load the files to a database and activate the text index, it takes 125 seconds.
This is a very common query in relational databases that takes only a few seconds to run when joining two tables or two subqueries.
Is there a way to make it run faster in BaseX?
Thanks for your advice,
William David Velásquez
Creativo de Software
Creativos Digitales S.A.S.
Calle 30A # 83 - 53 Local 1033
Tel: 322 1730 - 311 709 8421
Medellín, Colombia
Hi William,
Your query will be evaluated in approx. 100-200 ms after a few small rewrites. Here is one variant:
declare function local:compare($actual, $prev) {
  for $c in $actual/portfolio/asset
  return <asset>{
    $c/*,
    <prev-category>{
      $prev/portfolio/asset[assetid/text() = $c/assetid/text()]/category/text()
    }</prev-category>
  }</asset>
};
declare variable $act external := "2017-07";
declare variable $prv external := "2016-07";

(: external resources :)
local:compare(doc($act || '.xml'), doc($prv || '.xml'))

(: alternative: database resources; $db is assumed to hold the database name :)
local:compare(db:open($db, $act || '.xml'), db:open($db, $prv || '.xml'))
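To see what the rewritten function returns, here is a minimal sketch that runs it on two tiny in-memory documents. The sample data (asset ids 1 to 3, categories A to C) is made up for illustration and is not taken from the attached files; only the element names come from the thread.

```xquery
declare function local:compare($actual, $prev) {
  for $c in $actual/portfolio/asset
  return <asset>{
    (: copy all children of the current asset :)
    $c/*,
    (: look up the category of the asset with the same id in the previous document :)
    <prev-category>{
      $prev/portfolio/asset[assetid/text() = $c/assetid/text()]/category/text()
    }</prev-category>
  }</asset>
};

(: hypothetical sample data, wrapped in document nodes so that
   the /portfolio step inside the function matches :)
let $actual := document {
  <portfolio>
    <projectid>2017-07</projectid>
    <asset><assetid>1</assetid><category>B</category></asset>
    <asset><assetid>3</assetid><category>C</category></asset>
  </portfolio>
}
let $prev := document {
  <portfolio>
    <projectid>2016-07</projectid>
    <asset><assetid>1</assetid><category>A</category></asset>
    <asset><assetid>2</assetid><category>B</category></asset>
  </portfolio>
}
return local:compare($actual, $prev)
```

Each resulting <asset> carries its own children plus a <prev-category> element. For an asset without a counterpart in the previous document (id 3 in this sample), <prev-category> stays empty.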
Some notes:
• It is always advisable to directly address text nodes in your query: assetid → assetid/text(). The reason is that an element node may contain multiple text nodes, which would need to be concatenated for a text comparison. The text index in BaseX, however, works on single text nodes, which means that a path expression will only be rewritten for index access if it can be derived statically that the target node will be a single text node.
• If you call a function, the body is more likely to be rewritten for index access if you pass on the whole document instead of child or descendant nodes: /portfolio → doc(...).
If you open the "Info" panel in the GUI, you can look for the string "apply text index" in order to ensure that indexes will be utilized.
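The first note can be illustrated with a small, self-contained sketch. The element below is a made-up example in which a comment splits the content into two text nodes; it only shows why comparing an element is not the same as comparing its text nodes.

```xquery
(: hypothetical element whose content is split into two text nodes by a comment :)
let $e := <assetid>A<!-- note -->1</assetid>
return (
  count($e/text()),  (: 2: the text nodes "A" and "1" :)
  string($e),        (: "A1": the string value concatenates both text nodes :)
  $e = "A1",         (: true: the element is atomized to its string value :)
  $e/text() = "A1"   (: false: neither single text node equals "A1" :)
)
```

Because the element comparison has to account for such cases, it cannot simply be mapped to a lookup of one text node in the index, whereas assetid/text() can.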
Hope this helps,
Christian
Thank you very much!
Using your advice about performance I solved not only this, but some other ugly queries here.
Cheers,
WILLIAM DAVID VELÁSQUEZ
CREATIVO DE SOFTWARE
http://creativosdigitales.co [1]
Links:
[1] http://creativosdigitales.co/