Re: [basex-talk] BaseX XQuery vs. python / lxml performance

29 Mar 2012


Sorry, the markup from copying and pasting came out garbled, so I am
sending this again.
Hi Johannes, Charles and Michael,
first of all, thanks for your quick readiness to help.
I will briefly present the structure of the database:
<Dataset>
    <Structure>
    	<Institute Name="Physik">	
    		<Degree Abbr="ABC" Name="ABC">
    			<Module Abbr="HIJ" Name="HIJ">
    				<!-- Module nodes can be nested arbitrarily deep inside each other -->
    			</Module>
    			<!-- more Module nodes -->
    		</Degree>
    		<!-- more Degree nodes -->
    	</Institute>
    	<!--more Institute nodes-->
    </Structure>
    <!-- other information -->
    <Lessons>
    	<Lesson ID="12345">
    		<Name lang="de">Name of a Lesson</Name>
    		<AssociatedModules>
    			<Module Abbr="HIJ"/>
    			<Module Abbr="ABC"/>
    			<!-- there are 1..unbounded Modules per Lesson; only modules
    			     containing no other modules are referenced -->
    		</AssociatedModules>
    		<!-- other information -->
    	</Lesson>
    </Lessons>
</Dataset>
The task is to create a list like this one:
http://vlvz1.physik.hu-berlin.de/ss2012/physik/verzeichnis/en/, that is,
the whole structure, but only with those Modules that actually have
associated lessons.
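A minimal sketch of that pruning rule, assuming an XQuery 3.1 processor and the structure exactly as shown in the sample above (Module elements directly below Degree). The function name local:render-module and the inline sample data are made up for illustration and are not part of the actual application:

```xquery
xquery version "3.1";

(: Hypothetical helper: renders a module only if it, or any module nested
   inside it, is referenced by at least one lesson. :)
declare function local:render-module(
  $module as element(Module),
  $used   as xs:string*
) as element(div)? {
  if ($module/descendant-or-self::Module/@Abbr = $used) then
    <div class="indent">
      { data($module/@Abbr) }
      { $module/Module ! local:render-module(., $used) }
    </div>
  else ()
};

(: Invented sample data in the shape of the structure shown above. :)
let $doc :=
  <Dataset>
    <Structure>
      <Institute Name="Physik">
        <Degree Abbr="ABC" Name="ABC">
          <Module Abbr="TOP" Name="TOP">
            <Module Abbr="HIJ" Name="HIJ"/>
            <Module Abbr="XYZ" Name="XYZ"/>
          </Module>
          <Module Abbr="EMPTY" Name="EMPTY"/>
        </Degree>
      </Institute>
    </Structure>
    <Lessons>
      <Lesson ID="12345">
        <AssociatedModules><Module Abbr="HIJ"/></AssociatedModules>
      </Lesson>
    </Lessons>
  </Dataset>
(: Abbreviations of all modules that have at least one lesson. :)
let $used := distinct-values($doc//AssociatedModules/Module/@Abbr)
return $doc//Degree/Module ! local:render-module(., $used)
```

In this sketch the check is made on the whole subtree of a module, so a parent is kept whenever any module nested below it has a lesson, no matter how deep.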
The current query looks like this:
let $lang := data($ses/lang)
let $sem := data($ses/sem)
let $inst := data($ses/inst)
let $semxml := db:open("vlvz", concat($sem, '.xml'))
let $moduleswithlvs := distinct-values($semxml//AssociatedModules/Module/@Abbr)
return
<span>
  <div class="struc">
  {
    for $degree in $semxml//Institute[@Name=$ses//inst]/Degree[Modules//Module/@Abbr=$moduleswithlvs]
    return
    <div class="indent">
      <span class="degree">{data($degree/@Abbr)}&#x20;{data($degree/@Name)}<br/></span>
      {
        for $module in $degree/Modules//Module[(* and */@Abbr=$moduleswithlvs) or @Abbr=$moduleswithlvs]
        let $leaf := not($module/*)
        let $depth := functx:depth-of-node($module) - 7
        return
        <div class="indent depth{$depth}">
          {data($module/@Abbr)}&#x20;{data($module/@Name)}&#x20;<br/>
          {
            if ($leaf)
            then
              <div class="indent">
              {
                for $lesson in vlvz:getlvs($semxml, data($module/@Abbr))
                return
                <div class="lesson">
                  <span class="lessonid">{$lesson/@ID}</span>
                  <span class="lessonname">{$lesson/Name[@lang=$ses//lang]}</span>
                  <span class="lessonmodules">{string-join($lesson/AssociatedModules/Module/@Abbr, ', ')}</span>
                </div>
                (: note [1] :)
              }
              </div>
            else ()
          }
        </div>
      }
    </div>
  }
  </div>
</span>
I have already noticed that [1] is crucial: constructing this node makes
the query run about 10 times longer than returning an empty sequence.
Returning just <div></div> makes no difference; it is as slow as with
its content.
I should also mention the function vlvz:getlvs:
declare function vlvz:getlvs($semxml as node()*,$modabbr as xs:string)
as node()*
{
 for $l in $semxml//Lesson
 where $l[AssociatedModules/Module/@Abbr=$modabbr]
 order by data($l/@ID)
 return $l
};
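As a hedged sketch of one possible alternative (not part of the original query): vlvz:getlvs walks over every Lesson each time it is called, so the work grows with the number of leaf modules times the number of lessons. Assuming a processor that supports XQuery 3.1 maps, which postdate this message, the lessons could be grouped by module abbreviation once and then looked up per module. The function name local:lessons-by-module and the inline sample data are invented for illustration:

```xquery
xquery version "3.1";

(: Hypothetical helper: builds a map from module abbreviation to the
   lessons associated with it, in a single pass over all lessons. :)
declare function local:lessons-by-module($semxml as node()*) as map(*) {
  map:merge(
    for $lesson in $semxml//Lesson
    for $abbr in distinct-values($lesson/AssociatedModules/Module/@Abbr)
    group by $abbr
    (: after grouping, $lesson holds all lessons of this abbreviation :)
    return map:entry($abbr, for $l in $lesson order by data($l/@ID) return $l)
  )
};

(: Invented sample data in the shape of the Lessons section shown above. :)
let $semxml :=
  <Dataset>
    <Lessons>
      <Lesson ID="2">
        <AssociatedModules><Module Abbr="HIJ"/></AssociatedModules>
      </Lesson>
      <Lesson ID="1">
        <AssociatedModules><Module Abbr="HIJ"/><Module Abbr="ABC"/></AssociatedModules>
      </Lesson>
    </Lessons>
  </Dataset>
let $lookup := local:lessons-by-module($semxml)
(: Look up the lessons of one module and return their IDs. :)
return $lookup("HIJ") ! string(@ID)
```

In the main query, $lookup would be bound once next to $moduleswithlvs, and $lookup(data($module/@Abbr)) would take the place of the vlvz:getlvs call. Whether this is actually faster on the real data would have to be measured.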
The queries are probably badly designed with respect to performance:
basically everything I have done with XQuery until now was just
learning by doing.
Best regards from the capital,
Ronny
On 03/29/2012 11:00 AM, Michael Seiferle wrote:
...
Hi Ronny,
Hi Johannes & Charles, thanks for joining the conversation.
In my opinion, and speaking officially for BaseX, I'd suppose that XML
processing with BaseX databases should almost always[1] be faster than
processing the XML sequentially via lxml.
However, performance may vary depending on the actual queries and/or the
python glue code.
I think Charles' approach of having as much logic in XQuery as possible
will be the best option to pick here.
Maybe some of your Python code could be rewritten in XQuery as well; on
the other hand, this might not even be necessary thanks to the XQuery
rewrites that Johannes suggested.
@Ronny, maybe you could provide us with some sample code? In case it is
not intended for the general public, feel free to send it to
support@basex.org.
Looking forward to seeing your code!
Best regards from Lake Constance
Michael
[1] I can certainly think of examples that prove me wrong ;-)
On 28.03.2012 at 23:19, Johannes.Lichtenberger wrote:
...
Thus I suppose it
would be best to post the queries in a reply, so that the BaseX
team can suggest similar queries that make better use of the
index structures and the query processor's optimizations.

BaseX-Talk mailing list
BaseX-Talk@mailman.uni-konstanz.de
https://mailman.uni-konstanz.de/mailman/listinfo/basex-talk
