I'm sorry, the markup from copy and paste was a bit unexpected, so I am sending it again.
Hi Johannes, Charles and Michael,
first of all, thanks for your immediate readiness to help.
I will briefly present the structure of the database:
<Dataset>
  <Structure>
    <Institute Name="Physik">
      <Degree Abbr="ABC" Name="ABC">
        <Module Abbr="HIJ" Name="HIJ">
          <!-- the Module nodes are arbitrarily nested in themselves -->
        </Module>
        <!-- more Module nodes -->
      </Degree>
      <!-- more Degree nodes -->
    </Institute>
    <!-- more Institute nodes -->
  </Structure>
  <!-- other information -->
  <Lessons>
    <Lesson ID="12345">
      <Name lang="de">Name of a Lesson</Name>
      <AssociatedModules>
        <Module Abbr="HIJ"/>
        <Module Abbr="ABC"/>
        <!-- there are 1..unbounded Modules per Lesson; only modules containing no modules are referenced -->
      </AssociatedModules>
      <!-- other information -->
    </Lesson>
  </Lessons>
</Dataset>
The task is now to create a list like this one: http://vlvz1.physik.hu-berlin.de/ss2012/physik/verzeichnis/en/, that is, the whole structure, but only with those Modules that actually have associated lessons.
The current query looks like this:
let $lang := data($ses/lang)
let $sem := data($ses/sem)
let $inst := data($ses/inst)
let $semxml := db:open("vlvz", concat($sem, '.xml'))
let $moduleswithlvs := distinct-values($semxml//AssociatedModules/Module/@Abbr)
return
<span>
  <div class="struc">
  {
    for $degree in $semxml//Institute[@Name=$ses//inst]/Degree[Modules//Module/@Abbr=$moduleswithlvs]
    return
    <div class="indent"><span class="degree">{data($degree/@Abbr)} {data($degree/@Name)}<br/></span>
    {
      for $module in $degree/Modules//Module[(* and */@Abbr=$moduleswithlvs) or @Abbr=$moduleswithlvs]
      let $leaf := not($module/*)
      let $depth := functx:depth-of-node($module) - 7
      return
      <div class="indent depth{$depth}"> {data($module/@Abbr)} {data($module/@Name)} <br/>
      {
        if ($leaf) then
          <div class="indent">
          {
            for $lesson in vlvz:getlvs($semxml, data($module/@Abbr))
            return <div class="lesson"><span class="lessonid">{$lesson/@ID}</span><span class="lessonname">{$lesson/Name[@lang=$ses//lang]}</span><span class="lessonmodules">{string-join($lesson/AssociatedModules/Module/@Abbr,', ')}</span></div> (: note [1] :)
          }
          </div>
        else ()
      }
      </div>
    }
    </div>
  }
  </div>
</span>
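A note on the module filter in the query above: as written, the predicate `(* and */@Abbr=$moduleswithlvs) or @Abbr=$moduleswithlvs` appears to look only at a module itself and at its direct children, so with arbitrarily nested modules an outer module could be dropped when only a deeper descendant is referenced by a lesson. Below is a minimal sketch of a filter that checks the whole subtree instead. It assumes the `Modules` wrapper element that the query uses; the inline sample data and its abbreviations are made up for illustration and are not taken from the real database.

```xquery
(: Hypothetical sample: module A contains A1, which contains the leaf A1x. :)
let $degree :=
  <Degree Abbr="D1" Name="D1">
    <Modules>
      <Module Abbr="A">
        <Module Abbr="A1">
          <Module Abbr="A1x"/>
        </Module>
      </Module>
      <Module Abbr="B"/>
    </Modules>
  </Degree>

(: Assumption: only the leaf module A1x is referenced by a lesson. :)
let $moduleswithlvs := ("A1x")

(: Keep a module if it, or any module nested at any depth inside it, is referenced. :)
for $module in $degree/Modules//Module[descendant-or-self::Module/@Abbr = $moduleswithlvs]
return string($module/@Abbr)
```

With this sample the filter keeps A, A1 and A1x and drops B, whereas a test on direct children only would skip A.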
I already noticed that [1] is crucial: this node makes the query run about 10 times longer than returning an empty sequence. Returning just <div></div> makes no difference; it is as slow as with its content. I should also mention the function vlvz:getlvs:
declare function vlvz:getlvs($semxml as node()*, $modabbr as xs:string) as node()* {
  for $l in $semxml//Lesson
  where $l[AssociatedModules/Module/@Abbr=$modabbr]
  order by data($l/@ID)
  return $l
};
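One possible contributor to the slowdown is that vlvz:getlvs walks all Lesson elements again for every leaf module. A minimal sketch of an alternative, assuming a processor with XQuery 3.0 `group by` support, is to group the lessons by module abbreviation in a single pass. The inline `$semxml` sample and the `ModuleLessons` wrapper element are hypothetical and only serve the illustration; whether this actually helps in the real setup would have to be measured.

```xquery
xquery version "3.0";

(: Hypothetical sample data that mirrors the Lessons structure shown above. :)
let $semxml :=
  <Dataset>
    <Lessons>
      <Lesson ID="20">
        <AssociatedModules><Module Abbr="HIJ"/><Module Abbr="ABC"/></AssociatedModules>
      </Lesson>
      <Lesson ID="10">
        <AssociatedModules><Module Abbr="HIJ"/></AssociatedModules>
      </Lesson>
    </Lessons>
  </Dataset>

(: One pass over all lessons: each lesson appears once per module it references. :)
for $lesson in $semxml//Lesson
for $abbr in distinct-values($lesson/AssociatedModules/Module/@Abbr)
group by $abbr
order by $abbr
return
  <ModuleLessons Abbr="{$abbr}">{
    (: after grouping, $lesson is bound to all lessons of this module :)
    for $l in $lesson
    order by string($l/@ID)
    return $l
  }</ModuleLessons>
```

The result is one `ModuleLessons` element per referenced module, with its lessons sorted by ID, so the nested loop over modules would only need to pick the matching group instead of rescanning every lesson.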
It is probably true that the queries are badly designed with respect to performance: basically everything I have done with XQuery so far was learning by doing.
Best regards from the capital,
Ronny
On 03/29/2012 11:00 AM, Michael Seiferle wrote:
Hi Ronny,
Hi Johannes & Charles, thanks for joining the conversation.
In my opinion, and speaking officially for BaseX, I'd suppose that XML processing with BaseX databases should almost always[1] be faster than processing the XML sequentially via lxml.
However, performance may vary depending on the actual queries and/or the python glue code.
I think Charles' approach of keeping as much logic as possible in XQuery will be the best option here. Some of your Python code could perhaps be rewritten in XQuery as well; on the other hand, this might not even be necessary thanks to XQuery rewrites, as Johannes suggested.
@Ronny, maybe you could provide us with some sample code? In case it is not intended for the general public, feel free to send it to support@basex.org.
Looking forward to seeing your code!
Kind regards from Lake Constance
Michael
[1] I can surely think of examples that prove me wrong ;-)

On 28.03.2012 at 23:19, Johannes.Lichtenberger wrote:
Thus I suppose it would be best to post the queries in a reply, so that the BaseX team can suggest similar queries that make better use of the index structures and the query processor's optimizations.
BaseX-Talk mailing list BaseX-Talk@mailman.uni-konstanz.de https://mailman.uni-konstanz.de/mailman/listinfo/basex-talk