Hi Gary,
If UPDINDEX is used, different data structures are applied internally, which makes it difficult to switch or deactivate this mode.
Regarding updates, it makes no difference whether you apply all update operations in a FLWOR expression or in a single call. The main reason why updates with UPDINDEX activated take quite some time is this: for each value that's being updated, the id lists containing back references to the XML nodes must be updated. If the number of distinct values to be updated is small, your queries should be processed a lot faster.
Best, Christian ___________________________
Can the UPDINDEX property be turned off and on after the database has been created? I know your docs say it needs to be set before you create it, but is this still the case and what is the reasoning behind this?
Another related question: when you update attributes in a FLWOR expression, as I do in my date conversion function, should the updates be applied one by one, or all in one go? i.e. should each update trigger the index to be updated, or should the index only be updated after the FLWOR expression completes (which would probably be more efficient)?
Cheers Gary
From: Christian Grün christian.gruen@gmail.com
To: The Trainspotter wys01@btinternet.com
Cc: "basex-talk@mailman.uni-konstanz.de" basex-talk@mailman.uni-konstanz.de
Sent: Saturday, 10 November 2012, 21:12
Subject: Re: [basex-talk] UPDINDEX performance problem
Hi Gary,
thanks for your report. Using more than one database is indeed a viable, and recommended, approach. If your data fits into memory, you could use a single XQuery and the functions provided by the Database Module [1] in order to directly store the converted data in a second database.
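As a rough illustration of the single-query approach, the sketch below converts the dates in memory and writes the result straight into a second database. It is not a tested recipe. The database names 'source' and 'target', the resource path 'collection.xml' and the helper local:iso are made up for the example. It also assumes a BaseX release in which db:create accepts an options map, and that 'source' holds exactly one document.

```xquery
(: Sketch only. 'source', 'target' and 'collection.xml' are hypothetical names.
   db:open is the function name used by BaseX releases of this period;
   newer releases call it db:get. :)
declare function local:iso($raw as xs:string) as xs:string {
  (: "2010/1/20" becomes "2010-01-20": pad one-digit parts, join with "-" :)
  string-join(
    for $part in tokenize($raw, '/')
    return if (string-length($part) = 1) then concat('0', $part) else $part,
    '-'
  )
};

let $converted :=
  (: copy/modify works on an in-memory copy, so no index is touched here;
     this assumes that 'source' contains a single document :)
  copy $doc := db:open('source')
  modify (
    for $date in $doc//*:ENTRY//(@MODIFIED_DATE | @RELEASE_DATE |
                                 @LAST_PLAYED | @IMPORT_DATE)
    let $iso := local:iso($date)
    (: values that still do not form a valid date are left unchanged :)
    where $iso castable as xs:date
    return replace value of node $date with $iso
  )
  return $doc
(: assumption: the options map of db:create accepts 'updindex' :)
return db:create('target', $converted, 'collection.xml',
                 map { 'updindex': true() })
```

Since the conversion happens before the target database exists, its index structures would only be built once, from the already converted values.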
Reports on other approaches out there are welcome!
Best, Christian
[1] http://docs.basex.org/wiki/Database_Module ___________________________
Just thought I'd share a problem I've got and to see if I'm solving it in the most efficient way.
One of the XML documents I process contains various attributes which contain dates in a non-xs:date format, e.g. "2010/1/20", "2010/12/1". In order to use the various date functions in XQuery I first need to bulk change all of these attribute values so they're in xs:date format.
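To make the required conversion concrete, here is a minimal sketch of the string transformation on its own, applied to the two sample values above. The function name local:to-iso is made up for this example.

```xquery
(: Sketch: turn "YYYY/M/D" into the lexical form "YYYY-MM-DD".
   local:to-iso is a hypothetical helper name. :)
declare function local:to-iso($raw as xs:string) as xs:string {
  let $parts := tokenize($raw, '/')
  return string-join(
    ($parts[1],
     (: pad one-digit month and day values with a leading zero :)
     for $p in subsequence($parts, 2)
     return if (string-length($p) = 1) then concat('0', $p) else $p),
    '-')
};

for $raw in ('2010/1/20', '2010/12/1')
return xs:date(local:to-iso($raw))
(: yields the xs:date values 2010-01-20 and 2010-12-01 :)
```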
What I've found is that if I set the UPDINDEX property before I create the database, the XQuery function I have to change the date formats takes an age to complete. If I don't set it then the function completes pretty quickly. I need to use the UPDINDEX setting as I do a lot of updates and queries and I don't want to manually have to keep the indexes up to date.
The solution I have at the moment is to create two databases. The first doesn't have UPDINDEX set: I load the document, do the date conversion, then export to a temporary file. I then create a second database from the temporary file, this time with UPDINDEX set. Is there a better/more efficient way of doing this?
My date conversion function is this:
declare updating function ts:convertToXsDate() {
  for $e in /*:NML/*:COLLECTION/*:ENTRY/@MODIFIED_DATE
      union /*:NML/*:COLLECTION/*:ENTRY/INFO/@RELEASE_DATE
      union /*:NML/*:COLLECTION/*:ENTRY/INFO/@LAST_PLAYED
      union /*:NML/*:COLLECTION/*:ENTRY/INFO/@IMPORT_DATE
  let $dateParts := tokenize(data($e), "/")
  let $year := $dateParts[1]
  let $month := if (string-length($dateParts[2]) = 1)
                then concat("0", $dateParts[2])
                else $dateParts[2]
  let $day := if (string-length($dateParts[3]) = 1)
              then concat("0", $dateParts[3])
              else $dateParts[3]
  return
    try {
      replace value of node $e with xs:date(concat($year, "-", $month, "-", $day))
    } catch FORG0001 {
      ()
    }
};
Thanks in advance, Gary
BaseX-Talk mailing list BaseX-Talk@mailman.uni-konstanz.de https://mailman.uni-konstanz.de/mailman/listinfo/basex-talk