Hi,
First of all, sorry for all the questions of late...
I've been using BaseX for a while now and have written some nice XQuery that gathers metrics from datasets of ~7000 files, querying the whole corpus to create statistics, and it is quite fast at that. However, the query below feels slow at 34 seconds, so we thought we should ask for your thoughts on its duration.
Query: let $content := db:open('F-DDEX')//MessageHeader/MessageThreadId[text() eq '8937478'] return $content
Content: every file has a message header with a MessageThreadId:
<MessageHeader xmlns:ern="http://ddex.net/xml/2011/ern-main/33" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <MessageThreadId>8937478</MessageThreadId>
  <MessageId>C2C977FDFDHF98DHF9D8FHEURYX</MessageId>
  <MessageSender>
    <PartyId>PA47F93H54HU93HJSFDINF</PartyId>
    <PartyName>
      <FullName>Warner Music Group</FullName>
    </PartyName>
  </MessageSender>
  <MessageRecipient>
    <PartyId>3G3E</PartyId>
    <PartyName>
      <FullName>3G3E-YADS</FullName>
    </PartyName>
  </MessageRecipient>
  <MessageCreatedDateTime>2012-06-18T05:35:54Z</MessageCreatedDateTime>
</MessageHeader>
The database has the text, attribute and full-text indexes turned on.
Database Properties
  Name: F-DDEX
  Size: 5251 MB
  Nodes: 239945615
  Documents: 7954
Server specs:
$ cat /proc/meminfo | grep MemTotal
MemTotal: 7633876 kB
$ cat /proc/cpuinfo | grep name
model name : Intel(R) Xeon(R) CPU E5430 @ 2.66GHz
model name : Intel(R) Xeon(R) CPU E5430 @ 2.66GHz
Thoughts?
Thanks
Hi Alex,
let $content := db:open('F-DDEX')//MessageHeader/MessageThreadId[text() eq '8937478'] return $content
Due to some specifics of the XQuery and XML semantics, the "eq" operator cannot be rewritten for index access. The following query, which uses "=" instead, should be processed much faster:
let $content := db:open('F-DDEX')//MessageHeader/MessageThreadId[text() = '8937478'] return $content
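The reply does not spell out which specifics are meant. One well-known difference between the two operators is that "=" is a general comparison (true if any item on the left matches) while "eq" is a value comparison (each operand must be at most one item, and typing is stricter). A minimal sketch of that difference, using a hypothetical inline document as a stand-in for one file of the database:

```xquery
(: Hypothetical stand-in for a single document; the value is taken from the sample above. :)
let $doc :=
  <Message>
    <MessageHeader>
      <MessageThreadId>8937478</MessageThreadId>
    </MessageHeader>
  </Message>
let $ids := $doc//MessageThreadId/text()
return (
  (: General comparison: succeeds if any text node equals the string. :)
  $ids = '8937478',
  (: Value comparison: fine here because $ids is exactly one node;
     with two or more nodes it would raise a type error instead. :)
  $ids eq '8937478',
  (: General comparison against a number: the untyped text is cast
     and compared numerically. The same test with eq is a type error. :)
  $ids = 8937478
)
```

Whether this is the exact reason the optimizer skips "eq" is an assumption; the sketch only shows that the two operators are not interchangeable in general.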
If that's not the case, you could try to remove the double slash before MessageHeader (usually, it should be optimized by the compiler anyway). If that doesn't help either, you could prefix all element names with namespace prefixes (here the wildcard *:):
let $content := db:open('F-DDEX')/*:MessageHeader/*:MessageThreadId[text() = '8937478'] return $content
If processing is still too slow, please provide us with a dump of the InfoView (or the verbose command line output triggered with -V).
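Independently of what the optimizer decides, the text index can also be queried directly, which is one way to check that the index itself answers quickly. A minimal sketch, assuming the text index of F-DDEX is up to date and that the elements are in no namespace, as in the sample above:

```xquery
(: db:text returns the text nodes with the given value from the text index;
   the path then walks up to the enclosing element and checks its parent. :)
db:text('F-DDEX', '8937478')/parent::MessageThreadId[parent::MessageHeader]
```

If this returns quickly while the path expression does not, the query plan is the likely culprit rather than the index.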
Christian
--
Alex G. Muir Software Engineering Consultant Linkedin Profile : http://ca.linkedin.com/pub/alex-muir/36/ab7/125 Love African Kora Music? Take a moment to listen to Gambia's - Amadu Diabarte & Jali Bakary Konteh www.bafila.bandcamp.com Your support keeps Africa's griot tradition alive... Cheers!
BaseX-Talk mailing list BaseX-Talk@mailman.uni-konstanz.de https://mailman.uni-konstanz.de/mailman/listinfo/basex-talk
Hi Christian,
Thanks, the = version runs in 0.4 seconds.
This one returns an empty sequence, however:
let $content := db:open('F-DDEX')/*:MessageHeader/*:MessageThreadId[text() = '8937478'] return $content
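One possible explanation, assuming MessageHeader is not the root element of each document (the thread does not show the full file structure): a single slash after db:open() only selects children of the document nodes, so /*:MessageHeader matches nothing when the header sits deeper in the tree. A sketch that keeps the wildcard prefixes but restores the descendant step:

```xquery
(: // searches all descendants of each document node, not just the root element. :)
let $content := db:open('F-DDEX')//*:MessageHeader/*:MessageThreadId[text() = '8937478']
return $content
```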
Thanks Alex
On Thu, Nov 29, 2012 at 4:12 PM, Christian Grün christian.gruen@gmail.com wrote:
Hi Alex,
let $content := db:open('F-DDEX')//MessageHeader/MessageThreadId[text() eq '8937478'] return $content
Due to some specifics of the XQuery and XML semantics, the "eq" operator cannot be rewritten for index access. The following query, which uses "=" instead, should be processed much faster:
let $content := db:open('F-DDEX')//MessageHeader/MessageThreadId[text() = '8937478'] return $content
If that's not the case, you could try to remove the double slash before MessageHeader (usually, it should be optimized by the compiler anyway). If that doesn't help either, you could prefix all element names with namespace prefixes (here the wildcard *:):
let $content := db:open('F-DDEX')/*:MessageHeader/*:MessageThreadId[text() = '8937478'] return $content
If processing is still too slow, please provide us with a dump of the InfoView (or the verbose command line output triggered with -V).
Christian