I'm trying to see whether I can use BaseX for a project we have. We need to store a large number of small documents (about 5,000,000, each 1 to 10 KB in size). I ran into some performance issues, searched the mailing list, and found answers such as this one:
https://mailman.uni-konstanz.de/pipermail/basex-talk/2012-January/002478.htm...
This suggests I should be able to get good performance (query times of around 100 ms). I'm running this on a Linux server with fast disks and 24 GB of RAM (4 GB for the JVM). By the way, I'm running the queries through the BaseX GUI; not sure if that makes any difference.

I created three test databases: small, medium, and large. The results are shown below. All databases have full-text search disabled (because I don't need it) and "Path Summary", "Text Index", and "Attribute Index" enabled. It seems like the indexes are not being used, or are just not working, because the query times go up linearly with the size of the database (up to 5 seconds for the large database!). Can someone explain what is happening and why, and how I can fix it?
Thanks a lot,
Shahin Roboubi
Software Engineer
MDA
Embedded Attachment:
-----------------------------------------------------------------------------------
Database Properties
  Name: radarsat2small
  Size: 97 MB
  Nodes: 3891930
  Resources: 92665
  Timestamp: 05.01.2012 15:07:37
Query:
  /metadata/Radarsat2Signal[Acquisition[orbit_number<200]]

Compiling:
  - adding text() step
  - rewriting orbit_number/text() < 200

Result:
  root()/metadata/Radarsat2Signal[Acquisition[orbit_number/text() < 200.0]]

Timing:
  - Parsing: 0.25 ms
  - Compiling: 0.37 ms
  - Evaluating: 530.21 ms
  - Printing: 5.09 ms
  - Total Time: 535.94 ms

Result:
  - Results: 165 Items
  - Updated: 0 Items
  - Printed: 145 KB

Query plan:
  <IterPath>
    <Root/>
    <IterStep axis="child" test="metadata"/>
    <IterStep axis="child" test="Radarsat2Signal">
      <AxisPath>
        <IterStep axis="child" test="Acquisition">
          <CmpR min="-INF" max="200">
            <AxisPath>
              <IterStep axis="child" test="orbit_number"/>
              <IterStep axis="child" test="text()"/>
            </AxisPath>
          </CmpR>
        </IterStep>
      </AxisPath>
    </IterStep>
  </IterPath>
-----------------------------------------------------------------------------------
Database Properties
  Name: radarsat2medium
  Size: 194 MB
  Nodes: 7777056
  Resources: 185168
  Timestamp: 05.01.2012 15:26:07
Query:
  /metadata/Radarsat2Signal[Acquisition[orbit_number<100]]

Compiling:
  - adding text() step
  - rewriting orbit_number/text() < 100

Result:
  root()/metadata/Radarsat2Signal[Acquisition[orbit_number/text() < 100.0]]

Timing:
  - Parsing: 0.25 ms
  - Compiling: 0.56 ms
  - Evaluating: 1079.27 ms
  - Printing: 5.99 ms
  - Total Time: 1086.08 ms

Result:
  - Results: 185 Items
  - Updated: 0 Items
  - Printed: 163 KB

Query plan:
  <IterPath>
    <Root/>
    <IterStep axis="child" test="metadata"/>
    <IterStep axis="child" test="Radarsat2Signal">
      <AxisPath>
        <IterStep axis="child" test="Acquisition">
          <CmpR min="-INF" max="100">
            <AxisPath>
              <IterStep axis="child" test="orbit_number"/>
              <IterStep axis="child" test="text()"/>
            </AxisPath>
          </CmpR>
        </IterStep>
      </AxisPath>
    </IterStep>
  </IterPath>
-----------------------------------------------------------------------------------
Database Properties
  Name: radarsat2large
  Size: 873 MB
  Nodes: 34999986
  Resources: 833333
  Timestamp: 05.01.2012 16:32:29
Query:
  /metadata/Radarsat2Signal[Acquisition[orbit_number<20]]

Compiling:
  - adding text() step
  - rewriting orbit_number/text() < 20

Result:
  root()/metadata/Radarsat2Signal[Acquisition[orbit_number/text() < 20.0]]

Timing:
  - Parsing: 0.28 ms
  - Compiling: 2.16 ms
  - Evaluating: 5296.87 ms
  - Printing: 5.71 ms
  - Total Time: 5305.04 ms

Result:
  - Results: 174 Items
  - Updated: 0 Items
  - Printed: 153 KB

Query plan:
  <IterPath>
    <Root/>
    <IterStep axis="child" test="metadata"/>
    <IterStep axis="child" test="Radarsat2Signal">
      <AxisPath>
        <IterStep axis="child" test="Acquisition">
          <CmpR min="-INF" max="20">
            <AxisPath>
              <IterStep axis="child" test="orbit_number"/>
              <IterStep axis="child" test="text()"/>
            </AxisPath>
          </CmpR>
        </IterStep>
      </AxisPath>
    </IterStep>
  </IterPath>
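-----------------------------------------------------------------------------------
As a quick sanity check on the linear-scaling observation, the Python sketch below takes the node counts and "Evaluating" times from the three reports above and divides one by the other. The dictionary and function names are only illustrative, not part of BaseX. If the per-node cost stays roughly constant across the three databases, that would be consistent with every node being visited rather than an index lookup, although it does not prove it.

```python
# (node count, evaluating time in ms), copied from the three reports above
runs = {
    "radarsat2small": (3_891_930, 530.21),
    "radarsat2medium": (7_777_056, 1079.27),
    "radarsat2large": (34_999_986, 5296.87),
}


def ns_per_node(nodes, eval_ms):
    """Return the evaluation time per stored node, in nanoseconds."""
    return eval_ms * 1_000_000 / nodes


# Per-node evaluation cost for each test database
per_node = {name: ns_per_node(nodes, ms) for name, (nodes, ms) in runs.items()}

for name, ns in per_node.items():
    print(f"{name}: {ns:.1f} ns per node")

# Spread between the cheapest and the most expensive database
spread = max(per_node.values()) / min(per_node.values())
print(f"max/min ratio: {spread:.2f}")
```

With these numbers the per-node cost stays within roughly 135 to 155 ns for all three databases, even though the large database has about nine times as many nodes as the small one.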