I’m evaluating BaseX for a project. We need to store a large number of small documents (about 5,000,000, each between 1 and 10 KB). I ran into performance problems, searched the mailing list, and found answers like this one:
https://mailman.uni-konstanz.de/pipermail/basex-talk/2012-January/002478.html
That thread suggests I should be able to get good performance (query times of around 100 ms). I’m running this on a Linux server with fast disks and 24 GB of RAM (4 GB allocated to the JVM). I’m running the queries through the BaseX GUI, in case that makes a difference.
I created three test databases: small, medium and large. The results are shown below. All three have full-text search disabled (I don’t need it) and “Path Summary”, “Text Index” and “Attribute Index” enabled. The indexes seem to have no effect, because the evaluation time grows roughly linearly with the size of the database (530 ms, 1,079 ms and 5,297 ms, so over 5 seconds for the large one). Can someone explain what is happening and why, and how I can fix it?
Thanks a lot,
Shahin Roboubi
Software Engineer
MDA
Embedded Attachment:
-----------------------------------------------------------------------------------
Database Properties
Name: radarsat2small
Size: 97 MB
Nodes: 3891930
Resources: 92665
Timestamp: 05.01.2012 15:07:37
Query: /metadata/Radarsat2Signal[Acquisition[orbit_number<200]]
Compiling:
- adding text() step
- rewriting orbit_number/text() < 200
Result: root()/metadata/Radarsat2Signal[Acquisition[orbit_number/text() < 200.0]]
Timing:
- Parsing: 0.25 ms
- Compiling: 0.37 ms
- Evaluating: 530.21 ms
- Printing: 5.09 ms
- Total Time: 535.94 ms
Result:
- Results: 165 Items
- Updated: 0 Items
- Printed: 145 KB
Query plan:
<IterPath>
  <Root/>
  <IterStep axis="child" test="metadata"/>
  <IterStep axis="child" test="Radarsat2Signal">
    <AxisPath>
      <IterStep axis="child" test="Acquisition">
        <CmpR min="-INF" max="200">
          <AxisPath>
            <IterStep axis="child" test="orbit_number"/>
            <IterStep axis="child" test="text()"/>
          </AxisPath>
        </CmpR>
      </IterStep>
    </AxisPath>
  </IterStep>
</IterPath>
-----------------------------------------------------------------------------------
Database Properties
Name: radarsat2medium
Size: 194 MB
Nodes: 7777056
Resources: 185168
Timestamp: 05.01.2012 15:26:07
Query: /metadata/Radarsat2Signal[Acquisition[orbit_number<100]]
Compiling:
- adding text() step
- rewriting orbit_number/text() < 100
Result: root()/metadata/Radarsat2Signal[Acquisition[orbit_number/text() < 100.0]]
Timing:
- Parsing: 0.25 ms
- Compiling: 0.56 ms
- Evaluating: 1079.27 ms
- Printing: 5.99 ms
- Total Time: 1086.08 ms
Result:
- Results: 185 Items
- Updated: 0 Items
- Printed: 163 KB
Query plan:
<IterPath>
  <Root/>
  <IterStep axis="child" test="metadata"/>
  <IterStep axis="child" test="Radarsat2Signal">
    <AxisPath>
      <IterStep axis="child" test="Acquisition">
        <CmpR min="-INF" max="100">
          <AxisPath>
            <IterStep axis="child" test="orbit_number"/>
            <IterStep axis="child" test="text()"/>
          </AxisPath>
        </CmpR>
      </IterStep>
    </AxisPath>
  </IterStep>
</IterPath>
-----------------------------------------------------------------------------------
Database Properties
Name: radarsat2large
Size: 873 MB
Nodes: 34999986
Resources: 833333
Timestamp: 05.01.2012 16:32:29
Query: /metadata/Radarsat2Signal[Acquisition[orbit_number<20]]
Compiling:
- adding text() step
- rewriting orbit_number/text() < 20
Result: root()/metadata/Radarsat2Signal[Acquisition[orbit_number/text() < 20.0]]
Timing:
- Parsing: 0.28 ms
- Compiling: 2.16 ms
- Evaluating: 5296.87 ms
- Printing: 5.71 ms
- Total Time: 5305.04 ms
Result:
- Results: 174 Items
- Updated: 0 Items
- Printed: 153 KB
Query plan:
<IterPath>
  <Root/>
  <IterStep axis="child" test="metadata"/>
  <IterStep axis="child" test="Radarsat2Signal">
    <AxisPath>
      <IterStep axis="child" test="Acquisition">
        <CmpR min="-INF" max="20">
          <AxisPath>
            <IterStep axis="child" test="orbit_number"/>
            <IterStep axis="child" test="text()"/>
          </AxisPath>
        </CmpR>
      </IterStep>
    </AxisPath>
  </IterStep>
</IterPath>
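-----------------------------------------------------------------------------------
To put a number on the "linear" observation, here is a rough sketch that divides each evaluation time by the node count of its database. The figures are copied from the three reports above; the helper name `ms_per_million_nodes` is made up for illustration and is not part of BaseX. If the per-node cost stays roughly constant across the three databases, that would be consistent with every node being visited rather than an index lookup.

```python
# Figures copied from the three query reports above:
# (database name, node count, evaluation time in ms)
runs = [
    ("radarsat2small", 3_891_930, 530.21),
    ("radarsat2medium", 7_777_056, 1079.27),
    ("radarsat2large", 34_999_986, 5296.87),
]


def ms_per_million_nodes(nodes: int, eval_ms: float) -> float:
    """Evaluation time normalised by database size (hypothetical helper)."""
    return eval_ms / (nodes / 1_000_000)


# Normalised cost for each database
ratios = {name: ms_per_million_nodes(nodes, ms) for name, nodes, ms in runs}

for name, ratio in ratios.items():
    print(f"{name}: {ratio:.1f} ms per million nodes")

# Spread between the cheapest and the most expensive run;
# a value close to 1 means the time scales almost linearly with size.
spread = max(ratios.values()) / min(ratios.values())
print(f"max/min ratio: {spread:.2f}")
```

The three queries use different thresholds (200, 100 and 20), so this is only an approximate comparison, but the result counts (165, 185 and 174 items) are similar enough that the comparison seems fair.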