I’m evaluating whether BaseX is a fit for one of our projects. We need to store a large number of small documents (about 5,000,000, each 1 to 10 KB in size). I ran into performance issues, searched the mailing list, and found answers such as this one:

https://mailman.uni-konstanz.de/pipermail/basex-talk/2012-January/002478.html

This suggests I should be able to get good performance (query times of around 100 ms). I’m running this on a Linux server with fast disks and 24 GB of RAM (4 GB for the JVM). I’m running the queries through the BaseX GUI, in case that makes a difference.

I created three test databases: small, medium, and large. The results are shown below. All databases have full-text search disabled (because I don’t need it) and “Path Summary”, “Text Index”, and “Attribute Index” enabled. The indexes don’t seem to be used, or are not working, because query times grow roughly linearly with the size of the database (up to 5 seconds for the large one). Can someone explain what is happening and why, and how I can fix it?

 

Thanks a lot,

Shahin Roboubi
Software Engineer
MDA

Embedded Attachment:

 

-----------------------------------------------------------------------------------

 

Database Properties
Name: radarsat2small
Size: 97 MB
Nodes: 3891930
Resources: 92665
Timestamp: 05.01.2012 15:07:37

Query: /metadata/Radarsat2Signal[Acquisition[orbit_number<200]]
Compiling:
- adding text() step
- rewriting orbit_number/text() < 200
Result: root()/metadata/Radarsat2Signal[Acquisition[orbit_number/text() < 200.0]]
Timing:
- Parsing:  0.25 ms
- Compiling:  0.37 ms
- Evaluating:  530.21 ms
- Printing:  5.09 ms
- Total Time:  535.94 ms
Result:
- Results: 165 Items
- Updated: 0 Items
- Printed: 145 KB
Query plan:
<IterPath>
  <Root/>
  <IterStep axis="child" test="metadata"/>
  <IterStep axis="child" test="Radarsat2Signal">
    <AxisPath>
      <IterStep axis="child" test="Acquisition">
        <CmpR min="-INF" max="200">
          <AxisPath>
            <IterStep axis="child" test="orbit_number"/>
            <IterStep axis="child" test="text()"/>
          </AxisPath>
        </CmpR>
      </IterStep>
    </AxisPath>
  </IterStep>
</IterPath>

 

-----------------------------------------------------------------------------------

 

Database Properties
Name: radarsat2medium
Size: 194 MB
Nodes: 7777056
Resources: 185168
Timestamp: 05.01.2012 15:26:07

Query: /metadata/Radarsat2Signal[Acquisition[orbit_number<100]]
Compiling:
- adding text() step
- rewriting orbit_number/text() < 100
Result: root()/metadata/Radarsat2Signal[Acquisition[orbit_number/text() < 100.0]]
Timing:
- Parsing:  0.25 ms
- Compiling:  0.56 ms
- Evaluating:  1079.27 ms
- Printing:  5.99 ms
- Total Time:  1086.08 ms
Result:
- Results: 185 Items
- Updated: 0 Items
- Printed: 163 KB
Query plan:
<IterPath>
  <Root/>
  <IterStep axis="child" test="metadata"/>
  <IterStep axis="child" test="Radarsat2Signal">
    <AxisPath>
      <IterStep axis="child" test="Acquisition">
        <CmpR min="-INF" max="100">
          <AxisPath>
            <IterStep axis="child" test="orbit_number"/>
            <IterStep axis="child" test="text()"/>
          </AxisPath>
        </CmpR>
      </IterStep>
    </AxisPath>
  </IterStep>
</IterPath>

 

-----------------------------------------------------------------------------------

 

Database Properties
Name: radarsat2large
Size: 873 MB
Nodes: 34999986
Resources: 833333
Timestamp: 05.01.2012 16:32:29

Query: /metadata/Radarsat2Signal[Acquisition[orbit_number<20]]
Compiling:
- adding text() step
- rewriting orbit_number/text() < 20
Result: root()/metadata/Radarsat2Signal[Acquisition[orbit_number/text() < 20.0]]
Timing:
- Parsing:  0.28 ms
- Compiling:  2.16 ms
- Evaluating:  5296.87 ms
- Printing:  5.71 ms
- Total Time:  5305.04 ms
Result:
- Results: 174 Items
- Updated: 0 Items
- Printed: 153 KB
Query plan:
<IterPath>
  <Root/>
  <IterStep axis="child" test="metadata"/>
  <IterStep axis="child" test="Radarsat2Signal">
    <AxisPath>
      <IterStep axis="child" test="Acquisition">
        <CmpR min="-INF" max="20">
          <AxisPath>
            <IterStep axis="child" test="orbit_number"/>
            <IterStep axis="child" test="text()"/>
          </AxisPath>
        </CmpR>
      </IterStep>
    </AxisPath>
  </IterStep>
</IterPath>
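
-----------------------------------------------------------------------------------

As a quick sanity check on the "linear" observation, the evaluation times can be normalized by node count. The following is only a rough sketch in Python using the numbers copied from the three reports above; the names `reports` and `ms_per_million_nodes` are my own and not part of BaseX.

```python
# Node counts and "Evaluating" times copied from the three reports above.
reports = [
    ("radarsat2small", 3_891_930, 530.21),
    ("radarsat2medium", 7_777_056, 1079.27),
    ("radarsat2large", 34_999_986, 5296.87),
]


def ms_per_million_nodes(nodes: int, eval_ms: float) -> float:
    """Return the evaluation time in ms per one million nodes."""
    return eval_ms / (nodes / 1_000_000)


for name, nodes, eval_ms in reports:
    rate = ms_per_million_nodes(nodes, eval_ms)
    print(f"{name}: {rate:.1f} ms per million nodes")
```

The rate comes out at roughly 136, 139, and 151 ms per million nodes, so the cost per node stays in the same range while the database grows by a factor of about nine. That is what I would expect if every document were being visited, which seems consistent with the query plans containing only path steps and a range comparison, but I may be misreading them.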