Hi community,
I have created a database from around 7 GB of documents in BaseX (the first 100 records of Medline) with the full-text index option enabled (index size 1 GB).
I am testing it locally on my laptop. The problem is that free-text query times vary greatly, from 8 s up to a couple of minutes. Main memory is underused (2 GB allocated) and processor utilization is quite low, while hard disk usage is high. I have read that the whole database could be loaded into main memory, but this is not an option for the 100 GB collection that I would like to have.
My question is whether it is possible to load only the index into main memory in order to speed up searching. Any other performance tips are very welcome.
Thank you, George Aravanis
Hi George,
I’m not sure if your queries are using the full-text index at all, as a minimum of 8 seconds doesn’t sound that thrilling. Could you please check the output in the InfoView panel in the GUI (or the command-line output created via -V) and see what’s written there? Next, feel free to pass on one of the queries you have tried.
Best, Christian
Hi Christian,
I have tried many queries and a typical example of what I get is the following:
Total Time: 21492.91 ms

Compiling:
- simplifying descendant-or-self step(s)
- applying full-text index
Query: //AbstractText[text() contains text 'KINASES']
Optimized Query: db:fulltext("1-100un", "KINASES")/parent::*:AbstractText
Result:
- Hit(s): 167 Items
- Updated: 0 Items
- Printed: 191 KB
- Read Locking: local [1-100un]
- Write Locking: none
Timing:
- Parsing: 0.0 ms
- Compiling: 0.27 ms
- Evaluating: 21324.7 ms
- Printing: 167.93 ms
- Total Time: 21492.91 ms
Query plan:
<QueryPlan>
  <CachedPath>
    <FTIndexAccess data="1-100un">
      <FTWords>
        <Str value="KINASES" type="xs:string"/>
      </FTWords>
    </FTIndexAccess>
    <IterStep axis="parent" test="*:AbstractText"/>
  </CachedPath>
</QueryPlan>
Regards, George
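The timing block in George's query info already shows where the time goes. As a minimal sketch (plain Python, not part of BaseX; the helper name `parse_timings` is made up for illustration), the reported figures can be parsed to see how much of the total is spent in evaluation:

```python
import re

# Timing lines copied from the query info reported above.
INFO = """\
- Parsing: 0.0 ms
- Compiling: 0.27 ms
- Evaluating: 21324.7 ms
- Printing: 167.93 ms
- Total Time: 21492.91 ms
"""


def parse_timings(info: str) -> dict[str, float]:
    """Map each phase name to its duration in milliseconds."""
    pattern = re.compile(r"^- ([A-Za-z ]+): ([0-9.]+) ms$", re.MULTILINE)
    return {name: float(ms) for name, ms in pattern.findall(info)}


timings = parse_timings(INFO)
share = timings["Evaluating"] / timings["Total Time"]
print(f"Evaluating share of total: {share:.1%}")  # prints 99.2%
```

Parsing, compiling and printing are negligible here, so the index lookup and the subsequent node access are the only places worth investigating.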
Hi Aravanis,
Thanks for the details. Yes, it’s obvious that the index is used in your query. Out of interest, I’ve created a full-text index from 10 Medline files (~800 MB). I’ve listed the results below. Do you get similar performance for 10% of the data? I may try a larger Medline portion in a second test.
Next, I would be interested in what results you get when using Java’s default memory allocation (i.e., if you start BaseX without -Xmx, or if you assign a small value such as 64m or 256m). As BaseX relies on the caching patterns of your operating system, it could be that the memory assigned to BaseX cannot be utilized by the OS.
Best, Christian
Compiling:
- simplifying descendant-or-self step(s)
- applying full-text index
Query: //AbstractText[text() contains text 'KINASES']
Optimized Query: db:fulltext("medline", "KINASES")/parent::*:AbstractText
Result:
- Hit(s): 84 Items
- Updated: 0 Items
- Printed: 104 KB
- Read Locking: local [medline]
- Write Locking: none
Timing:
- Parsing: 0.0 ms
- Compiling: 0.16 ms
- Evaluating: 9.27 ms
- Printing: 11.53 ms
- Total Time: 20.98 ms
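Christian's remark about operating system caching can be checked independently of BaseX. The following is a rough sketch in plain Python that reads the same file twice and reports both durations; the helper `timed_read` and the temporary stand-in file are assumptions for illustration only. A freshly written file is usually already in the OS page cache, so both reads will typically be fast here. To see a cold-versus-warm difference, point `path` at a large file that has not been touched recently, such as an index file of the database.

```python
import os
import tempfile
import time


def timed_read(path: str, chunk_size: int = 1 << 20) -> tuple[int, float]:
    """Read the whole file sequentially; return (bytes read, seconds taken)."""
    start = time.perf_counter()
    total = 0
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            total += len(chunk)
    return total, time.perf_counter() - start


# Stand-in file (8 MiB of random bytes); replace `path` with a real file
# to measure something meaningful.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(os.urandom(8 * 1024 * 1024))
    path = tmp.name

first_bytes, first_secs = timed_read(path)
second_bytes, second_secs = timed_read(path)
print(f"first read:  {first_bytes} bytes in {first_secs:.4f} s")
print(f"second read: {second_bytes} bytes in {second_secs:.4f} s")

os.remove(path)  # clean up the stand-in file
```

If the second read of a real file is much faster than the first, the data is being served from the page cache. Memory reserved for the JVM heap is not available to that cache, which is one possible reason why a smaller -Xmx value might help.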
Thank you for your time and interest. I also tried with 10 records (91-100) and got similar results for that keyword: 20 ms. I also tried other keywords, such as "cancer", which has more hits and took up to 15 seconds. For "spirographic", which had only 23 hits, it took 1 ms. I used the default memory settings for these tests.
BaseX-Talk mailing list BaseX-Talk@mailman.uni-konstanz.de https://mailman.uni-konstanz.de/mailman/listinfo/basex-talk