I have an optimized database with 25K objects, 8 GB of data running on 9.1.1.
WebDAV operations are very slow (client running and connecting locally). The response times are an order of magnitude slower than with eXist-db. Is something missing in the configuration that would speed this up significantly?
Andreas
11:28:16.899 127.0.0.1:45822 admin OK Database 'onkopedia' was optimized in 29379.65 ms. 29380.24 ms
11:28:24.689 0:0:0:0:0:0:0:1:34258 admin REQUEST [HEAD] http://localhost:8984/webdav/onkopedia/onkopedia/de/onkopedia/guidelines/mam...
11:28:25.382 0:0:0:0:0:0:0:1:34258 admin 200 694.07 ms
11:28:25.389 0:0:0:0:0:0:0:1:34260 admin REQUEST [PROPFIND] http://localhost:8984/webdav/onkopedia/onkopedia/de/onkopedia/guidelines/mam...
11:28:26.065 0:0:0:0:0:0:0:1:34260 admin 200 675.31 ms
11:28:26.078 0:0:0:0:0:0:0:1:34262 admin REQUEST [HEAD] http://localhost:8984/webdav/onkopedia/onkopedia/de/onkopedia/guidelines/mam...
11:28:26.770 0:0:0:0:0:0:0:1:34262 admin 200 691.53 ms
11:28:26.789 0:0:0:0:0:0:0:1:34264 admin REQUEST [PROPFIND] http://localhost:8984/webdav/onkopedia/onkopedia/de/onkopedia/guidelines/mam...
11:28:29.231 0:0:0:0:0:0:0:1:34264 admin 200 2444.93 ms
11:28:29.240 0:0:0:0:0:0:0:1:34266 admin REQUEST [HEAD] http://localhost:8984/webdav/onkopedia/onkopedia/de/onkopedia/guidelines/mam...
11:28:29.856 0:0:0:0:0:0:0:1:34266 admin 200 616.41 ms
11:28:29.864 0:0:0:0:0:0:0:1:34268 admin REQUEST [PROPFIND] http://localhost:8984/webdav/onkopedia/onkopedia/de/onkopedia/guidelines/mam...
11:28:32.310 0:0:0:0:0:0:0:1:34268 admin 200 2445.99 ms
11:28:32.317 0:0:0:0:0:0:0:1:34270 admin REQUEST [HEAD] http://localhost:8984/webdav/onkopedia/onkopedia/de/onkopedia/guidelines/mam...
11:28:33.083 0:0:0:0:0:0:0:1:34270 admin 200 766.14 ms
11:28:33.103 0:0:0:0:0:0:0:1:34272 admin REQUEST [GET] http://localhost:8984/webdav/onkopedia/onkopedia/de/onkopedia/guidelines/mam...
11:28:33.719 0:0:0:0:0:0:0:1:34272 admin 200 615.38 ms
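To compare the methods in a log excerpt like the one above, the entries can be grouped per HTTP method. The following is a minimal sketch, not a BaseX tool: it assumes the entry layout seen in this excerpt (time, client address, user, then either `REQUEST [METHOD] URL` or a status code and a duration in ms), and the names `ENTRY` and `summarize` are made up for illustration. It pairs each request with the next response from the same client address.

```python
import re
from collections import defaultdict
from statistics import mean

# Assumed layout, taken from the excerpt in this thread:
#   <time> <client address> <user> REQUEST [<METHOD>] <url>
#   <time> <client address> <user> <status> <duration> ms
# Other entries (e.g. "OK Database ... was optimized") do not match and are skipped.
ENTRY = re.compile(
    r"(?P<time>\d{2}:\d{2}:\d{2}\.\d{3}) (?P<addr>\S+) (?P<user>\S+) "
    r"(?:REQUEST \[(?P<method>[A-Z]+)\] (?P<url>\S+)"
    r"|(?P<status>\d{3}) (?P<ms>[\d.]+) ms)"
)


def summarize(log_text):
    """Return {method: (count, mean duration in ms)} for the given log text."""
    pending = {}                   # client address -> method of the open request
    durations = defaultdict(list)  # method -> list of durations in ms
    for match in ENTRY.finditer(log_text):
        addr = match["addr"]
        if match["method"]:
            pending[addr] = match["method"]
        elif addr in pending:
            durations[pending.pop(addr)].append(float(match["ms"]))
    return {method: (len(values), mean(values)) for method, values in durations.items()}


if __name__ == "__main__":
    sample = (
        "11:28:24.689 0:0:0:0:0:0:0:1:34258 admin REQUEST [HEAD] http://localhost:8984/webdav/onkopedia/\n"
        "11:28:25.382 0:0:0:0:0:0:0:1:34258 admin 200 694.07 ms\n"
    )
    for method, (count, avg) in summarize(sample).items():
        print(f"{method}: {count} request(s), {avg:.2f} ms on average")
```

Because the pattern is searched across the whole text, it works whether the entries are on separate lines or run together on one line.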
The original tests below were run on a virtualized machine with 8 cores and 64 GB RAM. My first suspicion was the shared storage or filesystem, so I tried the same dataset on my local machine, and the result is even worse: 1700 ms for a PROPFIND on the root of the database, which contains only one object:
12:10:26.294 127.0.0.1:53202 admin REQUEST OPTIMIZE ALL 2.37 ms
12:10:41.396 127.0.0.1:53202 admin OK Database 'onkopedia' was optimized in 15102.34 ms. 15102.77 ms
12:10:42.646 127.0.0.1:35556 admin REQUEST [PROPFIND] http://localhost:8984/webdav/onkopedia/
12:10:44.346 127.0.0.1:35556 admin 200 1700.02 ms
Andreas
On 4 Jan 2019, at 11:37, Andreas Jung wrote:
I have an optimized database with 25K objects, 8 GB of data running on 9.1.1.
WebDAV operations are very slow (client running and connecting locally). The response times are an order of magnitude slower than with eXist-db. Is something missing in the configuration that would speed this up significantly?
I re-verified the behavior on a third (fast) machine with an SSD, and the result was still bad, with an average of 900-1000 ms per request.
Andreas
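To reproduce an average like this independently of a particular WebDAV client, a small script can send the same request several times and record the wall-clock time of each call. This is only a sketch under assumptions not stated in the thread: HTTP Basic authentication, a `Depth: 1` header for PROPFIND, and a fresh connection per request. The function name `time_requests` and its parameters are hypothetical.

```python
import base64
import http.client
import time
from statistics import mean


def time_requests(host, port, path, method="PROPFIND",
                  user=None, password=None, repeat=5):
    """Send the same request `repeat` times; return the durations in ms."""
    # Assumption: list the resource and its direct children for PROPFIND.
    headers = {"Depth": "1"} if method == "PROPFIND" else {}
    if user is not None:
        # Assumption: the server accepts HTTP Basic authentication.
        token = base64.b64encode(f"{user}:{password}".encode()).decode()
        headers["Authorization"] = f"Basic {token}"

    timings = []
    for _ in range(repeat):
        # A new connection per request, so connection setup is included.
        conn = http.client.HTTPConnection(host, port, timeout=60)
        try:
            start = time.perf_counter()
            conn.request(method, path, headers=headers)
            response = conn.getresponse()
            response.read()  # include the transfer of the response body
            timings.append((time.perf_counter() - start) * 1000.0)
        finally:
            conn.close()
    return timings


if __name__ == "__main__":
    # Host, port and path follow the log excerpts in this thread;
    # the credentials are placeholders.
    results = time_requests("localhost", 8984, "/webdav/onkopedia/",
                            user="admin", password="...")
    print(f"average: {mean(results):.2f} ms over {len(results)} requests")
```

The client-side numbers include connection setup and response transfer, so they will be somewhat higher than the durations in the server log.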
On 4 Jan 2019, at 12:13, Andreas Jung wrote:
The original tests below were run on a virtualized machine with 8 cores and 64 GB RAM. My first suspicion was the shared storage or filesystem, so I tried the same dataset on my local machine, and the result is even worse: 1700 ms for a PROPFIND on the root of the database, which contains only one object:
WebDAV is definitely not the API that provides the best performance in BaseX. Nevertheless, I think I’ll have a look at the use case you describe. Some questions in return:
I re-verified the behavior on a third (fast) machine with SDD and the result as still bad with an average of 900-1000ms per request.
1. What does "request" mean in that context? Do you request directory entries, do you request documents, or do you run updates?
2. Does the eXist-db WebDAV implementation list multiple databases in the WebDAV root directory (as BaseX does), or do all operations work on a specific database that is chosen at startup time?
On 7 Jan 2019, at 11:20, Christian Grün wrote:
- What does "request" mean in that context? Do you request directory entries, do you request documents, or do you run updates?
Please check the log entries: there are only HEAD/PROPFIND/GET requests on the WebDAV level against a single XML resource.
- Does the eXist-db WebDAV implementation list multiple databases in the WebDAV root directory (as BaseX does), or do all operations work on a specific database that is chosen at startup time?
eXist basically serves only one database per database server instance. But how is eXist relevant here?
Andreas
Please check the log entries: there are only HEAD/PROPFIND/GET requests on the WebDAV level against a single XML resource.
I just created a collection with 120,000 documents. Retrieving a single resource took about 100 ms, but requesting the entries of the root directory did indeed take much longer. I’ll see if I can do something about this.
In general, you’ll definitely get better performance when using one of our other APIs.
eXist basically serves only one database per database server instance. But how is eXist relevant here?
It may be relevant because both eXist-db and BaseX are based on the same Milton library (but maybe a different version?). I experienced bottlenecks with this library in the past, so we once thought about writing our own WebDAV library (because the protocol is using XML anyway). It’s interesting to hear that you are getting better performance with eXist. One reason might be that eXist keeps an index of all files of the chosen database in main memory, while BaseX needs to scan the corresponding database every time it is accessed.
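The idea of an in-memory index can be illustrated with a small sketch. It is not taken from eXist-db or BaseX and says nothing about how either is implemented; the function `build_index` and the sample paths are hypothetical. The index is built in one pass over all resource paths, after which the children of any directory are found with a single dictionary lookup instead of a scan over every path.

```python
from collections import defaultdict


def build_index(paths):
    """Map each directory to the names of its direct children, in one pass."""
    index = defaultdict(set)
    for path in paths:
        parts = path.strip("/").split("/")
        for depth, name in enumerate(parts):
            parent = "/".join(parts[:depth])  # "" stands for the root directory
            index[parent].add(name)
    return index


if __name__ == "__main__":
    # Hypothetical resource paths, for illustration only.
    paths = [
        "de/guidelines/a.xml",
        "de/guidelines/b.xml",
        "de/index.xml",
        "readme.xml",
    ]
    index = build_index(paths)
    # Each listing is now a lookup; no path is scanned again.
    print(sorted(index[""]))
    print(sorted(index["de"]))
```

The trade-off is memory and the need to update the index whenever resources are added, renamed or deleted.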
If you are only working with a single database, you can try to open this database at startup time:
basexhttp -c "open your-db"
However, I will check our WebDAV implementation and give you some more feedback.
On 9 Jan 2019, at 20:39, Christian Grün wrote:
Please check the log entries: there are only HEAD/PROPFIND/GET requests on the WebDAV level against a single XML resource.
I just created a collection with 120,000 documents. Retrieving a single resource took about 100 ms, but requesting the entries of the root directory did indeed take much longer. I’ll see if I can do something about this.
In general, you’ll definitely get better performance when using one of our other APIs.
See my first mail: requests to other arbitrary documents are slow as well (of course in the range of 700 ms and higher). And there is only one test database involved here. We did not choose a vendor-specific API because we want data portability and a unified API. This allowed us to move from eXist to MarkLogic to BaseX to filesystem storage without changing the access layer of our applications. Only a few specific scripts for querying the database had to be adjusted.
See my first mail: requests to other arbitrary documents are slow as well (of course in the range of 700 ms and higher).
Well. That’s why I said I’ll have another look at this. Feel free to send us a link to your data; this might simplify the analysis.
We did not choose a vendor-specific API because we want data portability and a unified API.
Sounds completely reasonable.
Please note that developers regularly experience a lot of surprises and challenges when switching from one vendor to another, as XML databases have many more idiosyncrasies than relational databases (performance of APIs is just one of them). We try to minimize the challenge (e.g. by developing joint EXPath and EXQuery modules with other vendors), but of course this is often difficult if vendors have already chosen a specific solution.
Would you like to share your experiences with MarkLogic and WebDAV with us?
Well. That’s why I said I’ll have another look at this. Feel free to send us a link to your data; this might simplify the analysis.
I have done some tests with larger databases (with up to 1 million documents), and I have found a way to speed up the directory traversal via WebDAV. It MAY also affect the targeted retrieval of single resources. A new snapshot is available [1]. I’d appreciate it if you could give me some feedback on the questions in my last mail; this might help us to help you (I’m still not sure which operations in your database instance are slow and which are fast, as I couldn’t reproduce this on my test database instances).
@everyone: The new convenience function db:dir was added [2]. It is used by our WebDAV implementation, and it can be used to list all resources and subfolders of a database directory. Database directories in BaseX are all virtual: they implicitly result from the paths of stored resources.
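What "virtual" means here can be shown with a short sketch: given only the paths of the stored resources, the entries of any directory can be derived without the directory being stored anywhere. The code below is an illustration of that idea only. It is not the implementation of db:dir and does not reproduce its output format; the function `list_dir`, the `"dir"`/`"resource"` labels and the sample paths are hypothetical.

```python
def list_dir(paths, directory=""):
    """Derive the direct entries of `directory` from resource paths alone.

    Returns a sorted list of (name, kind) pairs, where kind is "dir" for a
    subfolder and "resource" for a stored resource.
    """
    prefix = directory.strip("/")
    if prefix:
        prefix += "/"

    entries = {}
    for path in paths:
        path = path.strip("/")
        if not path.startswith(prefix):
            continue
        name, _, rest = path[len(prefix):].partition("/")
        if not name:
            continue
        # A remainder means the path continues below `name`,
        # so `name` is a folder that exists only implicitly.
        kind = "dir" if rest else "resource"
        if entries.get(name) != "dir":
            entries[name] = kind
    return sorted(entries.items())


if __name__ == "__main__":
    # Hypothetical resource paths, for illustration only.
    paths = [
        "de/guidelines/a.xml",
        "de/guidelines/b.xml",
        "de/index.xml",
        "readme.xml",
    ]
    print(list_dir(paths))
    print(list_dir(paths, "de"))
```

A directory that no resource path passes through simply yields an empty listing, which is why such folders cannot exist on their own.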
Cheers, Christian
[1] http://files.basex.org/releases/latest/
[2] http://docs.basex.org/wiki/Database_Module#db:dir
On Thu, Jan 10, 2019 at 7:19 AM Christian Grün christian.gruen@gmail.com wrote:
See my first mail: requests to other arbitrary documents are slow as well (of course in the range of 700 ms and higher).
Well. That’s why I said I’ll have another look at this. Feel free to send us a link to your data; this might simplify the analysis.
basex-talk@mailman.uni-konstanz.de