Hi Dirk,
Thx for the quick response. Autooptimize is turned off as you can see below [1]. I can reproduce the behavior with a single curl XPUT to the large mets database. Every PUT generates ~25MB disk io. With multiple clients I think this is my bottleneck.
However, my document Ids are unique, therefore I hash the Ids and use the first two chars as a database identifier. With this I can spread the documents over 256 databases.
a) Is this a sensible approach?
b) The performance significantly increased with multiple dbs. But what is going to happen when the database sizes reach the size of my initial database? Is there a practical limit on the database size/nodes/documents ?
[1]
root@koala:~# docker exec -it basex basex
BaseX 8.4.1 [Standalone]
Try 'help' to get more information.
open mets
Database 'mets' was opened in 988.78 ms.
get AUTOOPTIMIZE
AUTOOPTIMIZE: false
Viele Grüße aus Niedersachsen
Björn Braunschweig
Von: basex-talk-bounces@mailman.uni-konstanz.de [mailto:basex-talk-bounces@mailman.uni-konstanz.de] Im Auftrag von Dirk Kirsten Gesendet: Mittwoch, 24. August 2016 15:03 An: basex-talk@mailman.uni-konstanz.de Betreff: Re: [basex-talk] REST API performance, massive disk IO
Hello Björn,
I guess you use up to 64 parallel clients for performance reasons, i.e. to increase throughput? However, if you write indeed to a single database and all of these queries are updating you are in fact decreasing performance. Locks are applied per database in BaseX, i.e. one write process locks the database. So having multiple parallel clients will result in some overhead for managing the clients and similar stuff.
So, in turn to actually achieve a performance benefit to will have to spread your documents over multiple databases. As you can request data within one XQuery from an unlimited number of databases, this should be possible. You could put some kind of load-balancing in front of the updating function to selected a new database.
Regarding the number of open/write calls it could be a problem with updating indexes. Do you maybe have the AUTOOPTIMIZE http://docs.basex.org/wiki/Options#AUTOOPTIMIZE) turned on. If so, the database will be optimized and the index updated after each update which takes significantly longer time, especially in your use case. I would recommend you turn it off and manually optimize the databas(es) after all document creations have occurred.
Viele Grüße vom Bodensee,
Dirk
On 08/24/2016 01:20 PM, Braunschweig, Björn wrote:
Hello everybody,
Im using the basexhttp docker image:
root@koala:/tmp# docker images | grep basexhttp
basex/basexhttp 8.4.1 54d1e0001449 3 months ago 628.1 MB
On ubuntu 14.04 [1] and the python client library: https://github.com/lucalianas/pyBaseX https://github.com/lucalianas/pyBaseX
I have a performance problem with the setup. The database gets filled up by up to 64 parallel clients using the REST api. Currently the database has a size of 2GB, 74899811 Nodes, 305756 Documents. For every document I add, I can see a lot of disk I/O on my host with up to 80MB/s [2] by the org.basex.BaseXHTTP class. The size of the xml documents cannot be the cause, they are < 10KB. The corresponding response time of the api is in the range of 10-30 seconds [3]. Furthermore the basex docker container log shows a HTTPException and a strange Creating Database... for every request [4]. strace of the basex process shows massive open/write calls [5].
What am I doing wrong? :-)
Thx, Björn Braunschweig
[1]
root@koala:~# cat /etc/lsb-release
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=14.04
DISTRIB_CODENAME=trusty
DISTRIB_DESCRIPTION="Ubuntu 14.04.5 LTS"
root@koala:~# uname -a
Linux koala 3.13.0-93-generic #140-Ubuntu SMP Mon Jul 18 21:21:05 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
[2] accumulated disk io after a couple of seconds
root@koala:/tmp# iotop -a -P
Total DISK READ : 3.88 K/s | Total DISK WRITE : 53.39 M/s
Actual DISK READ: 3.88 K/s | Actual DISK WRITE: 52.84 M/s
PID PRIO USER DISK READ DISK WRITE> SWAPIN IO COMMAND
62061 be/4 1984 8.00 K 2.20 G 0.00 % 0.01 % java -cp /usr/src/basex/basex-api/target~.2.jar: -Xmx2048m org.basex.BaseXHTTP d
[3] my application log: 18 seconds till response
2016-08-15 14:52:00,337 | dmp043 | 46912496392256 | INFO | mdqi | add_document | 241 | saving document: urn:diasid:ast:kop01:0201502181219395
340000/mets.xml to db: mets
2016-08-15 14:52:18,164 | dmp043 | 46912496392256 | INFO | mdqi | add_document | 243 | RESPONSE (status code 201): 0 resource(s) replaced in
62.67 ms.
[5] docker logs basex
Aug 12 13:44:21 koala docker/basex/ead004c4ce42[1167]: _ REQUEST _________________________________
Aug 12 13:44:21 koala docker/basex/ead004c4ce42[1167]: (PUT /rest/mets/urn:diasid:ast:kop01:0201502150401114120000/mets.xml)@843076472 mailto:org.eclipse.jetty.server.Request@32405378 org.eclipse.jetty.server.Request@32405378
Aug 12 13:44:21 koala docker/basex/ead004c4ce42[1167]: - Host: koala.gwdg.de:8984
Aug 12 13:44:21 koala docker/basex/ead004c4ce42[1167]: - Accept: */*
Aug 12 13:44:21 koala docker/basex/ead004c4ce42[1167]: - Authorization: Basic xxx
Aug 12 13:44:21 koala docker/basex/ead004c4ce42[1167]: - Content-Length: 7558
Aug 12 13:44:21 koala docker/basex/ead004c4ce42[1167]: - User-Agent: python-requests/2.10.0
Aug 12 13:44:21 koala docker/basex/ead004c4ce42[1167]: - Connection: keep-alive
Aug 12 13:44:21 koala docker/basex/ead004c4ce42[1167]: - Accept-Encoding: gzip, deflate
Aug 12 13:44:21 koala docker/basex/ead004c4ce42[1167]: _ RESPONSE ________________________________
Aug 12 13:44:21 koala docker/basex/ead004c4ce42[1167]: HTTP/1.1 201
Aug 12 13:44:21 koala docker/basex/ead004c4ce42[1167]: Content-Type: text/plain;charset=UTF-8#015
Aug 12 13:44:21 koala docker/basex/ead004c4ce42[1167]: #015
Aug 12 13:44:21 koala docker/basex/ead004c4ce42[1167]: Creating Database...
Aug 12 13:44:21 koala docker/basex/ead004c4ce42[1167]: . 1.98 ms (458 MB)
Aug 12 13:44:22 koala docker/basex/ead004c4ce42[1167]: org.basex.http.HTTPException: 0 resource(s) replaced in 52.02 ms.
Aug 12 13:44:22 koala docker/basex/ead004c4ce42[1167]:
Aug 12 13:44:22 koala docker/basex/ead004c4ce42[1167]: #011at org.basex.http.HTTPCode.get(HTTPCode.java:57)
Aug 12 13:44:22 koala docker/basex/ead004c4ce42[1167]: #011at org.basex.http.rest.RESTServlet.run(RESTServlet.java:31)
Aug 12 13:44:22 koala docker/basex/ead004c4ce42[1167]: #011at org.basex.http.BaseXServlet.service(BaseXServlet.java:64)
Aug 12 13:44:22 koala docker/basex/ead004c4ce42[1167]: #011at javax.servlet.http.HttpServlet.service(HttpServlet.java:848)
Aug 12 13:44:22 koala docker/basex/ead004c4ce42[1167]: #011at org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:684)
Aug 12 13:44:22 koala docker/basex/ead004c4ce42[1167]: #011at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:503)
Aug 12 13:44:22 koala docker/basex/ead004c4ce42[1167]: #011at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137 )
Aug 12 13:44:22 koala docker/basex/ead004c4ce42[1167]: #011at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:557)
Aug 12 13:44:22 koala docker/basex/ead004c4ce42[1167]: #011at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java :231)
Aug 12 13:44:22 koala docker/basex/ead004c4ce42[1167]: #011at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java :1086)
Aug 12 13:44:22 koala docker/basex/ead004c4ce42[1167]: #011at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:429)
Aug 12 13:44:22 koala docker/basex/ead004c4ce42[1167]: #011at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java: 193)
Aug 12 13:44:22 koala docker/basex/ead004c4ce42[1167]: #011at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java: 1020)
Aug 12 13:44:22 koala docker/basex/ead004c4ce42[1167]: #011at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135 )
Aug 12 13:44:22 koala docker/basex/ead004c4ce42[1167]: #011at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:1 16)
Aug 12 13:44:22 koala docker/basex/ead004c4ce42[1167]: #011at org.eclipse.jetty.server.Server.handle(Server.java:370)
Aug 12 13:44:22 koala docker/basex/ead004c4ce42[1167]: #011at org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpCo nnection.java:494)
Aug 12 13:44:22 koala docker/basex/ead004c4ce42[1167]: #011at org.eclipse.jetty.server.AbstractHttpConnection.content(AbstractHttpConnecti on.java:982)
Aug 12 13:44:22 koala docker/basex/ead004c4ce42[1167]: #011at org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.content(Abstr actHttpConnection.java:1043)
Aug 12 13:44:22 koala docker/basex/ead004c4ce42[1167]: #011at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:861)
Aug 12 13:44:22 koala docker/basex/ead004c4ce42[1167]: #011at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:236)
Aug 12 13:44:22 koala docker/basex/ead004c4ce42[1167]: #011at org.eclipse.jetty.server.AsyncHttpConnection.handle(AsyncHttpConnection.java :82)
Aug 12 13:44:22 koala docker/basex/ead004c4ce42[1167]: #011at org.eclipse.jetty.io.nio.SelectChannelEndPoint.handle(SelectChannelEndPoint. java:696)
Aug 12 13:44:22 koala docker/basex/ead004c4ce42[1167]: #011at org.eclipse.jetty.io.nio.SelectChannelEndPoint$1.run(SelectChannelEndPoint.j ava:53)
Aug 12 13:44:22 koala docker/basex/ead004c4ce42[1167]: #011at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java: 608)
Aug 12 13:44:22 koala docker/basex/ead004c4ce42[1167]: #011at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:5 43)
Aug 12 13:44:22 koala docker/basex/ead004c4ce42[1167]: #011at java.lang.Thread.run(Thread.java:745)
[4]
root@koala:/tmp# strace -t -f -p 9675 -e trace=open -o /tmp/trace.log
root@koala:/tmp#
root@koala:/tmp# head trace.log
38813 14:23:12 open("/srv/BaseXData/mets/pth.basex", O_WRONLY|O_CREAT|O_TRUNC, 0666) = 129
38797 14:23:12 open("/srv/BaseXData/mets/inf.basex", O_RDONLY) = 76
38797 14:23:12 open("/srv/BaseXData/mets/tbli.basex", O_RDONLY) = 76
38797 14:23:12 open("/srv/BaseXData/mets/tbl.basex", O_RDWR|O_CREAT, 0666) = 76
38797 14:23:12 open("/srv/BaseXData/mets/txt.basex", O_RDWR|O_CREAT, 0666) = 121
38797 14:23:12 open("/srv/BaseXData/mets/atv.basex", O_RDWR|O_CREAT, 0666) = 127
38797 14:23:12 open("/srv/BaseXData/mets/upd.basex", O_WRONLY|O_CREAT|O_EXCL, 0666) = 128
38797 14:23:12 open("/srv/BaseXData/mets/pth.basex", O_RDONLY) = 128
38797 14:23:12 open("/srv/BaseXData/mets/tbli.basex", O_WRONLY|O_CREAT|O_TRUNC, 0666) = 128
38797 14:23:12 open("/srv/BaseXData/mets/inf.basex", O_WRONLY|O_CREAT|O_TRUNC, 0666) = 128