I did some more work to capture the relevant information for the two crashes.
As you recall, I build a container image on top of the official (but old) basex one. It copies the database into the right place in the container.
I added the -c switch to the basexhttp command that runs when the container starts. Even with this, the container does not have the data context set when the first operation involving the database happens. If I run a query that involves the open database:
<query> <text>count(/ada)</text> </query>
I get:
Improper use? Potential bug? Your feedback is welcome:
Contact: basex-talk@mailman.uni-konstanz.de
Version: BaseX 9.6 RC1
Java: IcedTea, 1.8.0_212
OS: Linux, amd64
Stack Trace:
java.lang.NullPointerException
  at org.basex.data.Data.defaultNs(Data.java:270)
  at org.basex.query.expr.path.NameTest.noMatches(NameTest.java:60)
  at org.basex.query.expr.path.Step.optimize(Step.java:162)
  at org.basex.query.expr.path.Step.optimize(Step.java:134)
  at org.basex.query.expr.Preds.compile(Preds.java:59)
  at org.basex.query.expr.path.Path.lambda$compile$0(Path.java:139)
  at org.basex.query.CompileContext.get(CompileContext.java:165)
  at org.basex.query.expr.path.Path.compile(Path.java:134)
  at org.basex.query.expr.Arr.compile(Arr.java:47)
  at org.basex.query.scope.MainModule.comp(MainModule.java:81)
  at org.basex.query.QueryCompiler.compile(QueryCompiler.java:119)
  at org.basex.query.QueryCompiler.compile(QueryCompiler.java:106)
  at org.basex.query.QueryContext.compile(QueryContext.java:306)
  at org.basex.query.QueryProcessor.compile(QueryProcessor.java:79)
  at org.basex.core.cmd.AQuery.query(AQuery.java:91)
  at org.basex.core.cmd.XQuery.run(XQuery.java:22)
  at org.basex.core.Command.run(Command.java:257)
  at org.basex.http.rest.RESTCmd.run(RESTCmd.java:105)
  at org.basex.http.rest.RESTQuery.query(RESTQuery.java:69)
  at org.basex.http.rest.RESTQuery.run0(RESTQuery.java:37)
  at org.basex.http.rest.RESTCmd.run(RESTCmd.java:70)
  at org.basex.core.Command.run(Command.java:257)
  at org.basex.core.Command.execute(Command.java:93)
  at org.basex.core.Command.execute(Command.java:116)
  at org.basex.http.rest.RESTServlet.run(RESTServlet.java:32)
  at org.basex.http.BaseXServlet.service(BaseXServlet.java:65)
If I remove the -c flag and just let the container start (still with the database copied into place in the container), I get this trace when I try to do anything related to the database:
Improper use? Potential bug? Your feedback is welcome:
Contact: basex-talk@mailman.uni-konstanz.de
Version: BaseX 9.6 RC1
Java: IcedTea, 1.8.0_212
OS: Linux, amd64
Stack Trace:
java.lang.NullPointerException
  at org.basex.data.DiskData.write(DiskData.java:146)
  at org.basex.data.DiskData.close(DiskData.java:160)
  at org.basex.core.Datas.unpin(Datas.java:52)
  at org.basex.core.cmd.Close.close(Close.java:45)
  at org.basex.query.QueryResources.close(QueryResources.java:92)
  at org.basex.query.QueryContext.close(QueryContext.java:515)
  at org.basex.query.QueryProcessor.close(QueryProcessor.java:251)
  at org.basex.core.cmd.AQuery.query(AQuery.java:132)
  at org.basex.core.cmd.XQuery.run(XQuery.java:22)
  at org.basex.core.Command.run(Command.java:257)
  at org.basex.core.Command.execute(Command.java:93)
  at org.basex.api.client.LocalSession.execute(LocalSession.java:132)
  at org.basex.api.client.Session.execute(Session.java:36)
  at org.basex.core.CLI.execute(CLI.java:92)
  at org.basex.core.CLI.execute(CLI.java:76)
  at org.basex.BaseX.console(BaseX.java:177)
  at org.basex.BaseX.<init>(BaseX.java:152)
  at org.basex.BaseX.main(BaseX.java:43)
I hope this is useful. Right now I am blocked.
Best Regards
Peter Villadsen.
From: Peter Villadsen
Sent: Tuesday, March 4, 2025 12:43 PM
To: Christian Grün <christian.gruen@gmail.com>
Cc: basex-talk@mailman.uni-konstanz.de
Subject: RE: [EXTERNAL] Re: [basex-talk] HTTP server performance seems very slow...
Christian,
Yes, I have. Thank you for following up - I should have come back earlier.
I have been experimenting with this for a while now, and both container images (the official one and the newer 10.3 one from quodatum) crash when I try to use them, through both HTTP and TCP. I am still looking into it. If I do not manage to find out what the issue is, I will upload the stack traces.
In both cases, I built my own container to include the database, so I can avoid the volumes and have the container be completely self-contained. The database is 19GB, so the container gets pretty big.
Here is how I start the container:
docker run -d -e BASEX_JVM=-Xmx19G -p 8080:8080 -p 1984:1984 -p 8984:8984 rainier05042023
In my humble opinion it is unfortunate that the official container image has not been updated for at least 3 years. It would be nice to have the newest bits there, supported by BaseX.
Here is the dockerfile I use:
# escape=`
# Use the BaseX 10.3 image as the base image
FROM basex/basexhttp

# Copy the Windows database directory into the container so it is available
# when the container starts, without providing a --volume parameter.
# This is fine since the database is essentially read-only.

WORKDIR /srv/basex/data
COPY --chown=basex:basex Rainier05042023 "Rainier05042023/"

# The older versions of BaseX just use admin/admin.
# RUN echo "admin" | /srv/basex/bin/basex -cPASSWORD

# Modify the CMD command so that the Rainier05042023 database is opened
CMD /usr/local/bin/basexhttp -c "open Rainier05042023"

LABEL description="Legacy BaseX with Rainier05042023 database"

# Here is a build command that builds the container with the name Rainier05042023:
#
# cd to the directory containing this Dockerfile and run the command:
# docker build -t rainier05042023 .
#
# When the docker container has been built it can be run with the name
# provided in the build command i.e. rainier05042023. It can be saved
# to a file with the command:
#
# docker save -o Rainier05042023.tar rainier05042023
#
# and loaded with the command:
#
# docker load -i rainier05042023.tar
#
# The container can be run with the command:
# docker run -d -e BASEX_JVM=-Xmx19G -p 8080:8080 -p 1984:1984 -p 8984:8984 rainier05042023
# The database can be accessed at http://localhost:8080/dba/
Best Regards
Peter Villadsen
From: Christian Grün <christian.gruen@gmail.com>
Sent: Tuesday, March 4, 2025 6:05 AM
To: Peter Villadsen <Peter.Villadsen@microsoft.com>
Cc: basex-talk@mailman.uni-konstanz.de
Subject: [EXTERNAL] Re: [basex-talk] HTTP server performance seems very slow...
Hi Peter,
To be sure, could you confirm that you have received my mails?
Best regards, Christian
On Sat, Feb 22, 2025 at 12:41 PM Christian Grün <christian.gruen@gmail.com> wrote:
Hi Peter,
This leads me to believe that a lot of the time (>7 seconds) may be spent opening the database each time a POST is done? Is there a way to tweak the HTTP server to “remember” the connection with the current database for a little while? This may be against the REST principles, of course. The database is guaranteed to be read-only in my case.
One option is to open the database with the initial basexhttp call. It will be kept open until the server is shut down:
basexhttp -c"open name-of-db"
Best, Christian
On Sat, Feb 15, 2025 at 8:53 PM Peter Villadsen via BaseX-Talk <basex-talk@mailman.uni-konstanz.de> wrote:
All,
I have been using BaseX for a while, connecting to the TCP endpoint. I know the performance I typically get, and it is impressive! However, now I wanted to use the HTTP endpoint, and it seems the performance is at least 2 orders of magnitude worse!
Here is the query that I am POSTing to http://localhost:8984/rest/RainFnd_6.0.10.0
<query xmlns="http://basex.org/rest">
  <text>/Class[@Package='ApplicationPlatform']/@Name</text>
</query>
This simple query generates around 1500 results from the 13GB database (RainFnd_6.0.10.0). It takes just over 7 seconds to do this. If I do this in the self-contained BaseX GUI, it takes around 20ms.
However, it seems that the time spent executing the query against the database is negligible. Please consider this query:
<query xmlns="http://basex.org/rest">
  <text>1 + 2</text>
</query>
Here there is obviously no database access, yet it takes almost the same amount of time as the query that accesses the database. 7 seconds to calculate 1 + 2 is too long.
If I post the 1 + 2 query to the endpoint without specifying the database on the URL:
it takes around 7 milliseconds, close to what I expected, certainly within expectations for the time spent sending the query over the wire and serializing etc.
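A minimal Python sketch for reproducing this comparison follows. It is an illustration under assumptions, not the client used for the measurements above. It wraps an XQuery string in the REST query envelope and times a single POST. The helper names (build_query_body, time_post), the admin/admin credentials sent via Basic authentication, and the database-less URL http://localhost:8984/rest are assumptions; adjust them to the actual setup.

```python
import base64
import time
import urllib.request
from xml.sax.saxutils import escape

REST_NS = "http://basex.org/rest"


def build_query_body(xquery: str) -> bytes:
    """Wrap an XQuery string in the <query> envelope used in this thread."""
    body = f'<query xmlns="{REST_NS}"><text>{escape(xquery)}</text></query>'
    return body.encode("utf-8")


def time_post(url: str, xquery: str, user: str = "admin", password: str = "admin"):
    """POST the query to url and return (elapsed_seconds, response_text).

    The credentials are an assumption (admin/admin), as is Basic auth.
    """
    request = urllib.request.Request(url, data=build_query_body(xquery), method="POST")
    request.add_header("Content-Type", "application/xml")
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    request.add_header("Authorization", f"Basic {token}")
    start = time.perf_counter()
    with urllib.request.urlopen(request) as response:
        text = response.read().decode("utf-8")
    return time.perf_counter() - start, text


def compare_endpoints(xquery: str = "1 + 2") -> None:
    """Time the same query with and without a database in the URL."""
    urls = (
        "http://localhost:8984/rest/RainFnd_6.0.10.0",  # database in the URL
        "http://localhost:8984/rest",  # assumed database-less endpoint
    )
    for url in urls:
        elapsed, _ = time_post(url, xquery)
        print(f"{url}: {elapsed * 1000:.0f} ms")
```

With the server running, calling compare_endpoints() prints one timing per URL. A large gap between the two for a query like 1 + 2 would point at per-request database opening rather than query execution.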
This leads me to believe that a lot of the time (>7 seconds) may be spent opening the database each time a POST is done? Is there a way to tweak the HTTP server to “remember” the connection with the current database for a little while? This may be against the REST principles, of course. The database is guaranteed to be read-only in my case.
The problem is that this makes the HTTP server inappropriate for interactive applications. I can still use the TCP server, where I get the results I need, but using the HTTP server would be simpler and would require less code to communicate with the server.
Please let me know if there is a way to accomplish acceptable performance with the HTTP server.
Best Regards
Peter Villadsen
Principal Technical Program Manager
Microsoft Business Applications Group