Hello Andreas,

thank you for your hint to start basex with the -z option.

Unfortunately, it did not change the behavior of the BaseX server.

1.    lsof reports the following lines:


COMMAND   PID USER   FD  TYPE DEVICE SIZE/OFF  NODE NAME
java    23767 userXY 30u sock    0,6      0t0 92598 can't identify protocol


"u" after the descriptor number (30u) stands for read and write access
"sock" stands for a socket of unknown domain

see the lsof man page (1).

Note that TYPE has the value sock, as opposed to REG, which stands for a regular file; see also the attached image.

2.    Therefore it is very likely that a socket, not a regular file, remains open.

3.    The tcpdump capture, viewed in Wireshark, shows that the health check of the load balancer (mon and ldirectord behave the same way) sends a TCP RST (connection reset) message to the BaseX database.

Each RST message from mon or ldirectord seems to lead to a new socket that remains open ...
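
For anyone who wants to reproduce this without a load balancer, the connect-then-reset pattern seen in the capture can be approximated with a few lines of Python. This is only a sketch: probe_with_reset is a hypothetical helper, and it is an assumption that tcp.monitor and ldirectord abort the connection in the same way. Setting SO_LINGER to (on, 0 seconds) makes close() send a RST instead of the normal FIN handshake.

```python
import socket
import struct


def probe_with_reset(host: str, port: int, timeout: float = 2.0) -> bool:
    """Connect to host:port, then abort the connection with a TCP RST.

    Returns True if the TCP connect succeeded, False otherwise.
    """
    try:
        sock = socket.create_connection((host, port), timeout=timeout)
    except OSError:
        # Connection refused, timed out, host unreachable, ...
        return False
    # l_onoff=1, l_linger=0: close() drops the connection and sends RST.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_LINGER, struct.pack("ii", 1, 0))
    sock.close()
    return True
```

Calling probe_with_reset("127.0.0.1", 1984) in a loop while watching the lsof output of the server should show whether each probe leaves a socket behind.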

One question is:

What can be done on the BaseX side to make it compatible with a load balancer?
(Especially with ldirectord, which is used in the production environment for the health check.)
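
To illustrate what I would expect from the server side in general terms: every accepted socket is closed on all exit paths, including the case where the peer resets the connection. The following Python sketch shows only that pattern. It is not BaseX code (BaseX is written in Java), and serve, stop and stats are hypothetical names chosen for this example.

```python
import socket
import threading


def serve(listener: socket.socket, stop: threading.Event, stats: dict) -> None:
    """Accept connections until `stop` is set, closing every client socket."""
    listener.settimeout(0.2)  # wake up regularly to check the stop flag
    while not stop.is_set():
        try:
            conn, _addr = listener.accept()
        except OSError:
            # Accept timeout or a connection that failed before accept().
            continue
        # The with-block closes conn on every exit path, including exceptions.
        with conn:
            conn.settimeout(1.0)
            try:
                data = conn.recv(1024)
                if data:
                    conn.sendall(data)  # trivial echo stands in for real work
            except OSError:
                # Covers ConnectionResetError (peer sent RST) and timeouts.
                stats["aborted"] = stats.get("aborted", 0) + 1
        stats["closed"] = stats.get("closed", 0) + 1
```

The point of the sketch is the with-block: a health check that connects and immediately resets still ends with the descriptor being released.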

With best regards

Andreas

(1)    http://linux.die.net/man/8/lsof

Hello Andreas,

the only file that I can think of that is opened for each client is the BaseX log file. Normally the log file should only be opened by the server, with the clients referring to it. However, you should try starting basexserver with the "-z" option (http://docs.basex.org/wiki/Startup_Options), which suppresses the logging mechanism.

-- Andreas

On 12.05.2012 at 00:41, Andreas Rulle wrote:

Hello Andreas,

this email is to let you know that with the tool mon (1), which is often used together with the LVS load balancer (2), see (3), I was able to reproduce the

   java.net.SocketException: Too many open files

with the following configuration

   watch  basex
           service ping
               description Responses to ping
               interval 2s
               monitor tcp.monitor -p 1984
               period

It seems that BaseX does not close the sockets when monitored with tcp.monitor -p 1984.
The BaseX server crashes when ulimit -n is reached.

With best regards,

Andreas

(1)     https://mon.wiki.kernel.org/index.php/Monitors

(2)     http://www.linuxvirtualserver.org/

(3)     http://www.austintek.com/LVS/LVS-HOWTO/HOWTO/LVS-HOWTO.realserver_failure.html



On 11.05.2012 at 18:21, Andreas Rulle wrote:
Hello Andreas,

thank you very much for your valuable hints.

   sudo lsof -p basex-ps-no

delivers many lines of

    java    11489 root  962u  sock      0,6      0t0 10449645 can't identify protocol

and

  (1)     sudo lsof -p 11489 | wc -l

reports an increasing number of used "files". The increase of the number in (1) in a given time interval
correlates with the number of requests that the LVS load balancer sends to the BaseX port 1984
in that interval. There are about 4 requests from the LVS load balancer per minute.

At the time of this writing (1) has the value of 1076 ...
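
To see which descriptor types account for the growth, the lsof output can be grouped by its TYPE column. A minimal Python sketch, assuming the default lsof column layout shown above (TYPE as the fifth whitespace-separated field); count_fd_types is a hypothetical helper, not part of lsof or BaseX:

```python
from collections import Counter


def count_fd_types(lsof_output: str) -> Counter:
    """Count lsof lines per TYPE value (e.g. sock, REG, DIR)."""
    counts = Counter()
    for line in lsof_output.splitlines():
        fields = line.split()
        # Skip blank or short lines and the header row.
        if len(fields) < 5 or fields[0] == "COMMAND":
            continue
        counts[fields[4]] += 1  # fifth column is TYPE in the default layout
    return counts
```

The input would be the text printed by sudo lsof -p with the PID of the server; a steadily growing "sock" count with a stable "REG" count would point to sockets rather than regular files.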

The parameter keepalive has the value

KEEPALIVE = 600

but it does not seem to stop the increase of the value in (1).

This information opens the way to workarounds:

- decrease the number of requests from the LVS load balancer,
- increase ulimit -n,
- restart the BaseX server before the number in (1) reaches ulimit -n.

But we would really prefer a solution that stops the figure in (1) from increasing.
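
To judge how often the restart workaround would be needed, the time until the limit is reached can be estimated roughly. The sketch below assumes that each load balancer request leaks exactly one descriptor, which is only what the correlation above suggests; minutes_until_limit is a hypothetical helper.

```python
def minutes_until_limit(limit: int, current: int, leaked_per_minute: float) -> float:
    """Estimate the minutes left until `current` open descriptors reach `limit`."""
    if leaked_per_minute <= 0:
        raise ValueError("leaked_per_minute must be positive")
    remaining = max(limit - current, 0)  # never negative once the limit is passed
    return remaining / leaked_per_minute


# With ulimit -n = 1024, about 4 requests per minute and (as a simplification)
# no descriptors open at start:
print(minutes_until_limit(1024, 0, 4))  # 256.0 minutes, i.e. a bit over 4 hours
```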

Almost identical load balancer settings work for a MySQL database without any problems.
And without the load balancer, a BaseX instance has been running since April 10 without hitting ulimit -n = 1024.


Any hints on this are very welcome!

With kind regards,

Andreas

I just searched our mailing list, because there was something about too many open files some time ago:

On typical Linux installations, the open file descriptor limit is 1024:

$ grep 'open files' /proc/self/limits
Max open files            1024                 1024                 files

In Java, if a file-based object (FileWriter, FileReader, etc.) is not closed, the underlying file descriptor is not closed. Bug detectors (FindBugs) check for that intraprocedurally. I recommend you run both BaseX as well as this application through it.

If an open file-based object becomes unreachable, the finalizer will eventually close it - but it's possible to run out of open file descriptors simply due to unreachable, but not yet finalized objects. (Of course, if the object is leaked, it won't be closed ever.)

On Linux, use 'ps' to find the pid of the JVM process, then run ls -l /proc/<pid>/fd to see which file descriptors the process in question has open; or use the 'lsof' command.

 - Godmar

That will probably help to investigate the issue.
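
The /proc check from the quoted message can also be scripted. A small Python sketch, assuming a Linux system with /proc mounted and sufficient permissions to read the target process; classify_fds is a hypothetical helper name:

```python
import os
from collections import Counter


def classify_fds(pid: int) -> Counter:
    """Group the open descriptors of a process by the kind of their target."""
    fd_dir = f"/proc/{pid}/fd"
    kinds = Counter()
    for name in os.listdir(fd_dir):
        try:
            target = os.readlink(os.path.join(fd_dir, name))
        except OSError:
            continue  # descriptor was closed while we were listing
        # Linux shows sockets as "socket:[inode]" and pipes as "pipe:[inode]".
        if target.startswith("socket:"):
            kinds["socket"] += 1
        elif target.startswith("pipe:"):
            kinds["pipe"] += 1
        elif target.startswith("anon_inode:"):
            kinds["anon_inode"] += 1
        else:
            kinds["file"] += 1
    return kinds
```

Run periodically against the pid of the Java process, this shows whether the "socket" count is the one that keeps growing.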






--
Nexoma GmbH
Theodorus Weg 7
59755 Arnsberg

Tel.    + 49 (0) 52 51 1613-0
Fax     + 49 (0) 52 51 1613-99

mailto:andreas.rulle@nexoma.de

Managing Director: Guido Sauerland
Registered office: Arnsberg
Register court: Arnsberg, HRB 9365
