+++ This bug was initially created as a clone of Bug #809271 +++

Description of problem:

A service is permanently disabled if a bind failure occurs when xinetd restarts it after it was temporarily disabled for hitting its CPS limit. In other words:

a) A service hits its CPS limit and is disabled.
b) Xinetd waits the number of seconds specified in the service's configuration.
c) When xinetd attempts to restart the service, it is unable to bind to the appropriate port, and the service fails.

What you see in /var/log/messages is similar to:

FAIL: telnet per_source_limit from=::ffff:10.14.16.129
Deactivating service telnet due to excessive incoming connections. Restarting in 1 seconds.
bind failed (Address already in use (errno = 98)). service = telnet
Error activating service telnet

This happens because:

a) The service's socket is not closed in the children until they have exec'ed their servers (the socket has FD_CLOEXEC set).
b) The children forked while the service was still active have not had a chance to exec their servers before xinetd attempts to restart the service.

When this occurs, one or more of the children still have the service's socket open, and thus the call to bind fails.

Version-Release number of selected component (if applicable):

xinetd-2.3.14-33.el6

How reproducible:

100% when both the CPS value and wait time are relatively small. An entry of

cps = 5 1

is sufficient to trigger this problem in under 30 seconds in my testing.

Steps to Reproduce:

1. Set the CPS value in the xinetd service configuration file to something like "cps = 5 1".
2. Run a program that quickly and consistently exceeds the CPS limit established in the previous step.

Actual results:

The service is permanently deactivated.

Expected results:

The service should be reactivated.

Additional info:

This appears to be a long-standing issue with xinetd, as shown by:

http://justlinux.com/forum/showthread.php?t=42681

which was posted in 2001.
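The mechanism described above can be illustrated outside xinetd. The following is a minimal sketch, not xinetd code: it uses a duplicated descriptor as a stand-in for the copy of the listening socket that a forked child holds before it execs. The function name and the use of the loopback address are assumptions made for the illustration, and no socket options are set.

```python
import errno
import socket


def demo_lingering_listener():
    """Return (errno of the first rebind attempt, whether the second succeeded)."""
    # The "service" socket: bound and listening on an ephemeral loopback port.
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))
    listener.listen(5)
    port = listener.getsockname()[1]

    # Stand-in for a forked child that has not exec'ed yet: a second
    # descriptor that refers to the same open socket.
    inherited = listener.dup()

    # The parent closes its own descriptor, as when the service is deactivated.
    listener.close()

    # Restart attempt while the "child" copy is still open.
    first_errno = None
    retry = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        retry.bind(("127.0.0.1", port))
    except OSError as exc:
        first_errno = exc.errno
    finally:
        retry.close()

    # Once the last descriptor is closed, the port is released.
    inherited.close()
    second = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        second.bind(("127.0.0.1", port))
        second_ok = True
    except OSError:
        second_ok = False
    finally:
        second.close()

    return first_errno, second_ok


if __name__ == "__main__":
    print(demo_lingering_listener())
```

On Linux the first element of the result is expected to be errno.EADDRINUSE, the same error xinetd logs as errno = 98.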
Created attachment 574680
Proposed patch

This patch adds logic to cps_service_restart() that retries activation if the restart fails. It does this by using the svc_attempts member of the service structure (also known as SVC_ATTEMPTS(sp)).

When a service is successfully activated in svc_activate(), SVC_ATTEMPTS(sp) is reset to 0. Whenever svc_activate() fails, SVC_ATTEMPTS(sp) is incremented, and if it is less than MAX_SVC_ATTEMPTS (a newly #defined value), xtimer_add() is called to create an event that calls cps_service_restart() in one second. If SVC_ATTEMPTS(sp) >= MAX_SVC_ATTEMPTS, service activation fails as before.

I've set MAX_SVC_ATTEMPTS to 30, and the code attempts a restart every second. Although these seemed like reasonable values to me, they should be reviewed for appropriateness.

This patch was developed for xinetd-2.3.14-33.el6, but I think it should apply relatively cleanly to Fedora as well.
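As a rough model of the retry logic described in this comment (not the patch itself, which is C code inside xinetd), the control flow could be sketched as follows. The Service class, the activate and schedule hooks, and run_pending() are hypothetical stand-ins for the service structure, svc_activate(), xtimer_add(), and xinetd's timer loop. For brevity the sketch resets the counter in the restart function, whereas the comment above says the patch resets it in svc_activate().

```python
MAX_SVC_ATTEMPTS = 30      # value proposed in the comment above
RETRY_DELAY_SECONDS = 1    # retry interval proposed in the comment above


class Service:
    """Hypothetical stand-in for xinetd's service structure."""

    def __init__(self, name):
        self.name = name
        self.attempts = 0   # plays the role of SVC_ATTEMPTS(sp)
        self.active = False


def cps_service_restart(service, activate, schedule):
    """Try to activate the service; on failure, schedule a retry or give up.

    activate(service) -> bool and schedule(delay, callback) are hypothetical
    hooks standing in for svc_activate() and xtimer_add().
    """
    if activate(service):
        service.attempts = 0
        service.active = True
        return True

    service.attempts += 1
    if service.attempts < MAX_SVC_ATTEMPTS:
        # Retry later instead of leaving the service disabled.
        schedule(RETRY_DELAY_SECONDS,
                 lambda: cps_service_restart(service, activate, schedule))
    return False


def run_pending(queue):
    """Drain a list-based fake timer queue, ignoring the delays."""
    while queue:
        _delay, callback = queue.pop(0)
        callback()
```

With an activate hook that fails a few times before succeeding, the service ends up active with its attempt counter back at 0. With a hook that always fails, activation is tried MAX_SVC_ATTEMPTS times and then abandoned.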
Created attachment 574681
Proposed fix to patch in bug 795188

My testing exposed what I believe to be a problem with the patch[1] that was intended to fix Bug 795188.

The patch[2] from Bug 702670 creates a new member in the service structure, svc_pfd_index, which is an index into ps.rws.pfd_array that corresponds to the pfd pointed to by SVC_POLLFD(sp).

The patch from Bug 795188 incorrectly (in my opinion) decrements ps.rws.pfds_last and sets sp->svc_pfd_index to that value when svc_activate() fails. If sp->svc_pfd_index has already been set to some value before svc_activate() is called, as in the case where a previously deactivated service is being reactivated by cps_service_restart(), then that action seems inappropriate.

This patch is correct because it mimics the behavior of svc_deactivate(). I've done some quick testing in RHEL 6.2 and RHEL 5.9, and it eliminated the file descriptor leaks in my test cases.

This patch was developed for xinetd-2.3.14-33.el6, but I think it will apply relatively cleanly to Fedora as well.

[1] http://lists.fedoraproject.org/pipermail/scm-commits/2012-March/745046.html
[2] http://lists.fedoraproject.org/pipermail/scm-commits/2012-January/720307.html
Thank you, Bryan! I will have a look at your patches and try to include them.
Bryan, can you please provide some information on how you tested this? I can't reproduce it, even though I tried setting cps to 5 1 (and even 1 5) and bombarding the daemon with telnet and ftp requests. Could you also test it with the latest Fedora release?
Hi Jan, One of our partners created a test program that reproduces it quite nicely. I've asked them if we can post the test code publicly to this case. I'll test with Fedora 16 as soon as I can.
My xinetd configuration for testing looks like:

[bjmason@sf00580488-rhel6 ~]$ diff -u /etc/xinetd.d/telnet.o /etc/xinetd.d/telnet
--- /etc/xinetd.d/telnet.o 2012-04-02 15:24:05.000000000 -0700
+++ /etc/xinetd.d/telnet 2012-04-02 15:20:35.000000000 -0700
@@ -10,4 +10,6 @@
  user = root
  server = /usr/sbin/in.telnetd
  log_on_failure += USERID
+ per_source = 1
+ cps = 5 1
 }
Hi Jan,

I've tested with F16, and it fails there as well:

Apr 3 10:42:46 bjmason xinetd[13891]: bind failed (Address already in use (errno = 98)). service = telnet
Apr 3 10:42:46 bjmason xinetd[13891]: Error activating service telnet

It only failed, however, when I ran the test program from a second system. If I tried to run the test program from the local host, it did not fail.
To reproduce, compile the code in the attachment "Test code" and run (as root):

ulimit -n 10000
./xinetd_err <host> <port>

All testing so far has been with telnet (port 23).
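The attached test program is not reproduced in this thread. As a rough, hypothetical stand-in (not the attachment, and not verified to trigger the failure against xinetd), a client that exceeds a small CPS limit only needs to open connections faster than the configured rate. The flood() name, its parameters, and the example address are assumptions made for this sketch.

```python
import socket


def flood(host, port, count, timeout=2.0):
    """Open and immediately close `count` TCP connections.

    Returns the number of connections that were established. Refused or
    timed-out attempts are skipped, since the service may already be
    deactivated while the loop is still running.
    """
    succeeded = 0
    for _ in range(count):
        try:
            conn = socket.create_connection((host, port), timeout=timeout)
        except OSError:
            continue
        conn.close()
        succeeded += 1
    return succeeded


if __name__ == "__main__":
    # Hypothetical target: replace with the host running xinetd and the
    # telnet port used in the testing described above.
    print(flood("192.0.2.10", 23, 1000))
```

Since the failure reported here was only seen when the client ran on a second system, any such sketch would presumably need to be run remotely as well.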
Hello Bryan,

I managed to reproduce the issue. I will need some time to test the fix, and because of higher-priority work right now, it may take a while. I just wanted to keep you informed.
Fixed in rawhide: http://lists.fedoraproject.org/pipermail/scm-commits/2012-April/769238.html
xinetd-2.3.14-46.fc17 has been submitted as an update for Fedora 17. https://admin.fedoraproject.org/updates/xinetd-2.3.14-46.fc17
xinetd-2.3.14-46.fc16 has been submitted as an update for Fedora 16. https://admin.fedoraproject.org/updates/xinetd-2.3.14-46.fc16
Package xinetd-2.3.14-46.fc16:
* should fix your issue,
* was pushed to the Fedora 16 testing repository,
* should be available at your local mirror within two days.

Update it with:
# su -c 'yum update --enablerepo=updates-testing xinetd-2.3.14-46.fc16'
as soon as you are able to.

Please go to the following url:
https://admin.fedoraproject.org/updates/FEDORA-2012-6047/xinetd-2.3.14-46.fc16
then log in and leave karma (feedback).
xinetd-2.3.14-46.fc17 has been pushed to the Fedora 17 stable repository. If problems still persist, please make note of it in this bug report.
xinetd-2.3.14-46.fc16 has been pushed to the Fedora 16 stable repository. If problems still persist, please make note of it in this bug report.