| Summary: | Service disabled due to bind failure | | |
|---|---|---|---|
| Product: | [Fedora] Fedora | Reporter: | Bryan Mason <bmason> |
| Component: | xinetd | Assignee: | Vojtech Vitek <vvitek> |
| Status: | CLOSED ERRATA | QA Contact: | Fedora Extras Quality Assurance <extras-qa> |
| Severity: | medium | Docs Contact: | |
| Priority: | medium | | |
| Version: | 16 | CC: | hripps, jsynacek, vvitek |
| Target Milestone: | --- | | |
| Target Release: | --- | | |
| Hardware: | All | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Fixed In Version: | xinetd-2.3.14-46.fc16 | Doc Type: | Bug Fix |
| Doc Text: | | Story Points: | --- |
| Clone Of: | 809271 | Environment: | |
| Last Closed: | 2012-04-27 20:49:04 UTC | Type: | --- |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Bug Depends On: | 809271 | | |
| Bug Blocks: | | | |
Description
Bryan Mason
2012-04-02 22:57:18 UTC
Created attachment 574680
Proposed patch
This patch adds logic to cps_service_restart() that retries the restart if
activating the service fails. It does this by using the svc_attempts member of the
service structure (also known as SVC_ATTEMPTS(sp)). When a service is
successfully activated in svc_activate(), SVC_ATTEMPTS(sp) is reset to 0.
Whenever svc_activate() fails, SVC_ATTEMPTS(sp) is incremented and if it is
less than (a newly #defined value) MAX_SVC_ATTEMPTS, then xtimer_add() is
called to create an event that calls cps_service_restart() in one second. If
SVC_ATTEMPTS(sp) >= MAX_SVC_ATTEMPTS, then service activation fails as before.
I've set MAX_SVC_ATTEMPTS to 30, and the code attempts a restart every second.
Although these seemed like reasonable values to me, they should be reviewed for
appropriateness.
This patch was developed for xinetd-2.3.14-33.el6, but I think it should apply relatively cleanly to Fedora as well.
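
The retry logic described above can be sketched roughly as follows. This is a toy model, not the attached patch: the struct, svc_activate_sim(), restart_once() and run_restart() are hypothetical stand-ins, the failures_left field exists only to simulate bind failures, and a plain loop takes the place of the one-second xtimer_add() event. Only the svc_attempts counter and MAX_SVC_ATTEMPTS = 30 come from the description.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_SVC_ATTEMPTS 30   /* value proposed in the description above */

/* Hypothetical stand-in for xinetd's service structure. */
struct service {
    const char *name;
    unsigned    svc_attempts;   /* consecutive failed activations */
    unsigned    failures_left;  /* simulation only: how many more binds will fail */
    bool        active;
};

/* Simulated svc_activate(): fails while failures_left > 0.
 * On success the attempt counter is reset to 0, as described. */
static bool svc_activate_sim(struct service *sp)
{
    if (sp->failures_left > 0) {
        sp->failures_left--;
        return false;
    }
    sp->svc_attempts = 0;
    sp->active = true;
    return true;
}

/* One restart attempt. Returns true when the caller should schedule
 * another attempt (the real code would do that with a timer event). */
static bool restart_once(struct service *sp)
{
    if (svc_activate_sim(sp))
        return false;               /* activated, nothing to retry */
    sp->svc_attempts++;
    return sp->svc_attempts < MAX_SVC_ATTEMPTS;
}

/* Drives the retries; the loop stands in for the timer firing once per
 * second. Returns the number of activation attempts that were made. */
static unsigned run_restart(struct service *sp)
{
    unsigned calls = 1;
    while (restart_once(sp))
        calls++;
    return calls;
}
```

In this model, a service whose bind fails three times comes up on the fourth attempt with svc_attempts back at 0, while a service that never binds is given up on after 30 attempts and stays inactive.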

Created attachment 574681
Proposed fix to patch in bug 795188

My testing exposed what I believe to be a problem with the patch[1] that was intended to fix Bug 795188. The patch[2] from Bug 702670 creates a new member in the service structure, svc_pfd_index, which is an index into ps.rws.pfd_array that corresponds to the pfd pointed to by SVC_POLLFD(sp). The patch from Bug 795188 incorrectly (in my opinion) decrements ps.rws.pdfs_last and sets sp->svc_pfd_index to that value when svc_activate() fails. If sp->svc_pfd_index has already been set to some value before svc_activate() is called, as in the case where a previously deactivated service is being reactivated by cps_service_restart(), then that action seems inappropriate.

This patch is correct because it mimics the behavior of svc_deactivate(). I've done some quick testing in RHEL 6.2 and RHEL 5.9 and it eliminated leaking file descriptor issues in my test cases.

This patch was developed for xinetd-2.3.14-33.el6, but I think it will apply relatively cleanly to Fedora as well.

[1] http://lists.fedoraproject.org/pipermail/scm-commits/2012-March/745046.html
[2] http://lists.fedoraproject.org/pipermail/scm-commits/2012-January/720307.html

Thank you, Bryan! I will have a look at your patches and try to include them.

Bryan, can you please provide some information on how you tested this? I can't reproduce it, even though I tried setting cps to 5 1 (even 1 5) and bombing the daemon with telnet and ftp requests. Could you also test it with the latest Fedora release?

Hi Jan,
One of our partners created a test program that reproduces it quite nicely. I've asked them if we can post the test code publicly to this case. I'll test with Fedora 16 as soon as I can.
My xinetd configuration for testing looks like:
[bjmason@sf00580488-rhel6 ~]$ diff -u /etc/xinetd.d/telnet.o /etc/xinetd.d/telnet
--- /etc/xinetd.d/telnet.o 2012-04-02 15:24:05.000000000 -0700
+++ /etc/xinetd.d/telnet 2012-04-02 15:20:35.000000000 -0700
@@ -10,4 +10,6 @@
 user = root
 server = /usr/sbin/in.telnetd
 log_on_failure += USERID
+ per_source = 1
+ cps = 5 1
 }

Hi Jan,
I've tested with F16, and it fails there as well:
Apr 3 10:42:46 bjmason xinetd[13891]: bind failed (Address already in use (errno = 98)). service = telnet
Apr 3 10:42:46 bjmason xinetd[13891]: Error activating service telnet
It only failed, however, when I ran the test program from a second system. If I tried to run the test program from the local host, it did not fail.

To reproduce, compile the code in the attachment "Test code" and run (as root):
ulimit -n 10000
./xinetd_err <host> <port>
All testing so far has been with telnet (port 23).

Hello Bryan,
I managed to reproduce the issue. I will need some time to test the fix, and because of higher-priority work right now, it may take a while. Just wanted to keep you informed.

xinetd-2.3.14-46.fc17 has been submitted as an update for Fedora 17.
https://admin.fedoraproject.org/updates/xinetd-2.3.14-46.fc17

xinetd-2.3.14-46.fc16 has been submitted as an update for Fedora 16.
https://admin.fedoraproject.org/updates/xinetd-2.3.14-46.fc16

Package xinetd-2.3.14-46.fc16:
* should fix your issue,
* was pushed to the Fedora 16 testing repository,
* should be available at your local mirror within two days.
Update it with:
# su -c 'yum update --enablerepo=updates-testing xinetd-2.3.14-46.fc16'
as soon as you are able to.
Please go to the following url:
https://admin.fedoraproject.org/updates/FEDORA-2012-6047/xinetd-2.3.14-46.fc16
then log in and leave karma (feedback).

xinetd-2.3.14-46.fc17 has been pushed to the Fedora 17 stable repository. If problems still persist, please make note of it in this bug report.

xinetd-2.3.14-46.fc16 has been pushed to the Fedora 16 stable repository. If problems still persist, please make note of it in this bug report.