Bug 809272 - Service disabled due to bind failure

Summary: Service disabled due to bind failure

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Fedora
Classification:	Fedora
Component:	xinetd
Sub Component:
Version:	16
Hardware:	All
OS:	Linux
Priority:	medium
Severity:	medium
Target Milestone:	---
Assignee:	Vojtech Vitek
QA Contact:	Fedora Extras Quality Assurance
Docs Contact:
URL:
Whiteboard:
Depends On:	809271
Blocks:

Reported:	2012-04-02 22:57 UTC by Bryan Mason
Modified:	2015-03-04 23:57 UTC
CC List:	3 users
Fixed In Version:	xinetd-2.3.14-46.fc16
Clone Of:	809271
Environment:
Last Closed:	2012-04-27 20:49:04 UTC
Type:	---
Embargoed:
Dependent Products:

Attachments
Proposed patch (1.98 KB, patch) 2012-04-02 23:36 UTC, Bryan Mason	no flags
Proposed fix to patch in bug 795188 (1.09 KB, patch) 2012-04-02 23:37 UTC, Bryan Mason	no flags
Test code (1.57 KB, text/plain) 2012-04-03 17:34 UTC, Bryan Mason	no flags

Description Bryan Mason 2012-04-02 22:57:18 UTC

+++ This bug was initially created as a clone of Bug #809271 +++

Description of problem:

    A service will be permanently disabled due to a bind failure that
    occurs when the service is restarted after being temporarily
    disabled because the service has hit its CPS limit.

    In other words:
    a) A service hits its CPS limit and is disabled.
    b) xinetd waits the number of seconds specified in the service's
       configuration.
    c) When xinetd attempts to restart the service, it is unable
       to bind to the appropriate port, and the service fails.

    What you see in /var/log/messages is similar to:

        FAIL: telnet per_source_limit from=::ffff:10.14.16.129
        Deactivating service telnet due to excessive incoming connections.  
            Restarting in 1 seconds.
        bind failed (Address already in use (errno = 98)). service = telnet
        Error activating service telnet

    This happens because:
    a) Each child's inherited copy of the service's socket is not closed
       until the child has exec'ed its server (the socket has FD_CLOEXEC
       set).
    b) The children forked while the service was still active have not
       had a chance to exec their servers before xinetd attempts to
       restart the service.

    When this occurs, one or more of the children still have the
    service's socket open, and thus the call to bind fails.
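
The mechanism can be illustrated outside xinetd with a minimal Python sketch. It is not xinetd code: the function name is hypothetical, and os.dup() stands in for the copy of the listening socket that a forked child holds before it execs. Under those assumptions it shows that closing the parent's descriptor does not free the port while another copy is still open.

```python
import errno
import os
import socket


def rebind_with_lingering_copy():
    """Return (errno while a copy is still open, errno after it is closed)."""
    # The "service" socket: bound and listening on an ephemeral local port.
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))
    listener.listen(5)
    port = listener.getsockname()[1]

    # Assumption: os.dup() plays the role of the descriptor inherited by a
    # forked child that has not yet exec'ed its server.
    child_copy = os.dup(listener.fileno())

    # The parent deactivates the service and closes its own descriptor.
    listener.close()

    def try_bind():
        sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        try:
            sock.bind(("127.0.0.1", port))
            return 0
        except OSError as exc:
            return exc.errno
        finally:
            sock.close()

    # Restart attempt while the "child" still holds the socket.
    early = try_bind()

    # The child execs; FD_CLOEXEC would close its copy at this point.
    os.close(child_copy)

    # Restart attempt after the last copy is gone.
    late = try_bind()
    return early, late
```

Calling rebind_with_lingering_copy() is expected to report EADDRINUSE for the first attempt and success for the second. On Linux, errno.EADDRINUSE is 98, the value shown in the log excerpt above.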

Version-Release number of selected component (if applicable):

    xinetd-2.3.14-33.el6

How reproducible:

    100% when both the CPS value and wait time are relatively small.
    An entry of

        cps = 5 1

    is sufficient to trigger this problem in under 30 seconds in my
    testing.

Steps to Reproduce:

    1.  Set the CPS value in the xinetd service configuration file to
        something like "cps = 5 1"
    2.  Run a program that quickly and consistently exceeds the CPS
        limit established in the previous step.
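
For step 2, any client that opens connections faster than the configured rate will do. The following Python sketch is one hypothetical way to generate that load; it is not the test program attached to this bug, and the function name, the timeout, and the choice to hold the connections open are all assumptions of this sketch.

```python
import socket


def open_connections(host, port, count, timeout=1.0):
    """Open up to `count` TCP connections back-to-back and keep them open.

    Returns the list of connected sockets; the caller is responsible for
    closing them.
    """
    held = []
    for _ in range(count):
        try:
            held.append(socket.create_connection((host, port), timeout=timeout))
        except OSError:
            # Refused or timed out, for example once the service has been
            # deactivated. Keep going so the load stays constant.
            continue
    return held
```

Calling open_connections(host, 23, count) in a loop against the telnet service, with a count well above the CPS value, should exceed a limit such as "cps = 5 1" almost immediately.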
  
Actual results:

    The service will be permanently deactivated.

Expected results:

    The service should be reactivated after the configured wait time.

Additional info:

    This appears to be a long-standing issue with xinetd, as shown by:

        http://justlinux.com/forum/showthread.php?t=42681

    which was posted in 2001.

Comment 1 Bryan Mason 2012-04-02 23:36:34 UTC

Created attachment 574680
Proposed patch

This patch adds logic to cps_service_restart() to retry the restart when
activation fails.  It does this by using the svc_attempts member of the
service structure (also known as SVC_ATTEMPTS(sp)).  When a service is
successfully activated in svc_activate(), SVC_ATTEMPTS(sp) is reset to 0. 
Whenever svc_activate() fails, SVC_ATTEMPTS(sp) is incremented and if it is
less than (a newly #defined value) MAX_SVC_ATTEMPTS, then xtimer_add() is
called to create an event that calls cps_service_restart() in one second.  If
SVC_ATTEMPTS(sp) >= MAX_SVC_ATTEMPTS, then service activation fails as before.

I've set MAX_SVC_ATTEMPTS to 30, and the code attempts a restart every second. 
Although these seemed like reasonable values to me, they should be reviewed for
appropriateness.
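
The retry logic described above can be sketched in Python as follows. This is an illustration of the control flow only, not the attached C patch: the Service class and the activate and schedule callbacks are hypothetical stand-ins for the service structure, svc_activate(), and xtimer_add().

```python
MAX_SVC_ATTEMPTS = 30      # value proposed in this comment
RETRY_DELAY_SECONDS = 1    # retry interval proposed in this comment


class Service:
    """Hypothetical stand-in for xinetd's service structure."""

    def __init__(self, name):
        self.name = name
        self.attempts = 0      # plays the role of SVC_ATTEMPTS(sp)
        self.disabled = False


def cps_service_restart(service, activate, schedule):
    """Try to reactivate `service`, rescheduling itself on failure.

    activate(service) -> bool stands in for svc_activate().
    schedule(delay, callback) stands in for xtimer_add().
    """
    if activate(service):
        service.attempts = 0   # success resets the counter
        return True

    service.attempts += 1
    if service.attempts < MAX_SVC_ATTEMPTS:
        # Try again later instead of giving up on the first failure.
        schedule(RETRY_DELAY_SECONDS,
                 lambda: cps_service_restart(service, activate, schedule))
    else:
        # Out of attempts: activation fails as it did before the patch.
        service.disabled = True
    return False
```

With these values, a service whose socket is still held by un-exec'ed children gets up to 30 activation attempts, one second apart, before it is given up on.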

This patch was developed for xinetd-2.3.14-33.el6, but I think it should apply relatively cleanly to Fedora as well.

Comment 2 Bryan Mason 2012-04-02 23:37:51 UTC

Created attachment 574681
Proposed fix to patch in bug 795188

My testing exposed what I believe to be a problem with the patch[1] that was
intended to fix Bug 795188.  

The patch[2] from Bug 702670 creates a new member in the service structure
svc_pfd_index, which is an index into ps.rws.pfd_array that corresponds to the
pfd pointed to by SVC_POLLFD(sp).  The patch from Bug 795188 incorrectly (in my
opinion) decrements ps.rws.pfds_last and sets sp->svc_pfd_index to that value
when svc_activate() fails.  If sp->svc_pfd_index has already been set to some
value before svc_activate() is called, like in the case where a previously
deactivated service is being reactivated by cps_service_restart(), then that
action seems inappropriate.

This patch is correct because it mimics the behavior of svc_deactivate().  I've
done some quick testing in RHEL 6.2 and RHEL 5.9 and it eliminated leaking file
descriptor issues in my test cases.

This patch was developed for xinetd-2.3.14-33.el6, but I think it will apply relatively cleanly to Fedora as well.

[1]http://lists.fedoraproject.org/pipermail/scm-commits/2012-March/745046.html
[2]http://lists.fedoraproject.org/pipermail/scm-commits/2012-January/720307.html

Comment 3 Jan Synacek 2012-04-03 06:37:54 UTC

Thank you, Bryan!

I will have a look at your patches and try to include them.

Comment 4 Jan Synacek 2012-04-03 08:12:11 UTC

Bryan, can you please provide some information on how you tested this? I can't reproduce it, even though I tried setting cps to 5 1 (even 1 5) and bombing the daemon with telnet and ftp requests.

Could you also test it with the latest Fedora release?

Comment 5 Bryan Mason 2012-04-03 17:31:21 UTC

Hi Jan,

One of our partners created a test program that reproduces it quite nicely.  I've asked them if we can post the test code publicly to this case.  

I'll test with Fedora 16 as soon as I can.

Comment 7 Bryan Mason 2012-04-03 17:36:05 UTC

My xinetd configuration for testing looks like:

[bjmason@sf00580488-rhel6 ~]$ diff -u /etc/xinetd.d/telnet.o /etc/xinetd.d/telnet
--- /etc/xinetd.d/telnet.o	2012-04-02 15:24:05.000000000 -0700
+++ /etc/xinetd.d/telnet	2012-04-02 15:20:35.000000000 -0700
@@ -10,4 +10,6 @@
 	user		= root
 	server		= /usr/sbin/in.telnetd
 	log_on_failure	+= USERID
+	per_source	= 1
+	cps		= 5 1
 }

Comment 8 Bryan Mason 2012-04-03 17:44:31 UTC

Hi Jan,

I've tested with F16, and it fails there as well:

Apr  3 10:42:46 bjmason xinetd[13891]: bind failed (Address already in use (errno = 98)). service = telnet
Apr  3 10:42:46 bjmason xinetd[13891]: Error activating service telnet

It only failed, however, when I ran the test program from a second system.  If I tried to run the test program from the local host, it did not fail.

Comment 9 Bryan Mason 2012-04-09 17:32:02 UTC

To reproduce, compile the code in the attachment "Test code" and run (as root):

    ulimit -n 10000
    ./xinetd_err <host> <port>

All testing so far has been with telnet (port 23).

Comment 10 Jan Synacek 2012-04-10 10:08:38 UTC

Hello Bryan,

I managed to reproduce the issue. I will need some time to test the fix, and because of higher-priority work right now, it may take a while. Just wanted to keep you informed.

Comment 11 Jan Synacek 2012-04-13 10:04:57 UTC

Fixed in rawhide:
http://lists.fedoraproject.org/pipermail/scm-commits/2012-April/769238.html

Comment 12 Fedora Update System 2012-04-16 11:07:52 UTC

xinetd-2.3.14-46.fc17 has been submitted as an update for Fedora 17.
https://admin.fedoraproject.org/updates/xinetd-2.3.14-46.fc17

Comment 13 Fedora Update System 2012-04-16 11:24:32 UTC

xinetd-2.3.14-46.fc16 has been submitted as an update for Fedora 16.
https://admin.fedoraproject.org/updates/xinetd-2.3.14-46.fc16

Comment 14 Fedora Update System 2012-04-18 19:34:49 UTC

Package xinetd-2.3.14-46.fc16:
* should fix your issue,
* was pushed to the Fedora 16 testing repository,
* should be available at your local mirror within two days.
Update it with:
# su -c 'yum update --enablerepo=updates-testing xinetd-2.3.14-46.fc16'
as soon as you are able to.
Please go to the following url:
https://admin.fedoraproject.org/updates/FEDORA-2012-6047/xinetd-2.3.14-46.fc16
then log in and leave karma (feedback).

Comment 15 Fedora Update System 2012-04-22 04:21:56 UTC

xinetd-2.3.14-46.fc17 has been pushed to the Fedora 17 stable repository.  If problems still persist, please make note of it in this bug report.

Comment 16 Fedora Update System 2012-04-27 20:49:04 UTC

xinetd-2.3.14-46.fc16 has been pushed to the Fedora 16 stable repository.  If problems still persist, please make note of it in this bug report.
