Bug 798159 - Binding files gets removed leading to too many sockets in "time wait" state
Summary: Binding files gets removed leading to too many sockets in "time wait" state
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Fedora
Classification: Fedora
Component: ypbind
Version: 16
Hardware: All
OS: Linux
unspecified
medium
Target Milestone: ---
Assignee: Honza Horak
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2012-02-28 07:59 UTC by Magnus E
Modified: 2012-04-13 21:29 UTC (History)

Fixed In Version: ypbind-1.35-1.fc16
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2012-04-12 02:45:38 UTC
Type: ---
Embargoed:


Attachments
proposed patch (1.88 KB, patch)
2012-03-16 15:14 UTC, Honza Horak

Description Magnus E 2012-02-28 07:59:19 UTC
Description of problem:
ypbind normally pings the NIS server to check that it is alive.
If the server does not respond, ypbind searches for other servers.
If no other server is found, the binding files (/var/yp/binding/*) are removed.

(This behaviour was fixed between versions 1.32 and 1.33)

However, the binding files are never recreated, even after the NIS server is back up. NIS continues to work, but the clients then have to find the IP address of the NIS server for every call.

On a fast machine this leads to a large number of TCP sockets in the TIME-WAIT state, sometimes so many that the system cannot create any more. This causes intermittent NIS problems until the client clears away the timed-out sockets.

To see what happens once the error state has been entered, simply remove the binding files manually while ypbind is running. You can then watch the number of sockets in the TIME-WAIT state increase, for example with the "ss -s" command.
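
For anyone who wants to log the TIME-WAIT count over time instead of watching "ss -s" by hand, here is a minimal sketch. It assumes a Linux client where /proc/net/tcp is readable and covers IPv4 sockets only (IPv6 sockets are listed separately in /proc/net/tcp6). The function names are made up for this example.

```python
from collections import Counter

# State codes in /proc/net/tcp are hexadecimal; 06 is TIME_WAIT.
TIME_WAIT = "06"


def count_states(proc_net_tcp_text):
    """Count sockets per state code in the text of /proc/net/tcp."""
    counts = Counter()
    lines = proc_net_tcp_text.splitlines()
    for line in lines[1:]:              # skip the header row
        fields = line.split()
        if len(fields) > 3:
            counts[fields[3]] += 1      # fourth column ("st") is the state
    return counts


def time_wait_count(path="/proc/net/tcp"):
    """Return the number of IPv4 TCP sockets currently in TIME_WAIT."""
    with open(path) as f:
        return count_states(f.read())[TIME_WAIT]
```

Calling time_wait_count() every few seconds before and after removing the binding files should show the growth described above.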


Version-Release number of selected component (if applicable):
ypbind 1.33 FC16

How reproducible:
100%

Steps to Reproduce:
See: Description of Problem above
  
Actual results:
ypbind uses up all local sockets on the client -> no more NIS until the sockets get cleaned up.


Expected results:
ypbind should recreate the binding files once the NIS server is responsive again


Additional info:

Comment 1 Honza Horak 2012-03-16 15:08:06 UTC
I was wondering how it is possible to reach a state where the binding files have been deleted without deleting them manually, since it did not seem to be reproducible with a simple configuration. If the ypserv service is stopped, the binding files are not erased, which should probably be fixed as well.

Looking at the current code, the reported problem (erased binding files) can occur if a server answers the ping for rpcbind's info, so that ypserv appears to be correctly bound, but the ypserv service itself doesn't respond. It looks like a race condition or an issue triggered by some unusual configuration; nevertheless, it can happen.

These are the steps to reproduce it consistently:
1) configure ypserv (alice) and ypbind (bob)
2) turn on iptables on alice and open port 111 for rpcbind and the relevant ports for ypserv (see rpcinfo -p localhost on alice)
3) start ypbind on bob - yptest should work
4) check that the /var/yp/binding/domainname.* files are present on bob
5) close the ypserv ports on alice
6) sleep 25s on bob (to let ypbind check for active servers)
7) check that yptest doesn't work on bob and that /var/yp/binding/domainname.* are missing
8) open the ypserv ports on alice again
9) sleep 25s on bob (to let ypbind check for active servers)
10) yptest works on bob, but /var/yp/binding/domainname.* are still missing
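
The file checks in steps 4, 7 and 10 can be scripted. The following is only a sketch: the helper name binding_files is made up, and the default directory is the /var/yp/binding path given in the description, so pass a different binding_dir if your system differs.

```python
import glob
import os


def binding_files(domain, binding_dir="/var/yp/binding"):
    """Return the sorted binding files found for a NIS domain."""
    # glob.escape keeps any special characters in the domain name literal.
    pattern = os.path.join(binding_dir, glob.escape(domain) + ".*")
    return sorted(glob.glob(pattern))
```

On bob, a non-empty result is expected at step 4 and an empty list at steps 7 and 10 while the bug is present.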

Comment 2 Honza Horak 2012-03-16 15:14:44 UTC
Created attachment 570633
proposed patch

Binding files are not created if the current active server is the same as the last active server (comparing hostname and port for a domain). So, if the binding files have been erased for any reason and the new server has the same attributes as the last active one, the binding files are not re-created.

These are the changes we need to make:
1) Binding files have to be erased also when no server answers the ping, which is not done currently.
2) If binding files are removed, we should also clear information about the last active server.
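
To illustrate why change 2 matters, here is a toy model of the logic described in this comment. It is not the ypbind source or the attached patch: the class, method names and the port number 834 are invented for the example, and a boolean stands in for the files on disk.

```python
class DomainBinding:
    """Toy model of per-domain binding state (hypothetical, not ypbind code)."""

    def __init__(self, clear_last_on_remove):
        self.clear_last_on_remove = clear_last_on_remove
        self.last_active = None       # (hostname, port) of the last bound server
        self.files_present = False    # stands in for the binding files on disk

    def server_found(self, hostname, port):
        server = (hostname, port)
        if server == self.last_active:
            return                    # same server as before: files are not rewritten
        self.last_active = server
        self.files_present = True

    def no_server_answers(self):
        self.files_present = False    # change 1: erase files when nothing answers
        if self.clear_last_on_remove:
            self.last_active = None   # change 2: forget the last active server


def files_after_outage(clear_last_on_remove):
    """Bind to a server, lose it, then find the same server again."""
    b = DomainBinding(clear_last_on_remove)
    b.server_found("alice", 834)      # 834 is an arbitrary example port
    b.no_server_answers()
    b.server_found("alice", 834)
    return b.files_present
```

In this model files_after_outage(False) leaves the files missing after the server returns, which matches the reported behaviour, while files_after_outage(True) recreates them.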

Comment 3 Magnus E 2012-03-20 08:58:09 UTC
The patch supplied in comment #2 solves the issue!

The binding files now get recreated instead of being gone forever.

Comment 4 Fedora Update System 2012-03-26 13:04:55 UTC
ypbind-1.35-1.fc17 has been submitted as an update for Fedora 17.
https://admin.fedoraproject.org/updates/ypbind-1.35-1.fc17

Comment 5 Fedora Update System 2012-03-26 13:08:27 UTC
ypbind-1.35-1.fc16 has been submitted as an update for Fedora 16.
https://admin.fedoraproject.org/updates/ypbind-1.35-1.fc16

Comment 6 Fedora Update System 2012-03-28 05:51:41 UTC
Package ypbind-1.35-1.fc17:
* should fix your issue,
* was pushed to the Fedora 17 testing repository,
* should be available at your local mirror within two days.
Update it with:
# su -c 'yum update --enablerepo=updates-testing ypbind-1.35-1.fc17'
as soon as you are able to.
Please go to the following url:
https://admin.fedoraproject.org/updates/FEDORA-2012-4730/ypbind-1.35-1.fc17
then log in and leave karma (feedback).

Comment 7 Magnus E 2012-04-02 08:06:44 UTC
I don't have a user name to log in with, but you have my karma this way instead.

Comment 8 Fedora Update System 2012-04-12 02:45:38 UTC
ypbind-1.35-1.fc17 has been pushed to the Fedora 17 stable repository.  If problems still persist, please make note of it in this bug report.

Comment 9 Fedora Update System 2012-04-13 21:29:26 UTC
ypbind-1.35-1.fc16 has been pushed to the Fedora 16 stable repository.  If problems still persist, please make note of it in this bug report.

