Description of problem:
Resolving names via NIS makes a disproportionate number of TCP connections from
ports in the reserved range (below 1024) when run as root. This can break an
(NFS) server, or any other machine, in several ways (running out of privileged
ports, conflicts with ports already in use). In some cases the issue can be
mitigated through the use of nscd.
When exporting a number (20) of mounts over NFS (version 3, UDP only)
to a number (60) of clients via a NIS netgroup, server startup
broke because the many lookups involved consumed way too many privileged ports:
Aug 8 16:46:13 nfs5 kernel: svc: failed to register lockdv1 RPC service (errno 5).
Aug 8 16:46:13 nfs5 kernel: lockd_up: makesock failed, error=-5
Aug 8 18:28:53 nfs5 kernel: svc: failed to register nfsdv2 RPC service (errno 5).
Aug 8 18:28:53 nfs5 kernel: svc: failed to register nfsaclv2 RPC service (errno 5).
Aug 8 18:28:53 nfs5 kernel: nfsd: last server has exited, flushing export cache
Aug 8 18:28:53 nfs5 xinetd: bind failed (Address already in use (errno = 98)). service = login
Aug 8 18:28:53 nfs5 xinetd: bind failed (Address already in use (errno = 98)). service = shell
Aug 8 18:28:53 nfs5 xinetd: bind failed (Address already in use (errno = 98)). service = rsync
This is all due to an excessive number of privileged TCP ports being tied up in
connections in TIME_WAIT state. This issue can be mitigated by an early
echo 1 >/proc/sys/net/ipv4/tcp_fin_timeout
during startup (maybe '0' works even better, not sure).
Version-Release number of selected component (if applicable):
Steps to Reproduce:
Install and configure ypbind; do not run nscd. As root, compare "netstat -ant"
output against "rpcinfo -p" output after each of the following:
- ls -l causing many user/group lookups
- service nfslock start
- service nfs start
- rsh as non-root
Rsh is an interesting case: it requires a privileged port for itself, but it
also uses many more of them for NIS lookups for no reason. It should use
unprivileged ports for those.
Lots of TCP connections in TIME_WAIT from privileged TCP ports to the ypbind
TCP port registered with rpcbind.
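The comparison of "netstat -ant" against "rpcinfo -p" can be sketched roughly as follows. The function names are hypothetical, and the field positions assume the usual net-tools "netstat -ant" and "rpcinfo -p" column layouts; treat this as an illustration, not a tested diagnostic.

```shell
# Count TIME_WAIT connections whose local port is privileged (< 1024).
# Reads "netstat -ant" output on stdin. An optional first argument
# restricts the count to one remote port (for example ypbind's TCP port).
count_priv_time_wait() {
    awk -v rport="${1:-}" '
        $6 == "TIME_WAIT" {
            n = split($4, l, ":")   # local address:port, port is last field
            m = split($5, r, ":")   # remote address:port
            if (l[n] + 0 < 1024 && (rport == "" || r[m] == rport)) c++
        }
        END { print c + 0 }'
}

# Print the TCP port ypbind registered, reading "rpcinfo -p" output on stdin.
ypbind_tcp_port() {
    awk '$3 == "tcp" && $5 == "ypbind" { print $4; exit }'
}

# Intended use, as root (left commented out so the block has no side effects):
# port=$(rpcinfo -p | ypbind_tcp_port)
# netstat -ant | count_priv_time_wait "$port"
```

Running the count before and after each step (ls -l, service nfs start, rsh) should show how many reserved ports that step left lingering.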
This could be considered an RPC (glibc/sunrpc) issue. It may
resurface in libtirpc (which does have an /etc/netconfig, however).
Things to consider:
- ypbind UDP-only option.
- local RPC should always use UDP.
- RPC by root: Don't use privileged ports unless explicitly asked for.
- separate TCP FIN timeout (=zero) for local TCP connects.
- A UDP-only ypbind is not an option (too many dependencies on TCP already)
- the tcp_fin_timeout trick is not very effective and the wrong approach anyway
- It is an F14 glibc/sunrpc scalability issue: privileged (TCP) ports are
a precious resource, especially when they linger in TIME_WAIT for 60s.
- There is also an exportfs (nfs-utils) scalability issue w.r.t. netgroups.
Truncating /var/lib/nfs/rmtab before starting nfs is a server workaround.
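The rmtab workaround could look roughly like this on the server. The function name is hypothetical, and the optional path argument is only there so the sketch can be tried on a scratch file first.

```shell
# Empty a file in place (the file itself is kept, only its content goes).
# Defaults to the rmtab path named above; pass another path to try it safely.
truncate_rmtab() {
    : > "${1:-/var/lib/nfs/rmtab}"
}

# Intended order on the server, as root (left commented out):
# truncate_rmtab && service nfs start
```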
Created attachment 527403
proposed patch to use reserved port only for secure maps
There is a similar request in RHEL-6 (bug #689424). A proposed solution there is based on HP's solution: they use reserved ports only for secure maps (see http://bizsupport1.austin.hp.com/bc/docs/support/SupportManual/c02037757/c02037757.pdf).
This patch, which tries to use a reserved port only when querying passwd maps, can be used as a proof of concept. We'll probably need to define which maps are secure on the client side, in the same way as it is done on the server side (where it is defined in /etc/ypserv.conf).
This package has changed ownership in the Fedora Package Database. Reassigning to the new owner of this component.
We have a similar problem open as support case 00546448. By running strace on ls, we noticed that RHEL 5 looks in /var/yp/binding/ to find a NIS master and that RHEL 6 does not. We looked at some library sources: the routine that reads the file, yp_bind_file in nis/ypclnt.c, is conditionally compiled in. We can use "nm" to find that routine in /lib/libnsl-2.5.so on RHEL 5, but we can't find it in any file in /lib64 on RHEL 6. This change causes many extra accesses to ypbind. While
this may not be the whole problem, we'd like to eliminate this issue; it is at least a clear difference between RHEL 5, which does not have the problem for us, and RHEL 6, which does.
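The "nm" check described above can be approximated as follows. The helper name is hypothetical, and which nm flags were actually used is not stated in the comment, so plain "nm" here is an assumption; a stripped library may need a different invocation.

```shell
# Succeed if the symbol named by $1 appears as the last field of any line
# of "nm" output read on stdin.
has_symbol() {
    awk -v sym="$1" '$NF == sym { found = 1 } END { exit !found }'
}

# Intended use (paths taken from the comment above, left commented out):
# nm /lib/libnsl-2.5.so 2>/dev/null | has_symbol yp_bind_file && echo present
# for f in /lib64/*.so*; do
#     nm "$f" 2>/dev/null | has_symbol yp_bind_file && echo "$f"
# done
```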
*** This bug has been marked as a duplicate of bug 689424 ***