+++ This bug was initially created as a clone of Bug #141773 +++

The following has been reported by IBM LTC:

Hardware Environment: n/a (found on several architectures)
Firmware Environment: n/a
Software Environment: RHEL 4, beta 2

Steps to Reproduce:
1. Create some mount points
2. Mount them
3. Unmount them
4. Mount them again

Actual Results:
This error message gets reported:
    nfs bindresvport: Address already in use

Expected Results:
No error message

Additional Information:
I found a bug in the Red Hat Bugzilla database that looks like exactly what we are hitting:
https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=128966
At the bottom of that bug it says this will be fixed in RHEL 3 U5. Is it possible for Red Hat to build a kernel with that patch so we can see whether it resolves our problem, and so we could have a fix for RHEL 4?
Pasting in comment from bz #146629:

From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7.3) Gecko/20041020

Description of problem:
On a dual-Xeon 4GB server running FC3 I am unable to successfully issue more than 52 automount requests to unique Linux (also FC3) NFS servers in less than a minute. After 52 mounts I get the following error, which is reproducible:

Jan 30 10:40:22 beowulf automount[22882]: >> nfs bindresvport: Address already in use
Jan 30 10:40:22 beowulf automount[22882]: mount(nfs): nfs: mount failure node54:/usr1 on /data/node54
Jan 30 10:40:22 beowulf automount[22882]: failed to mount /data/node54

After waiting ~1 minute I am able to mount another 52 filesystems before getting the same error again. This problem does not occur on a Red Hat 9 client machine nor on a Solaris 9 client. This limitation is breaking an application on a large Beowulf cluster that cross-mounts data between 290 GigE-attached nodes. I have also reproduced this problem with a script that explicitly calls /bin/mount rather than relying on automount.

Version-Release number of selected component (if applicable):
autofs-4.1.3-28

How reproducible:
Always

Steps to Reproduce:
1. Set up /etc/auto.data to have /data/node* point to 290 Linux NFS servers.
2. umount /data/node*
3. ls -l /data/node*/known_file

Actual Results:
After 52 successful mounts and ls results, the following shows up in /var/log/messages:

Jan 30 19:42:01 beowulf automount[23509]: >> nfs bindresvport: Address already in use
Jan 30 19:42:01 beowulf automount[23509]: mount(nfs): nfs: mount failure node53:/usr1 on /data/node53
Jan 30 19:42:01 beowulf automount[23509]: failed to mount /data/node53
... (for the remaining nodes)

Expected Results:
The ls output for 290 files.

Additional info:
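For context, the "bindresvport: Address already in use" message appears when bindresvport(3) runs out of ports below 1024. The sketch below is illustrative only (it is not code from mount or autofs, and it must run as root to bind reserved ports); it just shows how quickly the reserved range drains when ports are held or stuck in TIME_WAIT:

/* Illustrative sketch only -- not code from mount/autofs.  bindresvport()
 * picks a port from a small reserved range (roughly 600-1023 in glibc),
 * so holding a few hundred such ports makes further calls fail with
 * EADDRINUSE, the "Address already in use" reported above. */
#include <rpc/rpc.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

int main(void)
{
    int count = 0;

    for (;;) {
        struct sockaddr_in sin;
        int sock = socket(AF_INET, SOCK_STREAM, 0);

        if (sock < 0)
            break;

        memset(&sin, 0, sizeof(sin));
        sin.sin_family = AF_INET;

        /* Each successful call pins one reserved port; the sockets are
         * deliberately never closed, so the pool drains quickly. */
        if (bindresvport(sock, &sin) < 0) {
            printf("bindresvport failed after %d ports: %s\n",
                   count, strerror(errno));
            break;
        }
        count++;
    }
    return 0;
}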
Another cut-and-paste from 146629:

Comment #17 From Stuart Anderson (anderson.edu) on 2005-04-12 15:39 EST

I upgraded to util-linux-2.12a-23 on the client side only (as was the case for autofs; if I need to update anything on the NFS servers please let me know), and I now typically get 205-260 mounts before the bindresvport error. Note that in my case each mount request goes to a distinct server.
And more cut-and-paste from 146629:

Comment #24 From Stuart Anderson (anderson.edu) on 2005-04-20 12:23 EST

Many thanks for your help. I have also been tracking the corresponding static mount problem,
https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=141773
and I just reproduced it again today. In particular, after adding the 290 mount points to /etc/fstab and then running grep | xargs mount, I am able to get 237 successful mounts before finding

nfs bindresvport: Address already in use
nfs bindresvport: Address already in use
nfs bindresvport: Address already in use
...

for the remainder of the 290 requests.

$ rpm -qf /bin/mount
util-linux-2.12a-23
Could you please post the output of "netstat -a | grep ^tcp"? I think there is a reserved port leak in the pmap_getport() routine which causes things like NIS to unnecessarily use reserved ports to talk to the portmapper.
Where exactly do you think there is a leak? The pmap_getport code is pretty straightforward, and I can't find any leaks in it. Do you have some empirical evidence (i.e., a unit test) to prove this? Now, pmap_getport definitely calls clnttcp_create with a socket of RPC_ANYSOCK, which causes a new socket to be created and bound to a reserved port. The socket is closed before returning from pmap_getport, though, by way of calling CLNT_DESTROY.
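A minimal sketch of the sequence described above, using the classic Sun RPC interfaces (query_portmapper is a made-up name, and the real pmap_getport internals may differ between library versions):

/* Sketch of the code path described in the previous comment. */
#include <rpc/rpc.h>
#include <rpc/pmap_prot.h>
#include <netinet/in.h>
#include <arpa/inet.h>

void query_portmapper(struct sockaddr_in *srv)
{
    int sock = RPC_ANYSOCK;          /* ask clnttcp_create() to make the socket */
    CLIENT *clnt;

    srv->sin_port = htons(PMAPPORT); /* talk to the portmapper on port 111 */

    /* With RPC_ANYSOCK, clnttcp_create() creates a new socket, calls
     * bindresvport() on it (grabbing a port below 1024 even though the
     * portmapper does not require one), and connects to the server. */
    clnt = clnttcp_create(srv, PMAPPROG, PMAPVERS, &sock, 0, 0);
    if (clnt == NULL)
        return;

    /* ... the actual PMAPPROC_GETPORT call via clnt_call() would go here ... */

    /* CLNT_DESTROY() closes the socket, but since this is TCP the
     * connection lingers in TIME_WAIT, tying up the reserved port for
     * about a minute after the call returns. */
    CLNT_DESTROY(clnt);
}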
Created attachment 113579: output of "netstat -a | grep ^tcp" after failing to mount

I have removed the unimportant entries associated with ssh connections and a few other non-NFS-related services.
Well, as the netstat trace clearly shows (in Comment #6), about 40% of the TCP connections that are in TIME_WAIT are from portmap requests. Since a reserved port is *not* needed to make portmap requests, those ports are definitely a waste with respect to reserved port space.
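One way to avoid that waste, sketched below under the assumption that clnttcp_create() uses a caller-supplied, already-connected socket as-is (make_pmap_client is a made-up name, not code from any shipped fix): create and connect the socket yourself from an ordinary ephemeral port instead of passing RPC_ANYSOCK.

/* Hedged sketch: query the portmapper without consuming a reserved port. */
#include <rpc/rpc.h>
#include <rpc/pmap_prot.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <sys/socket.h>
#include <unistd.h>

CLIENT *make_pmap_client(struct sockaddr_in *srv)
{
    int sock = socket(AF_INET, SOCK_STREAM, IPPROTO_TCP);
    CLIENT *clnt;

    if (sock < 0)
        return NULL;

    srv->sin_port = htons(PMAPPORT);

    /* Connect from an unprivileged ephemeral port; no bindresvport() call. */
    if (connect(sock, (struct sockaddr *)srv, sizeof(*srv)) < 0) {
        close(sock);
        return NULL;
    }

    /* Because the socket is supplied (not RPC_ANYSOCK), clnttcp_create()
     * uses it as-is and does not grab a port below 1024.  The caller
     * remains responsible for closing the socket afterwards. */
    clnt = clnttcp_create(srv, PMAPPROG, PMAPVERS, &sock, 0, 0);
    if (clnt == NULL)
        close(sock);
    return clnt;
}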
Any ideas on how to get rid of the reserved ports left in TIME_WAIT by portmap requests? Or, for that matter, is there any reason portmap connections should be left in TIME_WAIT at all?
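One commonly suggested idea, shown as a hedged sketch only (not necessarily the fix that was eventually applied here): set SO_REUSEADDR before calling bindresvport(), so bind() can take a reserved port that is only held by a TIME_WAIT connection, provided the new connection goes to a different remote endpoint.

/* Hedged sketch: allow bindresvport() to reuse ports stuck in TIME_WAIT.
 * bind_resv_reuse() is a hypothetical helper name. */
#include <rpc/rpc.h>
#include <netinet/in.h>
#include <sys/socket.h>

int bind_resv_reuse(int sock)
{
    int on = 1;
    struct sockaddr_in sin = { .sin_family = AF_INET };

    /* Permit reuse of local ports whose previous connection is in TIME_WAIT. */
    if (setsockopt(sock, SOL_SOCKET, SO_REUSEADDR, &on, sizeof(on)) < 0)
        return -1;
    return bindresvport(sock, &sin);
}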
*** This bug has been marked as a duplicate of 141773 ***