+++ This bug was initially created as a clone of Bug #141773 +++

The following has been reported by IBM LTC:

Hardware Environment: n/a (found on several architectures)
Firmware Environment: n/a
Software Environment: RHEL 4, beta 2

Steps to Reproduce:
1. Create some mount points
2. Mount them
3. Unmount them
4. Mount them again

Actual Results:
This error message gets reported:
    nfs bindresvport: Address already in use

Expected Results:
No error message

Additional Information:
I found a bug in the Red Hat Bugzilla database that looks like exactly what we are hitting:
https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=128966
At the bottom of that bug it says this will be fixed in RHEL 3 U5. Is it possible for Red Hat to build a kernel with that patch so we can see whether it resolves our problem, and so we could have a fix for RHEL 4?
Pasting in comment from bz #146629:

From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7.3) Gecko/20041020

Description of problem:
On a dual-Xeon 4GB server running FC3 I am unable to successfully issue more than 52 automount requests to unique Linux (also FC3) NFS servers in less than a minute. After 52 mounts I get the following error, which is reproducible:

Jan 30 10:40:22 beowulf automount[22882]: >> nfs bindresvport: Address already in use
Jan 30 10:40:22 beowulf automount[22882]: mount(nfs): nfs: mount failure node54:/usr1 on /data/node54
Jan 30 10:40:22 beowulf automount[22882]: failed to mount /data/node54

After waiting ~1 minute I am able to mount another 52 filesystems before getting the same error again. This problem does not occur on a Red Hat 9 client machine nor on a Solaris 9 client. This limitation is breaking an application on a large Beowulf cluster that cross-mounts data between 290 GigE-attached nodes. I have also reproduced this problem with a script that explicitly calls /bin/mount rather than relying on automount.

Version-Release number of selected component (if applicable):
autofs-4.1.3-28

How reproducible:
Always

Steps to Reproduce:
1. Set up /etc/auto.data to have /data/node* point to 290 Linux NFS servers.
2. umount /data/node*
3. ls -l /data/node*/known_file

Actual Results:
After 52 successful mounts and ls results, the following shows up in /var/log/messages:

Jan 30 19:42:01 beowulf automount[23509]: >> nfs bindresvport: Address already in use
Jan 30 19:42:01 beowulf automount[23509]: mount(nfs): nfs: mount failure node53:/usr1 on /data/node53
Jan 30 19:42:01 beowulf automount[23509]: failed to mount /data/node53
... (for the remaining nodes)

Expected Results:
The ls output for 290 files.

Additional info:
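For context, the "bindresvport: Address already in use" message appears when bindresvport(3) runs out of ports below 1024. The sketch below is illustrative only (it is not code from mount or autofs, and it must run as root to bind reserved ports); it just shows how quickly the reserved range drains when ports are held or stuck in TIME_WAIT:

/* Illustrative sketch only -- not code from mount/autofs.  bindresvport()
 * picks a port from a small reserved range (roughly 600-1023 in glibc),
 * so holding a few hundred such ports makes further calls fail with
 * EADDRINUSE, the "Address already in use" reported above. */
#include <rpc/rpc.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

int main(void)
{
    int count = 0;

    for (;;) {
        struct sockaddr_in sin;
        int sock = socket(AF_INET, SOCK_STREAM, 0);

        if (sock < 0)
            break;

        memset(&sin, 0, sizeof(sin));
        sin.sin_family = AF_INET;

        /* Each successful call pins one reserved port; the sockets are
         * deliberately never closed, so the pool drains quickly. */
        if (bindresvport(sock, &sin) < 0) {
            printf("bindresvport failed after %d ports: %s\n",
                   count, strerror(errno));
            break;
        }
        count++;
    }
    return 0;
}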
Another cut-and-paste from 146629:

Comment #17 From Stuart Anderson (anderson.edu) on 2005-04-12 15:39 EST

I upgraded to util-linux-2.12a-23 on the client side only (as was the case for autofs; if I need to update anything on the NFS servers please let me know), and I now typically get 205-260 mounts before the bindresvport error. Note that in my case each mount request goes to a distinct server.
And more cut-and-paste from 146629:

Comment #24 From Stuart Anderson (anderson.edu) on 2005-04-20 12:23 EST

Many thanks for your help. I have also been tracking the corresponding static mount problem,
https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=141773
and I just reproduced it again today. In particular, after adding the 290 mount points to /etc/fstab and then running grep | xargs mount, I am able to get 237 successful mounts before finding

nfs bindresvport: Address already in use
nfs bindresvport: Address already in use
nfs bindresvport: Address already in use
...

for the remainder of the 290 requests.

$ rpm -qf /bin/mount
util-linux-2.12a-23
Could you please post the output of "netstat -a | grep ^tcp"? I think there is a reserved port leak in the pmap_getport() routine which causes things like NIS to unnecessarily use reserved ports to talk to the portmapper.
Where exactly do you think there is a leak? The pmap_getport code is pretty straightforward, and I can't find any leaks in it. Do you have some empirical evidence (i.e., a unit test) to prove this? Now, pmap_getport definitely calls clnttcp_create with a socket of RPC_ANYSOCK, which causes a new socket to be created and bound to a reserved port. The socket is closed before returning from pmap_getport, though, by way of calling CLNT_DESTROY.
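A minimal sketch of the sequence described above, using the classic Sun RPC interfaces (query_portmapper is a made-up name, and the real pmap_getport internals may differ between library versions):

/* Sketch of the code path described in the previous comment. */
#include <rpc/rpc.h>
#include <rpc/pmap_prot.h>
#include <netinet/in.h>
#include <arpa/inet.h>

void query_portmapper(struct sockaddr_in *srv)
{
    int sock = RPC_ANYSOCK;          /* ask clnttcp_create() to make the socket */
    CLIENT *clnt;

    srv->sin_port = htons(PMAPPORT); /* talk to the portmapper on port 111 */

    /* With RPC_ANYSOCK, clnttcp_create() creates a new socket, calls
     * bindresvport() on it (grabbing a port below 1024 even though the
     * portmapper does not require one), and connects to the server. */
    clnt = clnttcp_create(srv, PMAPPROG, PMAPVERS, &sock, 0, 0);
    if (clnt == NULL)
        return;

    /* ... the actual PMAPPROC_GETPORT call via clnt_call() would go here ... */

    /* CLNT_DESTROY() closes the socket, but since this is TCP the
     * connection lingers in TIME_WAIT, tying up the reserved port for
     * about a minute after the call returns. */
    CLNT_DESTROY(clnt);
}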
Created attachment 113579: output of "netstat -a | grep ^tcp" after failing to mount

I have removed the unimportant entries associated with ssh connections and a few other non-NFS-related services.
Well, as the netstat trace clearly shows (in Comment #6), about 40% of the TCP connections that are in TIME_WAIT are from portmap requests. Since a reserved port is *not* needed to make portmap requests, those ports are definitely a waste with respect to reserved port space.
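One way to avoid that waste, sketched below under the assumption that clnttcp_create() uses a caller-supplied, already-connected socket as-is (make_pmap_client is a made-up name, not code from any shipped fix): create and connect the socket yourself from an ordinary ephemeral port instead of passing RPC_ANYSOCK.

/* Hedged sketch: query the portmapper without consuming a reserved port. */
#include <rpc/rpc.h>
#include <rpc/pmap_prot.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <sys/socket.h>
#include <unistd.h>

CLIENT *make_pmap_client(struct sockaddr_in *srv)
{
    int sock = socket(AF_INET, SOCK_STREAM, IPPROTO_TCP);
    CLIENT *clnt;

    if (sock < 0)
        return NULL;

    srv->sin_port = htons(PMAPPORT);

    /* Connect from an unprivileged ephemeral port; no bindresvport() call. */
    if (connect(sock, (struct sockaddr *)srv, sizeof(*srv)) < 0) {
        close(sock);
        return NULL;
    }

    /* Because the socket is supplied (not RPC_ANYSOCK), clnttcp_create()
     * uses it as-is and does not grab a port below 1024.  The caller
     * remains responsible for closing the socket afterwards. */
    clnt = clnttcp_create(srv, PMAPPROG, PMAPVERS, &sock, 0, 0);
    if (clnt == NULL)
        close(sock);
    return clnt;
}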
Any ideas on how to get rid of the reserved ports left in TIME_WAIT by portmap requests? Or, for that matter, is there any reason portmap connections should be left in TIME_WAIT at all?
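One commonly suggested idea, shown as a hedged sketch only (not necessarily the fix that was eventually applied here): set SO_REUSEADDR before calling bindresvport(), so bind() can take a reserved port that is only held by a TIME_WAIT connection, provided the new connection goes to a different remote endpoint.

/* Hedged sketch: allow bindresvport() to reuse ports stuck in TIME_WAIT.
 * bind_resv_reuse() is a hypothetical helper name. */
#include <rpc/rpc.h>
#include <netinet/in.h>
#include <sys/socket.h>

int bind_resv_reuse(int sock)
{
    int on = 1;
    struct sockaddr_in sin = { .sin_family = AF_INET };

    /* Permit reuse of local ports whose previous connection is in TIME_WAIT. */
    if (setsockopt(sock, SOL_SOCKET, SO_REUSEADDR, &on, sizeof(on)) < 0)
        return -1;
    return bindresvport(sock, &sin);
}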
*** This bug has been marked as a duplicate of 141773 ***