Bug 154678

Summary: [Texas Instruments] nfs bindresvport: Address already in use
Product: Red Hat Enterprise Linux 3 Reporter: Issue Tracker <tao>
Component: kernel Assignee: Steve Dickson <steved>
Status: CLOSED ERRATA QA Contact: Ben Levenson <benl>
Severity: medium Docs Contact:
Priority: medium    
Version: 3.0 CC: andrew_l_martin, george.liu, jakub, nhorman, petrides, rajeev, tao, wtkeeler
Target Milestone: ---   
Target Release: ---   
Hardware: All   
OS: Linux   
Whiteboard:
Fixed In Version: RHSA-2005-663 Doc Type: Bug Fix
Last Closed: 2005-09-28 14:54:27 UTC Type: ---
Attachments:
Description     Flags
Kernel Patch    none
glibc patch     none

Description Issue Tracker 2005-04-13 14:18:20 UTC
Escalated to Bugzilla from IssueTracker

Comment 13 Steve Dickson 2005-04-22 08:43:32 UTC
Could you please post the output of "netstat -a | grep ^tcp".

I think there is a reserved port leak in the pmap_getport() routine
which causes things like NIS to unnecessarily use reserved ports
to talk to the portmapper.

Comment 16 Steve Dickson 2005-05-18 15:43:24 UTC
It appears both glibc and the kernel are misusing
the reserved port space. The pmap_getport() and
pmap_getmaps() glibc routines and the kernel use
reserved ports to communicate with the local or
remote portmapper. A reserved port is not needed
for these types of queries.

As a result of this misuse, the majority of
reserved ports are in TIME_WAIT during the mount
storm.

Also, the port ranges that both glibc and the kernel
try can be expanded so that the entire reserved port
space can be tried.

Finally, I found that if the mount command retries
every 5 seconds, up to 10 times, I was able to get
substantially more file systems mounted.
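
For illustration only, the retry behaviour described above could be
approximated from the shell as sketched below. This is not the change
made to the mount command itself; the helper name "retry" and the
server and path in the usage line are hypothetical placeholders.

```shell
#!/bin/bash
# Hypothetical helper (not part of mount or util-linux):
# run a command up to <tries> times, sleeping <delay> seconds
# between failed attempts.
# Usage: retry <tries> <delay> <command> [args...]
retry() {
    local tries=$1 delay=$2 n=1
    shift 2
    until "$@"; do
        # Give up once the allowed number of attempts is used.
        if [ "$n" -ge "$tries" ]; then
            return 1
        fi
        n=$((n + 1))
        sleep "$delay"
    done
    return 0
}

# Example with placeholder names, 10 attempts 5 seconds apart:
# retry 10 5 mount -t nfs server:/export /mnt/export
```

The helper returns 0 as soon as one attempt succeeds and 1 once all
attempts have failed, so it can be used in place of a bare mount call
in a loop.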

Comment 17 Steve Dickson 2005-05-18 15:51:13 UTC
Created attachment 114516 [details]
Kernel Patch

This patch stops reserved ports from being used
for portmap queries and expands the range of reserved
ports that will be tried.

Comment 18 Steve Dickson 2005-05-18 15:53:57 UTC
Created attachment 114519 [details]
glibc patch

This patch makes pmap_getport() and pmap_getmaps()
use non-reserved ports to do their queries.

This patch also increases the number of reserved ports that
will be tried and causes the entire pool of reserved
ports to be tried on every call.

Comment 20 Ernie Petrides 2005-06-09 00:05:08 UTC
Changing to kernel component.

Comment 23 Ernie Petrides 2005-06-15 01:03:48 UTC
A fix for this problem has just been committed to the RHEL3 U6
patch pool this evening (in kernel version 2.4.21-32.8.EL).


Comment 28 Ernie Petrides 2005-07-20 07:37:32 UTC
A revision to the fix for this problem has just been committed to the
RHEL3 U6 patch pool this evening (in kernel version 2.4.21-33.EL).


Comment 36 Red Hat Bugzilla 2005-09-28 14:54:28 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2005-663.html


Comment 37 Ernie Petrides 2006-01-26 21:23:03 UTC
*** Bug 173495 has been marked as a duplicate of this bug. ***

Comment 38 Steve Dickson 2006-03-28 15:29:04 UTC
*** Bug 186310 has been marked as a duplicate of this bug. ***

Comment 39 Andrew Martin 2006-06-28 14:36:52 UTC
It appears this problem is back in kernel 2.4.21-40.EL.  When I updated the
kernel I started seeing a major slowdown when mounting over 800 mounts.
/var/log/messages shows the error "nfs bindresvport: Address already
in use" over and over.  I opened a ticket with Red Hat support, and after they
looked into it, I was told to update this ticket with the information.  Thanks.

Comment 40 Steve Dickson 2006-07-01 14:34:07 UTC
did you update glibc and utils-linux as well?

Comment 41 Andrew Martin 2006-07-05 14:47:38 UTC
I ran up2date and updated everything.  My current version of glibc is
glibc-2.3.2-95.39.  How do I find what version of utils-linux I'm running?

Comment 42 Steve Dickson 2006-07-18 18:43:22 UTC
rpm -q utils-linux

Comment 43 Andrew Martin 2006-07-18 20:27:02 UTC
I tried that and it said it wasn't installed so I figured I was doing 
something wrong.  I figured I had to have it so I started looking around and 
found util-linux without the "s" and found I have util-linux-2.11y-31.11.  
Thanks.

Comment 44 Steve Dickson 2006-07-18 21:08:36 UTC
Sorry about that... so util-linux-2.11y-31.11 does indeed fix this problem?

Comment 45 Andrew Martin 2006-07-18 22:15:39 UTC
No, it isn't fixed.  I was just answering your question about what version I 
was running.  Sorry for the confusion.

Comment 46 Steve Dickson 2006-07-20 12:59:51 UTC
Looking back at the RPM changelog of util-linux, it appears the
fix for this bug went into version 2.11y-31.13. So either upgrade
to 2.11y-31.13 or to the latest version, util-linux-2.11y-31.18.

Comment 47 Andrew Martin 2006-07-20 18:57:25 UTC
I just updated to the latest util-linux.  I'm at util-linux-2.11y-31.18 and 
glibc-2.3.2-95.44.  The problem still exists.  I'm wondering if my problem 
isn't the same as the one in this bug.

I was able to replicate it.  The following assumes /scratch is an exported 
drive on machine ec2090 and it's run on ec2090.

mkdir /scratch/testdir
mkdir /scratch/testdir/mountdir
cd /scratch/testdir
I=1
J=1
# Create 999 subdirectories and NFS-mount each one.
while [ $I -lt 1000 ]; do
    mkdir dir$I
    mkdir mountdir/mount$I
    mount ec2090:/scratch/testdir/dir$I /scratch/testdir/mountdir/mount$I
    let I=I+1
done
# Unmount everything and remove the test directories.
while [ $J -lt 1000 ]; do
    umount /scratch/testdir/mountdir/mount$J
    rm -rf /scratch/testdir/mountdir/mount$J
    rm -rf dir$J
    let J=J+1
done


When I run this, I'm able to mount almost 500 directories before it starts 
giving me the error "nfs bindresvport: Address already in use" on the prompt.  
The exact number it errors on changes each time I run it.


Comment 48 Steve Dickson 2006-07-21 10:55:51 UTC
Question: After the script dies, does 'netstat -an | grep 111' show those
connections being made from ports > 1024? There was also a fix to the portmap
routines in glibc that stops them from using reserved ports (i.e. ports < 1024).
I just want to make sure you have that fix as well...
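
A rough way to automate that check is sketched below. It assumes the
usual Linux 'netstat -an' column layout (local address in column 4,
foreign address in column 5); the function name is a hypothetical
helper, not an existing tool.

```shell
#!/bin/bash
# Hypothetical helper: reads 'netstat -an' output on stdin and prints
# the number of TCP connections to the portmapper (foreign port 111)
# whose local port is reserved (< 1024).
# Assumes column 4 is the local address and column 5 the foreign one.
count_reserved_portmap() {
    awk '
        $1 ~ /^tcp/ {
            # The port is the last colon-separated field of an address.
            nlocal = split($4, loc, ":")
            nforeign = split($5, fgn, ":")
            if (fgn[nforeign] == "111" && loc[nlocal] + 0 < 1024)
                n++
        }
        END { print n + 0 }
    '
}

# Example:
# netstat -an | count_reserved_portmap
```

A result of 0 would suggest the portmap queries are no longer being
made from reserved ports; any other value is the number of such
connections still visible, including those in TIME_WAIT.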

Comment 49 Andrew Martin 2006-07-24 15:57:42 UTC
When I do 'netstat -an |grep 111' I get 999 connections with ports from 54434 
to 55476.  Thanks.

Comment 50 Steve Dickson 2006-07-24 16:17:06 UTC
Ok... it appears you have the correct glibc since all those
connections are not on ports < 1024... 

Here is the test script I used to get over 100 mounts.
Note: this script assumes there is a directory tree that
already exists on the server.

#!/bin/bash
MOUNT=mount
HOST=ppro5
for i in `seq 1 1020`
do
    [ ! -d /mnt/$i ] && mkdir /mnt/$i
    $MOUNT -v -t nfs -o tcp $HOST:/home/tree/$i /mnt/$i || exit 1
    ls /mnt/$i;
done

Please run this script to see how many mounts you get...

Here is the umount script that can be run to clean up the mounts:

#!/bin/bash
for i in `seq 1 1020`
do
    umount /mnt/$i || exit 1;
done



Comment 51 Andrew Martin 2006-07-24 18:00:03 UTC
It gave the error after mount 503.

Comment 52 Andrew Martin 2006-09-12 17:17:09 UTC
Any news on this?  It's killing us.  We did discover that a similar problem 
exists in RHEL 4.  Thanks.

Comment 53 Ernie Petrides 2006-09-12 19:38:46 UTC
Andrew, please open a new bug report so that the problem will receive
appropriate attention.  This bug report is marked CLOSED/ERRATA, since
the problem as originally reported here was resolved in U6.

If one of the Issue Tracker IDs linked to this BZ is yours, you
should also relink it to your new bug report.

Thanks in advance.  -ernie