Bug 154678

Summary:

[Texas Instruments] nfs bindresvport: Address already in use

Product:

Red Hat Enterprise Linux 3

Reporter:

Issue Tracker <tao>

Component:

kernel

Assignee:

Steve Dickson <steved>

Status:

CLOSED ERRATA

QA Contact:

Ben Levenson <benl>

Severity:

medium

Priority:

medium

Version:

3.0

CC:

andrew_l_martin, george.liu, jakub, nhorman, petrides, rajeev, tao, wtkeeler

Hardware:

All

OS:

Linux

Fixed In Version:

RHSA-2005-663

Doc Type:

Bug Fix

Last Closed:

2005-09-28 14:54:27 UTC

Attachments:

Description	Flags
Kernel Patch	none
glibc patch	none

Description Issue Tracker 2005-04-13 14:18:20 UTC

Escalated to Bugzilla from IssueTracker

Comment 13 Steve Dickson 2005-04-22 08:43:32 UTC

Could you please post the output of "netstat -a | grep ^tcp".

I think there is a reserved port leak in the pmap_getport() routine
which causes things like NIS to unnecessarily use reserved ports
to talk to the portmapper.

Comment 16 Steve Dickson 2005-05-18 15:43:24 UTC

It appears both glibc and the kernel are misusing
the reserved port space. The pmap_getport() and
pmap_getmaps() glibc routines and the kernel use
reserved ports to communicate with the local or
remote portmapper. A reserved port is not needed
for these types of queries.

As a result of this misuse, the majority of
reserved ports are in TIME_WAIT during the mount
storm.
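One way to look for this symptom on a client during a mount storm is to count TCP sockets in TIME_WAIT whose local port is in the reserved range (below 1024). The sketch below assumes the Linux net-tools "netstat -ant" column layout (local address in column 4, state in column 6); count_reserved_time_wait is a hypothetical helper name, not part of any package.

```bash
#!/bin/bash
# Hypothetical helper: reads "netstat -ant"-style output on stdin and prints
# the number of TCP sockets in TIME_WAIT whose LOCAL port is reserved (< 1024).
# Assumes net-tools layout: $4 = local address (host:port), $6 = TCP state.
count_reserved_time_wait() {
    awk '
        $1 ~ /^tcp/ && $6 == "TIME_WAIT" {
            n = split($4, parts, ":")   # the port is the last ":"-separated field
            if (parts[n] + 0 < 1024) count++
        }
        END { print count + 0 }         # prints 0 when nothing matched
    '
}

# Usage on a live client while the mounts are running:
#   netstat -ant | count_reserved_time_wait
```

A count that stays close to the size of the reserved range would be consistent with the exhaustion described above.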

Also, the port ranges that both glibc and the kernel
try can be expanded so the entire reserved port
space can be tried.

Finally, I found that if the mount command retries
every 5 seconds for 10 times, I was able to get
substantially more file systems mounted.
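The retry behaviour described above (retry every 5 seconds, up to 10 times) can be sketched as a small bash wrapper. This is an illustration only, not the change that went into mount(8); retry_cmd is a hypothetical helper name, and the server and paths in the usage line are placeholders.

```bash
#!/bin/bash
# Hypothetical helper: retry_cmd ATTEMPTS DELAY COMMAND [ARGS...]
# Runs COMMAND; on failure waits DELAY seconds and tries again, up to
# ATTEMPTS times in total. Returns 0 on the first success, 1 if all fail.
retry_cmd() {
    local attempts=$1 delay=$2
    shift 2
    local i
    for (( i = 1; i <= attempts; i++ )); do
        "$@" && return 0
        # Do not sleep after the final failed attempt.
        (( i < attempts )) && sleep "$delay"
    done
    return 1
}

# Usage (10 attempts, 5 seconds apart), with placeholder server and paths:
#   retry_cmd 10 5 mount -t nfs server:/export/dir1 /mnt/dir1
```

Spacing the retries out gives sockets left in TIME_WAIT a chance to expire before the next bindresvport attempt.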

Comment 17 Steve Dickson 2005-05-18 15:51:13 UTC

Created attachment 114516
Kernel Patch

This patch stops reserved ports from being used
for portmap queries and expands the range of reserved
ports that will be tried.

Comment 18 Steve Dickson 2005-05-18 15:53:57 UTC

Created attachment 114519
glibc patch

This patch makes pmap_getport() and pmap_getmaps()
use non-reserved ports to do their queries.

This patch also increases the range of reserved ports that will
be tried, and causes the entire pool of reserved
ports to be tried on every call.
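The idea of trying the entire reserved-port pool on every call can be sketched as a wrap-around scan: start at some port, visit every port in the range exactly once, and stop at the first free one. This is a bash illustration of the search order only. The real glibc and kernel patches are C code that actually bind sockets, and their exact port bounds are not given in this report, so the 512-1023 range and the "busy list" argument below are assumptions; pick_resv_port is a hypothetical name.

```bash
#!/bin/bash
# Hypothetical helper (needs bash 4+ for associative arrays):
#   pick_resv_port LOW HIGH START [BUSY_PORT...]
# Visits every port in [LOW, HIGH] exactly once, beginning at START and
# wrapping around to LOW, and prints the first port not listed as busy.
# Returns 1 (and prints nothing) if the whole pool is busy.
pick_resv_port() {
    local low=$1 high=$2 start=$3
    shift 3
    local -A busy=()
    local p
    for p in "$@"; do busy[$p]=1; done
    local span=$(( high - low + 1 )) i port
    for (( i = 0; i < span; i++ )); do
        port=$(( low + (start - low + i) % span ))
        if [[ -z ${busy[$port]:-} ]]; then
            echo "$port"
            return 0
        fi
    done
    return 1
}

# Example: with 1022 and 1023 busy, a scan starting at 1022 wraps to 512.
#   pick_resv_port 512 1023 1022 1022 1023
```

Starting each call where the previous one left off, rather than always at the top of the range, spreads bindings over the whole pool instead of repeatedly colliding with the same recently used ports.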

Comment 20 Ernie Petrides 2005-06-09 00:05:08 UTC

Changing to kernel component.

Comment 23 Ernie Petrides 2005-06-15 01:03:48 UTC

A fix for this problem has just been committed to the RHEL3 U6
patch pool this evening (in kernel version 2.4.21-32.8.EL).

Comment 28 Ernie Petrides 2005-07-20 07:37:32 UTC

A revision to the fix for this problem has just been committed to the
RHEL3 U6 patch pool this evening (in kernel version 2.4.21-33.EL).

Comment 36 Red Hat Bugzilla 2005-09-28 14:54:28 UTC

An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2005-663.html

Comment 37 Ernie Petrides 2006-01-26 21:23:03 UTC

*** Bug 173495 has been marked as a duplicate of this bug. ***

Comment 38 Steve Dickson 2006-03-28 15:29:04 UTC

*** Bug 186310 has been marked as a duplicate of this bug. ***

Comment 39 Andrew Martin 2006-06-28 14:36:52 UTC

It appears this problem is back in kernel 2.4.21-40.EL. When I updated the
kernel, I started having a major slowdown when mounting over 800 file systems.
/var/log/messages shows the error "nfs bindresvport: Address already
in use" over and over. I opened a ticket with Red Hat support, and after they
looked into it, I was told to update this ticket with the information. Thanks.

Comment 40 Steve Dickson 2006-07-01 14:34:07 UTC

Did you update glibc and utils-linux as well?

Comment 41 Andrew Martin 2006-07-05 14:47:38 UTC

I ran up2date and updated everything. My current version of glibc is
glibc-2.3.2-95.39. How do I find what version of utils-linux I'm running?

Comment 42 Steve Dickson 2006-07-18 18:43:22 UTC

rpm -q utils-linux

Comment 43 Andrew Martin 2006-07-18 20:27:02 UTC

I tried that and it said it wasn't installed so I figured I was doing 
something wrong.  I figured I had to have it so I started looking around and 
found util-linux without the "s" and found I have util-linux-2.11y-31.11.  
Thanks.

Comment 44 Steve Dickson 2006-07-18 21:08:36 UTC

Sorry about that... so util-linux-2.11y-31.11 does indeed fix this problem?

Comment 45 Andrew Martin 2006-07-18 22:15:39 UTC

No, it isn't fixed.  I was just answering your question about what version I 
was running.  Sorry for the confusion.

Comment 46 Steve Dickson 2006-07-20 12:59:51 UTC

Looking back at the RPM changelog of util-linux, it appears the
fix for this bug went into version 2.11y-31.13. So either upgrade
to 2.11y-31.13 or to the latest version, util-linux-2.11y-31.18.
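To check whether an installed util-linux already includes the fix, its version-release can be compared against 2.11y-31.13. The sketch below uses GNU "sort -V" as an approximation of RPM's own version comparison. It is not identical to rpmvercmp, and "sort -V" is only available in newer GNU coreutils releases, so it may be missing on older systems. version_at_least is a hypothetical helper name.

```bash
#!/bin/bash
# Hypothetical helper: version_at_least HAVE WANT
# Succeeds if HAVE >= WANT under GNU version sort ("sort -V"), which
# compares embedded numbers numerically (e.g. 31.13 < 31.18).
version_at_least() {
    [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# Usage on an RPM-based system:
#   have=$(rpm -q --qf '%{VERSION}-%{RELEASE}\n' util-linux)
#   version_at_least "$have" 2.11y-31.13 && echo "util-linux includes the fix"
```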

Comment 47 Andrew Martin 2006-07-20 18:57:25 UTC

I just updated to the latest util-linux.  I'm at util-linux-2.11y-31.18 and 
glibc-2.3.2-95.44.  The problem still exists.  I'm wondering if my problem 
isn't the same as the one in this bug.

I was able to replicate it.  The following assumes /scratch is an exported 
drive on machine ec2090 and it's run on ec2090.

mkdir /scratch/testdir
mkdir /scratch/testdir/mountdir
cd /scratch/testdir
I=1
J=1
while [ $I -lt 1000 ]; do
    mkdir dir$I
    mkdir mountdir/mount$I
    mount ec2090:/scratch/testdir/dir$I /scratch/testdir/mountdir/mount$I
    let I=I+1
done
while [ $J -lt 1000 ]; do
    umount /scratch/testdir/mountdir/mount$J
    rm -rf /scratch/testdir/mountdir/mount$J
    rm -rf dir$J
    let J=J+1
done


When I run this, I'm able to mount almost 500 directories before it starts
giving me the error "nfs bindresvport: Address already in use" at the prompt.
The exact number it fails on changes each time I run it.

Comment 48 Steve Dickson 2006-07-21 10:55:51 UTC

Question: After the script dies, does 'netstat -an | grep 111' show those
connections being made from ports > 1024? There was also a fix to the portmap
routines in glibc that stops them from using reserved ports (i.e. ports < 1024).
I just want to make sure you have that fix as well...
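A narrower version of this check is to look only at connections whose remote port is 111 (the portmapper) and count those made from a reserved local port. A plain "grep 111" also matches the portmapper's own side of each connection and any address that happens to contain "111". The sketch assumes the net-tools "netstat -an" column layout (local address in column 4, foreign address in column 5); count_reserved_portmap_conns is a hypothetical helper name.

```bash
#!/bin/bash
# Hypothetical helper: reads "netstat -an"-style output on stdin and prints
# how many TCP/UDP sockets talk TO port 111 FROM a reserved local port (< 1024).
# Assumes net-tools layout: $4 = local address, $5 = foreign address.
count_reserved_portmap_conns() {
    awk '
        $1 ~ /^(tcp|udp)/ {
            nl = split($4, l, ":")      # local port is the last field
            nr = split($5, r, ":")      # remote port is the last field
            if (r[nr] == "111" && l[nl] + 0 < 1024) count++
        }
        END { print count + 0 }
    '
}

# Usage after the mount script has failed:
#   netstat -an | count_reserved_portmap_conns
```

A result of 0 would be consistent with the glibc portmap fix being in place.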

Comment 49 Andrew Martin 2006-07-24 15:57:42 UTC

When I do 'netstat -an |grep 111' I get 999 connections with ports from 54434 
to 55476.  Thanks.

Comment 50 Steve Dickson 2006-07-24 16:17:06 UTC

Ok... it appears you have the correct glibc, since none of those
connections are on ports < 1024...

Here is the test script I used to get over 100 mounts.
Note: this script assumes there is a directory tree that
already exists on the server.

#!/bin/bash
MOUNT=mount
HOST=ppro5
for i in `seq 1 1020`
do
    [ ! -d /mnt/$i ] && mkdir /mnt/$i
    $MOUNT -v -t nfs -o tcp $HOST:/home/tree/$i /mnt/$i || exit 1
    ls /mnt/$i;
done

Please run this script to see how many mounts you get...

Here is the umount script that can be run to clean up the mounts:

#!/bin/bash
for i in `seq 1 1020`
do
    umount /mnt/$i || exit 1;
done

Comment 51 Andrew Martin 2006-07-24 18:00:03 UTC

It gave the error after mount 503.

Comment 52 Andrew Martin 2006-09-12 17:17:09 UTC

Any news on this?  It's killing us.  We did discover that a similar problem 
exists in RHEL 4.  Thanks.

Comment 53 Ernie Petrides 2006-09-12 19:38:46 UTC

Andrew, please open a new bug report so that the problem will receive
appropriate attention.  This bug report is marked CLOSED/ERRATA, since
the problem as originally reported here was resolved in U6.

If one of the Issue Tracker IDs linked to this BZ is yours, you
should also relink it to your new bug report.

Thanks in advance.  -ernie