Bug 212398

Summary: NFS mounts freeze
Product: Red Hat Enterprise Linux 4 Reporter: Mike Mosley <jmmosley>
Component: kernel Assignee: Peter Staubach <staubach>
Status: CLOSED ERRATA QA Contact: Brian Brock <bbrock>
Severity: high Docs Contact:
Priority: medium    
Version: 4.4 CC: christoph.handel, jbaron, steved
Target Milestone: ---   
Target Release: ---   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: RHBA-2007-0304 Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2007-05-08 03:57:45 UTC Type: ---
Attachments:
Description Flags
Proposed patch none

Description Mike Mosley 2006-10-26 17:36:31 UTC
From Bugzilla Helper:
User-Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; InfoPath.1)

Description of problem:
We have a RHEL client that mounts several shares from a NAS device via NFS (version 3). The client ran various releases of RHEL 3 for a couple of years without issue. We recently upgraded to RHEL 4 (U4) with all applicable patches, and now the NFS mounts freeze after a period of time (usually overnight). If we kill the processes trying to access the mounts, then unmount and remount the shares, everything is fine until the next occurrence. The only error message we see in /var/log/messages is:

kernel: RPC: error 512 connecting to server xxx.xxx.xxx.xxx
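
To see how often the client logged this failure, the log can be searched for the message above. A minimal sketch, assuming bash and that the message keeps the exact wording quoted in this report; the function name count_rpc_connect_errors is made up for illustration:

```shell
# Hypothetical helper (not part of any Red Hat tool): count the
# "RPC: error 512 connecting to server" lines in a log file.
# The path defaults to /var/log/messages, the file named in the report.
count_rpc_connect_errors() {
    local logfile="${1:-/var/log/messages}"
    # grep -c prints the number of matching lines (0 when there are none).
    grep -c 'RPC: error 512 connecting to server' "$logfile"
}
```

Running count_rpc_connect_errors with no argument reads /var/log/messages, which usually requires root.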

Version-Release number of selected component (if applicable):
kernel-2.6.9-42.0.3

How reproducible:
Always


Steps to Reproduce:
1. Mount an NFS share using the default options plus intr.
2. Have some process access the share over a period of time.

Actual Results:
After a period of time, data on the mounts cannot be accessed.

Expected Results:


Additional info:
This seems similar to a problem that was discussed on the NFSv4 mailing list, even though we are not using NFSv4. That discussion referenced Bugzilla bug 194793, but I could not access that bug when I logged in to Bugzilla.

Also, I am running the EL.SMP version of the kernel (2 CPUs).

Our remaining clients which are still running RHEL3 (update 6) are not experiencing this issue even though they are mounting the same shares and running the same processes.

Comment 1 Peter Staubach 2006-12-08 14:11:59 UTC
This sort of thing works for me.  I do it all of the time.

Perhaps a stack trace of all of the processes in the system, taken when
the hang occurs, could be attached, please?

Comment 2 Christoph 2006-12-18 10:35:52 UTC
Same problem here. A long tcpdump revealed outgoing TCP connects from the NFS client
on port 664, but no answers from the server.
Research pointed to an IPMI card; see http://lkml.org/lkml/2006/10/10/239
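
A capture like the one described here can be narrowed to connection attempts from the affected source port. This is only a sketch, assuming bash; 192.0.2.10 is a documentation placeholder and not the server from this report, and resvport_syn_filter is a hypothetical helper that just builds the pcap filter string:

```shell
# Hypothetical helper: print a pcap filter that matches TCP SYN segments
# sent from a given local source port (default 664) to the NFS server.
resvport_syn_filter() {
    local server="$1" port="${2:-664}"
    printf 'tcp and dst host %s and src port %s and tcp[tcpflags] & tcp-syn != 0\n' \
        "$server" "$port"
}

# Usage (needs root; the address is a placeholder):
#   tcpdump -n "$(resvport_syn_filter 192.0.2.10 664)"
```

Repeated SYNs from port 664 with no reply visible to the client would match the symptom reported in this comment.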

Comment 3 Christoph 2006-12-18 13:34:39 UTC
You can work around this using sysctl:

root@host:~> sysctl -w sunrpc.min_resvport=665

To make it permanent, add the following line to /etc/sysctl.conf:

sunrpc.min_resvport = 665
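
To confirm that the setting really is persisted, the configuration file can be checked for the key. A rough sketch, assuming bash and the plain "key = value" layout shown above; persisted_min_resvport is a hypothetical name:

```shell
# Hypothetical helper: print the value assigned to sunrpc.min_resvport
# in a sysctl configuration file (default /etc/sysctl.conf).
# Prints nothing if the key is absent or commented out.
persisted_min_resvport() {
    local conf="${1:-/etc/sysctl.conf}"
    # Keep only numeric assignments of the key; if it appears more than
    # once, report the last one.
    sed -n 's/^[[:space:]]*sunrpc\.min_resvport[[:space:]]*=[[:space:]]*\([0-9][0-9]*\)[[:space:]]*$/\1/p' "$conf" |
        tail -n 1
}
```

Empty output means the file does not set the key, so any value seen at runtime comes from elsewhere, for example the kernel default.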


Comment 4 Peter Staubach 2006-12-18 18:45:49 UTC
Changing the default setting for xprt_min_resvport does seem like a good
thing to do.  Changing the default would also match the RHEL-5 kernel as
well as upstream.

Comment 5 Peter Staubach 2006-12-18 18:46:42 UTC
Created attachment 143922 [details]
Proposed patch

Comment 6 Peter Staubach 2006-12-18 18:48:05 UTC
Mike, can you tell if this is the problem that you are seeing?

Comment 7 RHEL Program Management 2006-12-19 17:07:47 UTC
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release.  Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products.  This request is not yet committed for inclusion in an Update
release.

Comment 8 Christoph 2006-12-25 21:39:37 UTC
(In reply to comment #2)
> Same problem here. A long tcpdump revealed outgoing TCP connects from the NFS client
> on port 664, but no answers from the server.
> Research pointed to an IPMI card; see http://lkml.org/lkml/2006/10/10/239

Clarification:

The IPMI board swallows all packets directed to port 664 (used by IPMI for RPC
calls), so all server answers are hidden from the client OS. The problem has been
fixed in Linux kernels starting with 2.6.19 by raising the minimum RPC port used
by NFS to port 665.

Comment 9 Jay Turner 2007-01-02 13:47:11 UTC
QE ack for RHEL4.5.

Comment 10 Peter Staubach 2007-01-29 20:17:27 UTC
Posted to rhkernel-list.

Comment 12 Jay Turner 2007-01-30 19:31:02 UTC
QE ack for RHEL4.5.

Comment 13 Jason Baron 2007-02-01 19:31:57 UTC
Committed in stream U5 build 45. A test kernel with this patch is available from
http://people.redhat.com/~jbaron/rhel4/

Comment 15 Mike Gahagan 2007-02-23 19:12:55 UTC
Looks like the fix for this one is in -48; there is nothing in sysctl.conf on
this system that would be setting min_resvport.

[root@dhcp58-247 sunrpc]# cat min_resvport 
665
[root@dhcp58-247 sunrpc]# cat max_resvport 
1023
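
The same check can be scripted. A minimal sketch, assuming bash and that the two files live under /proc/sys/sunrpc (the directory behind the sunrpc.* sysctl keys); check_resvport_range is a hypothetical name, and the directory argument exists only so the function can be pointed at a copy of the files:

```shell
# Hypothetical helper: report whether port 664 falls inside the reserved
# port range configured for sunrpc.
# Returns 0 if 664 is excluded, 1 if it is included, 2 on read errors.
check_resvport_range() {
    local dir="${1:-/proc/sys/sunrpc}"
    local min max
    min=$(cat "$dir/min_resvport") || return 2
    max=$(cat "$dir/max_resvport") || return 2
    if [ "$min" -le 664 ] && [ "$max" -ge 664 ]; then
        echo "port 664 is inside the reserved range $min-$max"
        return 1
    fi
    echo "port 664 is outside the reserved range $min-$max"
}
```

With the values shown in this comment (665 and 1023), the function reports that port 664 is outside the range.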


Comment 17 Red Hat Bugzilla 2007-05-08 03:57:46 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2007-0304.html