Bug 155313

Summary: NFS over UDP timeouts due to oversmall RPC_RTO_MIN
Product: Red Hat Enterprise Linux 3
Reporter: Damian Menscher <menscher>
Component: kernel
Assignee: Steve Dickson <steved>
Status: CLOSED WONTFIX
QA Contact: Brian Brock <bbrock>
Severity: high
Priority: medium
Version: 3.0
CC: buckh, petrides, shillman
Target Milestone: ---   
Target Release: ---   
Hardware: i386   
OS: Linux   
Doc Type: Bug Fix
Last Closed: 2007-10-19 19:04:25 UTC

Description Damian Menscher 2005-04-19 03:06:40 UTC

Description of problem:
When using NFS over UDP, the timeo mount option is ignored in favor of an adaptive Round Trip Time (RTT) estimator. (That this contradicts all known documentation is a bug in itself.)  The minimum allowable timeout for the adaptive estimator is set in .../net/sunrpc/timer.c as
#define RPC_RTO_MIN (HZ/30)
Note that on fast hardware, the timeout will tune itself down to about 0.04s.  With the default retransmit value (retrans=3), this gives the server just 0.6s to respond.  This leads to frequent timeouts, which can cause data corruption on a soft mount.

I recommend using the value from recent kernels: HZ/10, which will give the server a minimum of 1.5s to respond.
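
The 0.6s and 1.5s figures can be reproduced with a small calculation. The sketch below is not kernel code. It assumes that retrans=3 means one initial transmission plus three retransmissions, and that the timeout doubles after each one, which is what the figures in this report imply. The function name total_wait_ms and the millisecond units are made up for illustration.

```c
#include <assert.h>

/*
 * Illustrative only, not taken from net/sunrpc. Assumes the timeout
 * doubles after every transmission and that "retrans" counts the
 * retransmissions that follow the initial request.
 */
unsigned long total_wait_ms(unsigned long initial_ms, unsigned int retrans)
{
    unsigned long total = 0;
    unsigned long timeout = initial_ms;
    unsigned int i;

    assert(retrans < 16); /* keep the doubling well away from overflow */

    /* One initial transmission plus "retrans" retransmissions. */
    for (i = 0; i <= retrans; i++) {
        total += timeout;
        timeout *= 2;
    }
    return total;
}
```

Under these assumptions, total_wait_ms(40, 3) comes to 600 (0.04 + 0.08 + 0.16 + 0.32 s) and total_wait_ms(100, 3) comes to 1500.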

Version-Release number of selected component (if applicable):
kernel-2.4.21-27.0.2.EL

How reproducible:
Always

Steps to Reproduce:
1. Run "du" on a soft-mounted partition (with UDP as the transport); this will often show the problem.  It is easier, though, to just read the kernel source.


Actual Results:  The RPC call returns ETIMEDOUT, which is passed to the calling program as EIO, and an error is logged to the syslog.

Additional info:

This one should be a no-brainer... it's been fixed in the mainline kernel for a fairly long time.  Marking as "high" severity since data corruption could result from an unwarranted NFS timeout.

Comment 3 Damian Menscher 2005-04-19 21:32:03 UTC
It's probably worth mentioning (for the benefit of others with this problem) 
that setting retrans=5 (or larger) as a mount option will approximate 
the "correct" behavior.  You need to umount/mount, as the remount option 
doesn't appear to let you change NFS mount options (see Bug 155392).
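
As a rough check on why retrans=5 is the first setting that helps, here is a sketch that assumes the timeout starts at about 0.04s and doubles after each transmission, as the figures in the description imply. min_retrans_for_budget is a hypothetical helper written for this comment, not a kernel or mount interface.

```c
#include <assert.h>

/*
 * Illustrative only. Returns the smallest retrans value for which the
 * summed timeouts reach at least target_ms, assuming the timeout starts
 * at initial_ms and doubles after every transmission.
 */
unsigned int min_retrans_for_budget(unsigned long initial_ms, unsigned long target_ms)
{
    unsigned long total = initial_ms;   /* wait for the initial transmission */
    unsigned long timeout = initial_ms;
    unsigned int retrans = 0;

    assert(initial_ms > 0);             /* a zero timeout would never reach the target */
    assert(target_ms <= 3600000UL);     /* bound the loop so the doubling cannot overflow */

    while (total < target_ms) {
        timeout *= 2;                   /* next retransmission waits twice as long */
        total += timeout;
        retrans++;
    }
    return retrans;
}
```

With a 40 ms starting timeout and a 1500 ms target, this gives 5: retrans=4 only adds up to 1.24s, while retrans=5 reaches 2.52s.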

Comment 4 RHEL Program Management 2007-10-19 19:04:25 UTC
This bug is filed against RHEL 3, which is in maintenance phase.
During the maintenance phase, only security errata and select mission
critical bug fixes will be released for enterprise products. Since
this bug does not meet those criteria, it is now being closed.
 
For more information on the RHEL errata support policy, please visit:
http://www.redhat.com/security/updates/errata/
 
If you feel this bug is indeed mission critical, please contact your
support representative. You may be asked to provide detailed
information on how this bug is affecting you.