Bug 155313 - NFS over UDP timeouts due to oversmall RPC_RTO_MIN
Status: CLOSED WONTFIX
Product: Red Hat Enterprise Linux 3
Classification: Red Hat
Component: kernel
Version: 3.0
Hardware: i386 Linux
Priority: medium
Severity: high
Assigned To: Steve Dickson
QA Contact: Brian Brock
Reported: 2005-04-18 23:06 EDT by Damian Menscher
Modified: 2007-11-30 17:07 EST
Doc Type: Bug Fix
Last Closed: 2007-10-19 15:04:25 EDT


Description Damian Menscher 2005-04-18 23:06:40 EDT
From Bugzilla Helper:
User-Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.0.3705; .NET CLR 1.1.4322)

Description of problem:
When using NFS over UDP, the timeo mount option is ignored in favor of an adaptive round-trip time (RTT) estimator. (That this contradicts all known documentation is a bug in itself.)  The minimum timeout the adaptive estimator may use is set in .../net/sunrpc/timer.c as
#define RPC_RTO_MIN (HZ/30)
Note that on fast hardware, the effective timeout will tune itself down to about 0.04s.  With the default retransmit count (retrans=3), this gives the server just 0.6s in total to respond before the request fails.  This leads to frequent timeouts, which can cause data corruption on a soft mount.

I recommend using the value from recent kernels: HZ/10, which will give the server a minimum of 1.5s to respond.
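
The 0.6s and 1.5s figures can be reproduced with a small calculation. This is only a sketch: it assumes the request is sent once, retransmitted retrans times, and that the timeout doubles after each attempt, which is the model the figures in this report imply. The helper name total_wait_ms is made up for illustration and is not part of the kernel.

```c
/*
 * Total time (in ms) the client waits before giving up, assuming one
 * initial transmission plus `retrans` retransmissions, with the timeout
 * doubling after every attempt: rto * (1 + 2 + 4 + ... + 2^retrans).
 * Hypothetical helper for illustration only; not kernel code.
 */
unsigned long total_wait_ms(unsigned long rto_ms, unsigned int retrans)
{
    unsigned long total = 0;
    unsigned int attempt;

    for (attempt = 0; attempt <= retrans; attempt++) {
        total += rto_ms;   /* wait out the current timeout */
        rto_ms *= 2;       /* back off before the next attempt */
    }
    return total;
}
```

Under these assumptions, total_wait_ms(40, 3) gives 600 (the 0.6s seen with the roughly 0.04s floor) and total_wait_ms(100, 3) gives 1500 (the 1.5s expected with HZ/10).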

Version-Release number of selected component (if applicable):
kernel-2.4.21-27.0.2.EL

How reproducible:
Always

Steps to Reproduce:
1. Running "du" on a soft-mounted partition (with UDP as the transport) will often show the problem, but it's easier to just read the kernel source.


Actual Results:  The RPC call returns ETIMEDOUT, which is passed to the calling program as EIO, and an error is logged to syslog.

Additional info:

This one should be a no-brainer... it's been fixed in the mainstream kernel for a fairly long time.  Marking as "high" severity since data corruption could result from an unwarranted NFS timeout.
Comment 3 Damian Menscher 2005-04-19 17:32:03 EDT
It's probably worth mentioning (for the benefit of others with this problem) 
that setting retrans=5 (or larger) as a mount option will approximate 
the "correct" behavior.  You need to umount/mount, as the remount option 
doesn't appear to let you change NFS mount options (see Bug 155392).
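
The retrans=5 suggestion can be checked with a little arithmetic. As a rough sketch, assuming the timeout doubles after each attempt and taking the roughly 0.04s floor and the 1.5s target from the description, the hypothetical helper below (min_retrans is not a kernel or mount API) finds the smallest retrans value that gives the server at least the target time.

```c
/*
 * Smallest retrans value for which the total wait (initial attempt plus
 * retransmissions, timeout doubling each time) reaches budget_ms.
 * Returns -1 if rto_ms is 0 or no value up to max_retrans is enough.
 * Hypothetical helper for illustration only.
 */
int min_retrans(unsigned long rto_ms, unsigned long budget_ms,
                int max_retrans)
{
    unsigned long total = 0;
    int retrans;

    if (rto_ms == 0)
        return -1;

    for (retrans = 0; retrans <= max_retrans; retrans++) {
        total += rto_ms;        /* time spent waiting on this attempt */
        if (total >= budget_ms)
            return retrans;
        rto_ms *= 2;            /* timeout doubles for the next attempt */
    }
    return -1;
}
```

With these inputs, min_retrans(40, 1500, 16) returns 5, for a total wait of about 2.5s; retrans=4 would give only about 1.24s.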
Comment 4 RHEL Product and Program Management 2007-10-19 15:04:25 EDT
This bug is filed against RHEL 3, which is in maintenance phase.
During the maintenance phase, only security errata and select mission
critical bug fixes will be released for enterprise products. Since
this bug does not meet those criteria, it is now being closed.
 
For more information on the RHEL errata support policy, please visit:
http://www.redhat.com/security/updates/errata/
 
If you feel this bug is indeed mission critical, please contact your
support representative. You may be asked to provide detailed
information on how this bug is affecting you.
