Bug 151097

Summary: Default TCP retransmit timeout too fast on NFS mounts
Product: Red Hat Enterprise Linux 3 Reporter: Chuck Lever <cel>
Component: util-linux Assignee: Steve Dickson <steved>
Status: CLOSED ERRATA QA Contact: Brian Brock <bbrock>
Severity: medium Docs Contact:
Priority: medium    
Version: 3.0 CC: buckh, kzak, menscher, mitch48
Target Milestone: ---   
Target Release: ---   
Hardware: All   
OS: Linux   
Whiteboard:
Fixed In Version: RHBA-2005-626 Doc Type: Bug Fix
Last Closed: 2005-09-28 15:53:09 UTC Type: ---
Bug Depends On:    
Bug Blocks: 156320    

Description Chuck Lever 2005-03-14 21:26:27 UTC
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7.6)
Gecko/20050302 Firefox/1.0.1 Fedora/1.0.1-1.3.2

Description of problem:
If no timeo= option is specified when mounting an NFS file system with
TCP, the mount command provides a default value of 0.7 seconds.  This
value may be appropriate for NFS over UDP, but is way too aggressive
for TCP, and can result in performance loss or data corruption.  The
correct default settings for NFS over TCP on 2.4 kernels should be
timeo=600,retrans=2.
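
To make the units concrete: timeo is expressed in tenths of a second, so
timeo=600 means a 60 second timeout. Below is a minimal sketch, not the actual
util-linux code, of picking a per-transport default when no timeo= option is
given. The function names and the NFS_TIMEO_UNSET sentinel are hypothetical;
the TCP value is the one proposed above and the UDP value is the 0.7 second
default mentioned in this report.

```c
#include <assert.h>

/* Hypothetical sentinel: the user gave no timeo= option. */
#define NFS_TIMEO_UNSET (-1)

/* timeo is expressed in tenths of a second (deciseconds). */
static int default_timeo(int tcp)
{
    /* 600 ds = 60 s for TCP (the value proposed in this report);
     * 7 ds = 0.7 s for UDP (the default mentioned in this report). */
    return tcp ? 600 : 7;
}

/* Return the timeo to hand to the kernel: the user's value if one was
 * given, otherwise a transport-appropriate default. */
static int effective_timeo(int tcp, int user_timeo)
{
    assert(user_timeo == NFS_TIMEO_UNSET || user_timeo > 0);
    return user_timeo == NFS_TIMEO_UNSET ? default_timeo(tcp) : user_timeo;
}

/* Convert deciseconds to milliseconds for display. */
static int timeo_to_ms(int timeo)
{
    return timeo * 100;
}
```

With these definitions, effective_timeo(1, NFS_TIMEO_UNSET) yields 600 and an
explicit timeo=50 on the command line is passed through unchanged.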

Note that RHEL AS 2.1 also has this problem, but the RHEL 4 mount
command should have patches that were included to support NFSv4, which
have a fix for this issue.

Version-Release number of selected component (if applicable):
util-linux-2.11y-31.2

How reproducible:
Always

Steps to Reproduce:
1.  Add a printk in the NFS client's mount logic to show the timeo
2.  mount -o tcp
3.  look at the output of the printk
    

Actual Results:  The printk will show that the mount command passes in
a default timeo and retrans value, and that the timeo value is too small
for NFS over TCP mounts.

Expected Results:  The mount command should either pass in no timeo value
(in which case the NFS client will pick an appropriate default), or pass in
a reasonable timeo value such as the one described above.

Additional info:

This is a critical problem for customers who use NFS over TCP.

Comment 1 Chuck Lever 2005-03-16 16:00:33 UTC
One reason this problem has gone on for so long is that /proc/mounts does not
display the actual timeo and retrans values in effect for an NFS mount point. 
As part of the fix for this bug, can we get support for displaying those mount
options added to the NFS client's show_options method?  I'm working on adding
similar support in 2.6 mainline.  Thanks!
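
As an illustration of the kind of output being requested, here is a user-space
sketch that formats the two values as mount options. It is not the kernel's
show_options code; the struct and function names are hypothetical, and it uses
snprintf only to show the shape of the text that would appear in /proc/mounts.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical stand-in for the per-mount timeout settings. */
struct nfs_timeouts {
    int timeo;    /* tenths of a second */
    int retrans;  /* number of retransmissions */
};

/* Write ",timeo=N,retrans=M" into buf; returns snprintf's result. */
static int format_timeout_opts(char *buf, size_t len,
                               const struct nfs_timeouts *t)
{
    assert(buf != NULL && t != NULL);
    return snprintf(buf, len, ",timeo=%d,retrans=%d", t->timeo, t->retrans);
}
```

For timeo=600 and retrans=2 this produces the text ",timeo=600,retrans=2",
which would be appended to the other options shown for the mount point.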

Comment 2 Chuck Lever 2005-03-31 22:32:05 UTC
What's the status of this issue?  The problem can potentially result in data
corruption, so we'd like a fix for this in the next update, if possible.

Comment 3 Tom Mitchell 2005-04-01 01:17:24 UTC
This impacts those with 'older' NFS appliances and fileservers.
Those with NIS maps for automount etc. will do well to set timeouts....

Comment 4 Damian Menscher 2005-04-13 01:42:22 UTC
I'm having possibly-related problems with this issue under RHEL3, but using 
UDP.  As previously mentioned, the UDP timeout is supposed to be 0.7 seconds, 
then double repeatedly after each timeout up to a max of 60 seconds.  Looking 
at the source shows the line "data.timeo = tcp ? 70 : 7;", which I take to mean 
UDP has a 0.7 second timeout, and TCP has a 7 second timeout, by default.
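
The doubling behavior described above can be sketched as follows. This only
illustrates the schedule as stated in this comment (start at timeo, double
after each timeout, cap at 60 seconds); it is not the kernel's RPC code, and
the function name and MAX_TIMEO_DS constant are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_TIMEO_DS 600  /* 60 s cap, in tenths of a second */

/* Fill out[0..n-1] with the timeout used for each successive attempt,
 * starting at initial_ds and doubling until the cap is reached.
 * Returns the number of entries written. */
static size_t backoff_schedule(int initial_ds, int *out, size_t n)
{
    assert(initial_ds > 0 && out != NULL);
    int t = initial_ds > MAX_TIMEO_DS ? MAX_TIMEO_DS : initial_ds;
    for (size_t i = 0; i < n; i++) {
        out[i] = t;
        t = (t * 2 > MAX_TIMEO_DS) ? MAX_TIMEO_DS : t * 2;
    }
    return n;
}
```

Starting from 7 (0.7 s), the schedule runs 7, 14, 28, 56, 112, 224, 448, 600,
600, and so on, so the cap is reached on the eighth attempt.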

Problem is, that doesn't seem to be the case at all.  I used tcpdump to get a 
packet capture that included some timeouts.  The shocking thing is that it's 
not waiting anywhere near 0.7 seconds for the RPC response.  It's actually much 
shorter.  The first timeout seems to fluctuate a bit (latency in packet capture 
makes it hard to be precise), but it's on the order of 0.07 seconds.  I'm not 
sure, but maybe the order-of-magnitude shift is because we're using gigabit?  
Another possible issue is we're using the SMP kernel.

Anyway, this is a serious issue, since a moderately loaded fileserver will 
frequently take more than 0.07 seconds to respond.  I have not yet tested 
whether setting the timeo= option will be respected, but I don't have high 
hopes given how quickly it's timing out right now.

Should I submit this as a separate bug?  It's not clear to me whether it's the 
same bug or a different one.

Comment 5 Chuck Lever 2005-04-18 14:16:57 UTC
damian-

short UDP timeouts are normal.  RHEL 3 uses a request round-trip time estimator
which can trim the timeouts pretty short.  it will ignore the mount command line
setting.  i believe the lower bound was raised in later updates of RHEL 3 to
address the same issue you are reporting, but i can't find the bugzilla report
where this is addressed.  if you report this problem again, be sure to mention
which update of RHEL 3 you are using.

Comment 6 Chuck Lever 2005-04-18 14:19:02 UTC
My management is pressing me pretty hard on this issue, as it increases the
potential for data corruption on NFS/TCP mounts that use the default timeout
setting.  When will we get a fix for this problem?

Comment 7 Damian Menscher 2005-04-19 03:13:46 UTC
Yes, the short UDP timeouts are a result of the RTT estimator trimming it to 
HZ/30.  Recent kernels use HZ/10.  I've submitted Bug 155313
(https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=155313) on this issue,
since it appears to be separate from your bug.  We increased to retrans=10 in
the meantime.
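
To show what the two lower bounds mean in practice, here is a small sketch of
clamping an RTT-derived timeout to a floor. It is not the kernel's estimator;
the function names are hypothetical, and EXAMPLE_HZ = 100 is an assumption
made only for illustration, since the real tick rate depends on the kernel
build.

```c
#include <assert.h>

/* Assumption: 100 timer ticks per second, for illustration only. */
#define EXAMPLE_HZ 100

/* Clamp an RTT-derived timeout (in jiffies) so it never drops below
 * the given floor. */
static long clamp_rto(long estimate_jiffies, long floor_jiffies)
{
    assert(floor_jiffies > 0);
    return estimate_jiffies < floor_jiffies ? floor_jiffies
                                            : estimate_jiffies;
}

/* Convert jiffies to milliseconds under the assumed tick rate. */
static long jiffies_to_ms(long j)
{
    return j * 1000 / EXAMPLE_HZ;
}
```

Under that assumed tick rate, a floor of HZ/30 is 3 jiffies (30 ms) while a
floor of HZ/10 is 10 jiffies (100 ms), so the larger floor gives a loaded
server noticeably more time before the first retransmission.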

Comment 10 Steve Dickson 2005-06-08 20:57:27 UTC
Should be fixed in util-linux-2.11y-31.8

Comment 14 Red Hat Bugzilla 2005-09-28 15:53:09 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2005-626.html