Bug 190042

Summary: Installation via NFS where server & client have different speed connections is excruciatingly slow
Product: Fedora
Component: anaconda
Version: 5
Hardware: i386
OS: Linux
Reporter: Lamont Peterson <peregrine>
Assignee: Chris Lumens <clumens>
CC: redhat2008, triage
Status: CLOSED WONTFIX
Severity: medium
Priority: medium
Keywords: Reopened
Whiteboard: bzcl34nup
Doc Type: Bug Fix
Last Closed: 2008-05-06 15:50:54 UTC

Description Lamont Peterson 2006-04-26 22:03:51 UTC
Description of problem:  
We have an NFS server with 1000Mbps links to our switch.  We have several 
client machines that have 100Mbps connections to the switch.  When we try to 
do NFS installs, they take 4-6 hours to complete.
  
Version-Release number of selected component (if applicable):  
We have seen this problem with FC3 - FC5 (we have not tested for FC1 or FC2). 
We have also seen this problem with RHEL3 & RHEL4. 
  
How reproducible:  
When the client & server have different "speed" network connections to each 
other, this problem always appears.  
  
Steps to Reproduce:  
1.  Set up a network install server with a 1000Mbps connection  
2.  Launch an NFS install on a client with a 100Mbps connection (actually, 
just make sure that the client & server have different speed connections).  
    
Actual results:  
The install takes several hours.  One of my upgrades of a box with only ~500 
packages installed took 6 hours to complete.  I redid that install using 
another NFS server with a 100Mbps link and it took about 15 minutes.
  
Expected results:  
The install should be just as fast as if both the client & server had links of 
the same speed as the slower of the two.  
  
Additional info:  
We believe that the issue is caused by anaconda's use of UDP-based NFS 
connections.  It appears that when the client & server connect to the switch 
at different speeds, these UDP datagrams are significantly slowed down.  
  
We believe that if anaconda were updated to use NFS over TCP (v3?) by default, 
the installation time would be as expected.
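
To illustrate what we're asking for: mounting the same export by hand with TCP 
forced is just a matter of the standard nfs mount options (the server name and 
export path below are placeholders, not our real setup):

  mount -t nfs -o proto=tcp,nfsvers=3 nfsserver.example.com:/exports/fc5 /mnt/source

Presumably anaconda would just need to pass the equivalent of proto=tcp when it 
mounts the install tree.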

Comment 1 David Cantrell 2007-03-15 17:56:25 UTC
Please try with rawhide or a Fedora 7 test release.  With kickstart installs,
you can pass mount options for NFS; we don't let you do this through an
interactive install.

Pass the --opts= parameter in your kickstart file with the mount(8) options you
want to use for NFS.
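
For example, something along these lines in the kickstart should work (the
server name, directory, and option string here are only an illustration; use
whatever mount options you need):

  nfs --server=nfs.example.com --dir=/exports/f7 --opts=proto=tcp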

Let us know if that works or not.

Comment 2 Lamont Peterson 2007-03-15 22:56:06 UTC
We worked around the problem here by reconfiguring our managed switch to turn 
on flow control.  As cheaper, non-managed switches have this on by default, we 
expect that this issue would not appear for admins with such switches.  
However, this could still be an issue in environments with managed switches.
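
For anyone else hitting this: you can check from the host side whether pause 
(flow control) frames were negotiated on your NIC with ethtool (the interface 
name here is just an example):

  ethtool -a eth0

In our case the fix was on the switch side, but the ethtool output is a quick 
way to confirm whether flow control is in play at all.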

There are two possible fixes:

1.  Users will have to enable flow control on their managed switches.  This 
could be an issue where the systems admins are not the ones in control of the 
networking infrastructure, such as in most medium to large enterprises or if 
they have other applications for which they need to leave flow control off.

2.  Modify anaconda to use TCP for NFS.  This solution fixes it for everyone, 
everywhere regardless of switch configuration.

Personally, I feel that #2 is the right answer.

We could try turning flow control off in our switch and running an NFS install 
of F7T[whatever].  However, if it's still using UDP for NFS, the results will 
be the same as before.

Comment 3 Chris Lumens 2007-03-26 19:08:52 UTC
UDP is still the default proto option for mount (see man 5 mount) and we're
hesitant to change to TCP in anaconda because of this.  Basically, we'll create
a whole new set of problems for a different class of people by changing the
proto we're using.  If you really do need to tweak the settings to this degree,
consider using a kickstart install with the --opts= parameter as I mentioned
earlier.  If you'd like to see TCP become the default for NFS mounts (after
which we'd be much more likely to change how anaconda works) then please file a
bug against the nfs component asking for that change.  Thanks for the report.

Comment 4 Lamont Peterson 2007-03-30 23:03:28 UTC
UDP is not the default protocol for NFS v3.  Is anaconda using NFS v2?

I'm not seeing a mount(5) man page.  There are man pages for mount in other 
sections, but not in section 5 on FC6, RHEL5, or several other distros I have 
access to, nor in several minutes of Google searching.  Looking at all the other 
man pages for mount, I didn't find TCP or UDP mentioned anywhere.  Where are 
you getting this information?  I'm not finding it.

What set of problems are you anticipating?  TCP on a LAN isn't some kind of 
exotic thing.  Linux systems have supported NFS v3 for years.

TCP is the default for NFS mounts.  Simply running tcpdump or wireshark while 
mounting shares or accessing files shows that.
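
For example, a capture on the NFS port during a normal mount on an installed 
system shows the traffic going over TCP (the interface name is just an example):

  tcpdump -i eth0 -n port 2049

Running the same capture while a machine is in the middle of an anaconda NFS 
install should show UDP instead, if anaconda really is defaulting to it.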

The problem here is that anaconda (not mount) is using UDP-based NFS to access 
the installation "media" over the network.  With different speed links between 
the NFS server and the machine being installed, it takes several hours to 
install.  The reason is that many managed switches do not have flow control 
turned on by default (non-managed switches do).  Because of this, the problem 
will only occur for people who have managed switches where they haven't turned 
on the flow control options in their switch(es).  In other words, small 
networks won't experience the problem, but big ones (including enterprise 
networks) will.

The way to fix this for everyone is to have anaconda use NFS over TCP. This 
solution also does not require enterprise clients to fight the fight with 
their networking people to get flow control on in the switches just to be able 
to do network installs.

Comment 5 Michael 2007-06-02 02:49:47 UTC
I encountered the same problem on FC7. 
I thought NFS install was broken completely. 
Tried ftp install from the same server ... worked fine.
Ran cable to plug new client system into the same 1G unmanaged switch as NFS
server ... worked fine. 
Client has an Intel E1000 1Gb NIC, but was originally plugged into a 100Mb
unmanaged full-duplex switch on the same subnet as the server. 

Lamont said:
  ... small networks won't experience the problem, but big
   ones (including enterprise networks) will. 

I have a small network and I am experiencing the problem. 

I agree that NFS should be using TCP. Adding options to a kickstart file is not
a practical solution for many/most people. 

RHEL4 says that TCP is the default NFS protocol. 

http://www.redhat.com/docs/manuals/enterprise/RHEL-4-Manual/ref-guide/ch-nfs.html

Let me know if you need me to do any testing or data collection on this. 


Comment 6 Bug Zapper 2008-04-04 02:45:39 UTC
Fedora apologizes that these issues have not been resolved yet. We're
sorry it's taken so long for your bug to be properly triaged and acted
on. We appreciate the time you took to report this issue and want to
make sure no important bugs slip through the cracks.

If you're currently running a version of Fedora Core between 1 and 6,
please note that Fedora no longer maintains these releases. We strongly
encourage you to upgrade to a current Fedora release. In order to
refocus our efforts as a project we are flagging all of the open bugs
for releases which are no longer maintained and closing them.
http://fedoraproject.org/wiki/LifeCycle/EOL

If this bug is still open against Fedora Core 1 through 6 thirty days
from now, it will be closed 'WONTFIX'. If you can reproduce this bug in
the latest Fedora version, please change the version field to the
respective release. If you are unable to do this, please add a comment to
this bug requesting the change.

Thanks for your help, and we apologize again that we haven't handled
these issues to this point.

The process we are following is outlined here:
http://fedoraproject.org/wiki/BugZappers/F9CleanUp

We will be following the process here:
http://fedoraproject.org/wiki/BugZappers/HouseKeeping to ensure this
doesn't happen again.

And if you'd like to join the bug triage team to help make things
better, check out http://fedoraproject.org/wiki/BugZappers

Comment 7 Bug Zapper 2008-05-06 15:50:52 UTC
This bug is open for a Fedora version that is no longer maintained and
will not be fixed by Fedora. Therefore we are closing this bug.

If you can reproduce this bug against a currently maintained version of
Fedora, please feel free to reopen this bug against that version.

Thank you for reporting this bug and we are sorry it could not be fixed.