Description of problem:
We have an NFS server with 1000Mbps links to our switch. We have several
client machines that have 100Mbps connections to the switch. When we try to
do NFS installs, they take 4-6 hours to complete.
Version-Release number of selected component (if applicable):
We have seen this problem with FC3 - FC5 (we have not tested for FC1 or FC2).
We have also seen this problem with RHEL3 & RHEL4.
How reproducible:
When the client & server have different "speed" network connections to each
other, this problem always appears.
Steps to Reproduce:
1. Set up a network install server with a 1000Mbps connection
2. Launch an NFS install on a client with a 100Mbps connection (actually,
just make sure that the client & server have different speed connections).
Actual results:
The install takes several hours. One of my upgrades, of a box with only ~500
packages installed, took 6 hours to complete. I redid that install using
another NFS server with a 100Mbps link and it took about 15 minutes.
Expected results:
The install should be just as fast as if both the client & server had links of
the same speed as the slower of the two.
Additional info:
We believe that the issue is caused by anaconda's use of UDP-based NFS
connections. It appears that when the client & server connect to the switch
at different speeds, these UDP datagrams are significantly slowed down.
We believe that if anaconda were updated to use TCP NFS (v3?) by default, the
installation time would be what is expected.
Please try with rawhide or a Fedora 7 test release. With kickstart installs,
you can pass mount options for NFS installs. We don't let you do this through
an interactive install.
Pass the --opts= parameter in your kickstart file with the mount(8) options you
want to use for NFS.
Let us know if that works or not.
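For anyone wanting to try the kickstart route, here is a minimal sketch of assembling the `nfs` line with explicit mount options. The helper name, the server name `nfs.example.com`, and the export path are hypothetical placeholders; `tcp` and `nfsvers=3` are ordinary NFS mount options, but whether a given anaconda release honors them is an assumption that needs testing.

```python
def kickstart_nfs_line(server, directory, mount_opts):
    """Build a kickstart 'nfs' command line, optionally with --opts=."""
    line = f"nfs --server={server} --dir={directory}"
    if mount_opts:
        # Mount options are passed as a single comma-separated list.
        line += " --opts=" + ",".join(mount_opts)
    return line


# Hypothetical server and export path, for illustration only.
print(kickstart_nfs_line("nfs.example.com", "/export/fedora", ["tcp", "nfsvers=3"]))
```

The printed line would go in the kickstart file in place of a plain `nfs --server=... --dir=...` entry.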
We worked around the problem here by reconfiguring our managed switch to turn
on flow control. As cheaper, non-managed switches have this on by default, we
expect that this issue would not appear for admins with such switches.
However, this could still be an issue in environments with managed switches.
There are two possible fixes:
1. Users will have to enable flow control on their managed switches. This
could be an issue where the systems admins are not the ones in control of the
networking infrastructure, such as in most medium to large enterprises or if
they have other applications for which they need to leave flow control off.
2. Modify anaconda to use TCP for NFS. This solution fixes it for everyone,
everywhere regardless of switch configuration.
Personally, I feel that #2 is the right answer.
We could try turning flow control off in our switch and run an NFS install of
F7T[whatever]. However, if it's still using UDP for NFS, the results will be
the same as before.
UDP is still the default proto option for mount (see man 5 mount) and we're
hesitant to change to TCP in anaconda because of this. Basically, we'll create
a whole new set of problems for a different class of people by changing the
proto we're using. If you really do need to tweak the settings to this degree,
consider using a kickstart install with the --opts= parameter as I mentioned
earlier. If you'd like to see TCP become the default for NFS mounts (after
which we'd be much more likely to change how anaconda works) then please file a
bug against the nfs component asking for that change. Thanks for the report.
UDP is not the default protocol for NFS v3. Is anaconda using NFS v2?
I'm not seeing a mount(5) man page. There are man pages for mount in other
sections, but not in section 5 on FC6, RHEL5, several other distros I have
access to or in several minutes of Google searching. Looking at all the other
man pages for mount, I didn't find TCP or UDP mentioned anywhere. Where are
you getting this information? I'm not finding it.
What set of problems are you anticipating? TCP on a LAN isn't some kind of
exotic thing. Linux systems have supported NFS v3 for years.
TCP is the default for NFS mounts. Simply running tcpdump or wireshark while
mounting shares or accessing files shows that.
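Besides a packet capture, the transport can be read from the mount table. A rough sketch, assuming the usual `/proc/mounts` field layout (device, mount point, type, options) and that the kernel reports the transport either as `proto=tcp`/`proto=udp` or as a bare `tcp`/`udp` option; the function name is made up for this example.

```python
def nfs_transports(mounts_text):
    """Map each NFS mount point to its reported transport, or None if absent."""
    result = {}
    for line in mounts_text.splitlines():
        fields = line.split()
        # Expect: device, mount point, fs type, options, dump, pass.
        if len(fields) < 4 or fields[2] not in ("nfs", "nfs4"):
            continue
        proto = None
        for opt in fields[3].split(","):
            if opt.startswith("proto="):
                proto = opt[len("proto="):]
            elif opt in ("tcp", "udp"):
                proto = opt
        result[fields[1]] = proto
    return result


if __name__ == "__main__":
    try:
        with open("/proc/mounts") as f:
            print(nfs_transports(f.read()))
    except OSError:
        print("/proc/mounts is not available on this system")
```

Run from a shell on the installing client (if one is available during the install), this would show whether the install source was mounted over UDP or TCP.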
The problem here is that anaconda (not mount) is using UDP-based NFS to access
the installation "media" from the network. With different speed links between
the NFS server and the machine being installed on, it takes several hours to
install. This is because many managed switches do not have
flow control turned on by default (non-managed switches do). Because of this,
the problem is only going to occur for people who have managed switches where
they haven't turned on the flow control options in their switch(es). In other
words, small networks won't experience the problem, but big ones (including
enterprise networks) will.
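One plausible mechanism, which is my assumption and was not measured here: a large NFS-over-UDP read reply is split into several IP fragments, and if the switch drops any one of them while buffering from the fast port to the slow port, the whole reply is lost and has to be retransmitted. The sketch below only does that arithmetic; the 1500-byte MTU, the 5% fragment drop rate, and the read sizes are illustrative assumptions, and the RPC and UDP headers are ignored.

```python
import math


def fragments_per_datagram(payload_bytes, mtu=1500, ip_header=20):
    """Number of IP fragments needed to carry one datagram of this size."""
    # Every fragment except the last carries a multiple of 8 payload bytes.
    per_fragment = (mtu - ip_header) // 8 * 8
    return math.ceil(payload_bytes / per_fragment)


def datagram_survival(payload_bytes, fragment_drop_rate):
    """Chance that all fragments arrive, assuming independent drops."""
    n = fragments_per_datagram(payload_bytes)
    return (1.0 - fragment_drop_rate) ** n


# Assumed 5% fragment loss; compare a small and a large read size.
for rsize in (8192, 32768):
    print(rsize, fragments_per_datagram(rsize), round(datagram_survival(rsize, 0.05), 3))
```

Under these assumptions, larger reads fare worse because more fragments must all arrive. TCP would retransmit only the lost segment and slow its sending rate, which is consistent with the TCP suggestion in this bug.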
The way to fix this for everyone is to have anaconda use NFS over TCP. This
solution also does not require enterprise clients to fight the fight with
their networking people to get flow control on in the switches just to be able
to do network installs.
I encountered the same problem on FC7.
I thought NFS install was broken completely.
Tried ftp install from the same server ... worked fine.
Ran cable to plug new client system in to the same 1G unmanaged switch as NFS
server ... worked fine.
Client has an Intel E1000 1Gb NIC, but was originally plugged into a 100Mb unmanaged
full-duplex switch on the same subnet as the server.
> ... small networks won't experience the problem, but big
> ones (including enterprise networks) will.
I have a small network and I am experiencing the problem.
I agree that NFS should be using TCP. Adding options to a kickstart file is not
a practical solution for many/most people.
RHEL4 says that TCP is the default NFS protocol.
Let me know if you need me to do any testing or data collection on this.
Fedora apologizes that these issues have not been resolved yet. We're
sorry it's taken so long for your bug to be properly triaged and acted
on. We appreciate the time you took to report this issue and want to
make sure no important bugs slip through the cracks.
If you're currently running a version of Fedora Core between 1 and 6,
please note that Fedora no longer maintains these releases. We strongly
encourage you to upgrade to a current Fedora release. In order to
refocus our efforts as a project we are flagging all of the open bugs
for releases which are no longer maintained and closing them.
If this bug is still open against Fedora Core 1 through 6, thirty days
from now, it will be closed 'WONTFIX'. If you can reproduce this bug in
the latest Fedora version, please change to the respective version. If
you are unable to do this, please add a comment to this bug requesting
the change.
Thanks for your help, and we apologize again that we haven't handled
these issues to this point.
The process we are following is outlined here:
We will be following the process here:
http://fedoraproject.org/wiki/BugZappers/HouseKeeping to ensure this
doesn't happen again.
And if you'd like to join the bug triage team to help make things
better, check out http://fedoraproject.org/wiki/BugZappers
This bug is open for a Fedora version that is no longer maintained and
will not be fixed by Fedora. Therefore we are closing this bug.
If you can reproduce this bug against a currently maintained version of
Fedora please feel free to reopen this bug against that version.
Thank you for reporting this bug and we are sorry it could not be fixed.