Red Hat Bugzilla – Bug 69579
[acenic/e1000] NFS Mount Problem With Fiber Network Cards
Last modified: 2008-08-01 12:22:52 EDT
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.1a+) Gecko/20020715
Description of problem:
On very fast systems using fiber network cards, the startup scripts run so
quickly that the fiber optic card has not had time to connect to the network
yet. Fiber cards seem to take another second or two to negotiate. The boot
sequence reaches the point where NFS mounts are performed before the card is
ready, so the mounts are not performed. It's almost as if the system perceives
that no network exists because the card isn't quite ready.
Version-Release number of selected component (if applicable):
Steps to Reproduce:
1. Install a fiber optic NIC in a fast system (dual 1.0 in this case)
2. Add NFS mounts to /etc/fstab
3. Reboot the system and wait for the mounts to come up - they never do.
Actual Results: The system comes up and the network finally starts, but the NFS
mounts have not been performed.
Expected Results: The NFS startup scripts should understand that eth0 exists,
and if the network is not up yet, wait a few seconds.
This is not really an initscripts issue; the initscripts check for the presence
of an address before they continue. What driver are you using?
This same issue has come up on all of our servers that we have migrated to fiber
optic network cards. Systems that run on 100Mb copper cards are working fine.
We have 2 drivers in use and they are both doing the same thing.
One is the 'acenic' driver.
The other is the 'e1000' driver.
Every other aspect of the network works just fine, and once the system is up
you can perform the NFS mounts by hand with no problems.
Here is an example of a mount that is attempted, via fstab that fails:
oa1:/home /oa1_home nfs exec,dev,suid,rw 1 1
oa1:/a/largo /home/largo nfs exec,dev,suid,rw 1 1
I'm happy to see some movement on this bug, as it still remains with
us even in RH Ent 3. No fiber optic cards work correctly with NFS
mounts to remote servers.
When NFS starts the first time, it needs to sit for another few
seconds while the backbone finds the NIC and the network comes up. In
our case, the 3Com CoreBuilder takes a few seconds longer to come up
if you have moved the server to another physical port, because it's
doing switching: it "remembers" where things are and updates its
tables. This really is a matter of just starting the network and
waiting a few more seconds. We ended up adding a few lines in S99local
to sleep for a bit and then do an NFS restart.
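The S99local workaround described above amounts to a couple of lines like the
following boot-script fragment (the 15-second delay and the `netfs` service
name are assumptions; adjust for your environment):

```
# Appended to /etc/rc.d/rc.local (run as S99local): give the switch
# time to bring the port live, then retry the fstab NFS mounts.
sleep 15
/sbin/service netfs restart
```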
I'm not inclined to just put in a sleep that most people won't need --
why slow down the boot any more?
David, does ethtool show the NIC's link as down (i.e. "Link detected:
no") while the network is unavailable? If not, is there anything else
one might key off to know when it is safe to continue?
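If the link state were reliable, one way to key off it would be to poll
`ethtool` output from a boot script until the link comes up. A minimal sketch,
assuming the "Link detected:" line format that ethtool prints (the function
names and retry counts here are illustrative, not part of any Red Hat script):

```shell
# link_up: read `ethtool ethX` output on stdin and succeed only if it
# reports "Link detected: yes".
link_up() {
    grep -q 'Link detected: yes'
}

# wait_for_link: poll ethtool on interface $1 up to $2 times (default
# 10), one second apart, until the link is detected.
wait_for_link() {
    dev=$1
    tries=${2:-10}
    i=0
    while [ "$i" -lt "$tries" ]; do
        ethtool "$dev" | link_up && return 0
        i=$((i + 1))
        sleep 1
    done
    return 1
}
```

As this report later shows, though, the card can report "Link detected: yes"
before the switch port is actually usable, so a link check alone may not be
sufficient when spanning tree is involved.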
Bill, could we have a variable in ifcfg-* that ifup-post might key off
to put in a user configurable delay? Just an idea...
Hm, well, it would be a magic parameter, which is somewhat icky.
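For illustration, the variable under discussion might look something like this
in the interface config file, with ifup-post sleeping on it after the interface
comes up. Both the variable name (LINKDELAY here) and the hook are hypothetical
in this sketch, not an existing initscripts feature:

```
# /etc/sysconfig/network-scripts/ifcfg-eth0 (sketch)
DEVICE=eth0
BOOTPROTO=static
IPADDR=192.168.1.10
NETMASK=255.255.255.0
ONBOOT=yes
LINKDELAY=10   # hypothetical: seconds for ifup-post to sleep after link-up

# and in ifup-post (sketch):
#   [ -n "$LINKDELAY" ] && sleep "$LINKDELAY"
```

Defaulting it to zero (or unset) would keep the boot unchanged for everyone who
doesn't need it.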
Are you using static IPs?
To answer your questions: the IPs are all static. When NFS hits on the
first pass, it says "No Route To Host" on the console. I added the
following piece to the front of K20nfs in /etc/rc5.d; that should
show the link status at boot time. A reboot for that server is
scheduled for the middle of next week, and I'll report back then.
echo "----------------------" >> /tmp/redhat.txt
date >> /tmp/redhat.txt
ethtool eth0 >> /tmp/redhat.txt
echo "----------------------" >> /tmp/redhat.txt
Yeah, static interfaces are more-or-less defined to have a behavior of
not waiting for a link.
Bad news: Red Hat thinks the card is up and running at boot time. The
media seems to be up as far as the OS is concerned, but it's not yet
live. Here is what we got right before NFS tried to start. :(
Fri Sep 10 09:13:10 EDT 2004
Settings for eth0:
Supported ports: [ FIBRE ]
Supported link modes: 1000baseT/Full
Supports auto-negotiation: Yes
Advertised link modes: 1000baseT/Full
Advertised auto-negotiation: Yes
Supports Wake-on: d
Link detected: yes
In that case, I'm not sure how to handle it - there's not a good way
to code 'assume this particular card isn't up when it says it is.'
Any chance this card is plugged into a switch running spanning-tree?
Bug 110036 and bug 131475 show the need for a generic post link-up
delay in the ifup scripts...
The ifup scripts are completely unrelated to kickstart.
I think that misses the point. Besides, bug 110036 specifically
mentions NFS mounts failing because the ports are not yet usable due
to spanning tree delays.
The point is that there are common (even legitimate) networking
configurations that can cause a mere check for link-up to be
insufficient to warrant moving forward with the boot process. Being
connected directly to a port running STP is one of them.
I think it behooves RH to provide a generic, documented, configurable
means of delaying the boot process on a port-by-port basis.
I am in favor of the discussed idea of adding the number of seconds to
wait into a file. Configure it to zero by default, and 99% of people
won't know about it or use it. But those of us who need to wait a
bit for those cards to light can make use of it. I mentioned
previously that it's not just this one fiber card; all gigabit
fiber cards do the same thing with a 3Com CoreBuilder. And I also
mentioned that if you move a server to another physical port, it
takes even longer for the network to come up. The 3Com seems to add
the port to some internal routing tables when you move it, and this
takes a few seconds.
Thanks for the bug report. However, Red Hat no longer maintains this version of
the product. Please upgrade to the latest version and open a new bug if the problem
persists.
The Fedora Legacy project (http://fedoralegacy.org/) maintains some older releases,
and if you believe this bug is of interest to them, please report the problem in
the bug tracker at: http://bugzilla.fedora.us/