Red Hat Bugzilla – Bug 69579
[acenic/e1000] NFS Mount Problem With Fiber Network Cards
Last modified: 2008-08-01 12:22:52 EDT
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.1a+) Gecko/20020715
Description of problem:
On very fast systems using fiber network cards, the startup scripts run so
quickly that the fiber optic card has not had time to connect to the network
yet. Fiber cards seem to take another second or two to negotiate. The boot
sequence reaches the point where NFS mounts are performed before the card is
ready, so the mounts are not performed. It's almost as if the system perceives
that no network exists because the card isn't quite ready.
Version-Release number of selected component (if applicable):
Steps to Reproduce:
1. Install a fiber optic NIC in a fast system (dual 1.0 in this case)
2. Add NFS mounts to /etc/fstab
3. Reboot the system and wait for the mounts to come up - they never do.
Actual Results: The system comes up and the network finally starts, but the NFS
mounts have not been performed.
Expected Results: The NFS startup scripts should understand that eth0 exists,
and if the network is not up yet, wait a few seconds.
This is not really an initscripts issue; the initscripts check for the presence
of an address before they continue. What driver are you using?
This same issue has come up on all of our servers that we have migrated to fiber
optic network cards. Systems that run on 100Mb copper cards are working fine.
We have 2 drivers in use and they are both doing the same thing.
One is the 'acenic' driver.
The other is the 'e1000' driver.
Every other aspect of the network works just fine, and once the system is up
you can perform the NFS mounts by hand with no problems.
Here is an example of a mount that is attempted, via fstab that fails:
oa1:/home /oa1_home nfs exec,dev,suid,rw 1 1
oa1:/a/largo /home/largo nfs exec,dev,suid,rw 1 1
I'm happy to see some movement on this bug, as it still remains with
us even in RH Ent 3. No fiber optic cards work correctly with NFS
mounts to remote servers.
When NFS starts the first time, it needs to sit for another few
seconds while the backbone finds the NIC and the network comes up. In
our case, the 3Com CoreBuilder takes a few seconds longer to come up
if you have moved the server to another physical port, because it's
doing switching: it "remembers" where things are and updates its
tables. This really is a matter of just starting the network and
waiting a few more seconds. We ended up adding a few lines in S99local
to sleep for a bit and then do an NFS restart.
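The S99local workaround described above amounts to a couple of lines like the
following boot-script fragment (the 15-second delay and the `netfs` service
name are assumptions; adjust for your environment):

```
# Appended to /etc/rc.d/rc.local (run as S99local): give the switch
# time to bring the port live, then retry the fstab NFS mounts.
sleep 15
/sbin/service netfs restart
```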
I'm not inclined to just put in a sleep that most people won't need --
why slow down the boot any more?
David, does ethtool show the NIC's link as down (i.e. "Link detected:
no") while the network is unavailable? If not, is there anything else
one might key off to know when it is safe to continue?
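If the link state were reliable, one way to key off it would be to poll
`ethtool` output from a boot script until the link comes up. A minimal sketch,
assuming the "Link detected:" line format that ethtool prints (the function
names and retry counts here are illustrative, not part of any Red Hat script):

```shell
# link_up: read `ethtool ethX` output on stdin and succeed only if it
# reports "Link detected: yes".
link_up() {
    grep -q 'Link detected: yes'
}

# wait_for_link: poll ethtool on interface $1 up to $2 times (default
# 10), one second apart, until the link is detected.
wait_for_link() {
    dev=$1
    tries=${2:-10}
    i=0
    while [ "$i" -lt "$tries" ]; do
        ethtool "$dev" | link_up && return 0
        i=$((i + 1))
        sleep 1
    done
    return 1
}
```

As this report later shows, though, the card can report "Link detected: yes"
before the switch port is actually usable, so a link check alone may not be
sufficient when spanning tree is involved.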
Bill, could we have a variable in ifcfg-* that ifup-post might key off
to put in a user configurable delay? Just an idea...
Hm, well, it would be a magic parameter, which is somewhat icky.
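For illustration, the variable under discussion might look something like this
in the interface config file, with ifup-post sleeping on it after the interface
comes up. Both the variable name (LINKDELAY here) and the hook are hypothetical
in this sketch, not an existing initscripts feature:

```
# /etc/sysconfig/network-scripts/ifcfg-eth0 (sketch)
DEVICE=eth0
BOOTPROTO=static
IPADDR=192.168.1.10
NETMASK=255.255.255.0
ONBOOT=yes
LINKDELAY=10   # hypothetical: seconds for ifup-post to sleep after link-up

# and in ifup-post (sketch):
#   [ -n "$LINKDELAY" ] && sleep "$LINKDELAY"
```

Defaulting it to zero (or unset) would keep the boot unchanged for everyone who
doesn't need it.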
Are you using static IPs?
To answer your questions: the IPs are all static. When NFS hits on the
first pass, it says "No Route To Host" on the console. I added the
following piece to the front of K20nfs in /etc/rc5.d; that should
show the link status at boot time. A reboot for that server is
scheduled for the middle of next week, and I'll report back then.
echo "----------------------" >> /tmp/redhat.txt
date >> /tmp/redhat.txt
ethtool eth0 >> /tmp/redhat.txt
echo "----------------------" >> /tmp/redhat.txt
Yeah, static interfaces are more-or-less defined to have a behavior of
not waiting for a link.
Bad news: Red Hat thinks the card is up and running at boot time. The
media seems to be up as far as the OS is concerned, but it's not yet
live. Here is what we got right before NFS tried to start. :(
Fri Sep 10 09:13:10 EDT 2004
Settings for eth0:
Supported ports: [ FIBRE ]
Supported link modes: 1000baseT/Full
Supports auto-negotiation: Yes
Advertised link modes: 1000baseT/Full
Advertised auto-negotiation: Yes
Supports Wake-on: d
Link detected: yes
In that case, I'm not sure how to handle it - there's not a good way
to code 'assume this particular card isn't up when it says it is.'
Any chance this card is plugged into a switch running spanning-tree?
Bug 110036 and bug 131475 show the need for a generic post link-up
delay in the ifup scripts...
The ifup scripts are completely unrelated to kickstart.
I think that misses the point. Besides, bug 110036 specifically
mentions NFS mounts failing because the ports are not yet usable due
to spanning tree delays.
The point is that there are common (even legitimate) networking
configurations that can cause a mere check for link-up to be
insufficient to warrant moving forward with the boot process. Being
connected directly to a port running STP is one of them.
I think it behooves RH to provide a generic, documented, configurable
means of delaying the boot process on a port-by-port basis.
I am in favor of the discussed idea of adding the number of seconds to
wait into a file. Configure it to zero by default, and 99% of people
won't know about it or use it. But those of us who need to wait a
bit for those cards to light can make use of it. I mentioned
previously that it's not just this one fiber card; all gigabit
fiber cards do the same thing with a 3Com CoreBuilder. And I also
mentioned that if you move a server to another physical port, it
takes even longer for the network to come up. The 3Com seems to add
the port to some internal routing tables when you move it, and this
takes a few seconds.
Thanks for the bug report. However, Red Hat no longer maintains this version of
the product. Please upgrade to the latest version and open a new bug if the problem
persists.
The Fedora Legacy project (http://fedoralegacy.org/) maintains some older releases,
and if you believe this bug is of interest to them, please report the problem in
the bug tracker at: http://bugzilla.fedora.us/