Bug 69579 - [acenic/e1000] NFS Mount Problem With Fiber Network Cards
Summary: [acenic/e1000] NFS Mount Problem With Fiber Network Cards
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat Linux
Classification: Retired
Component: kernel
Version: 7.3
Hardware: i686
OS: Linux
Priority: medium
Severity: medium
Target Milestone: ---
Assignee: Jeremy Katz
QA Contact: Brock Organ
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2002-07-23 13:17 UTC by David Richards
Modified: 2008-08-01 16:22 UTC

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2004-09-30 15:39:46 UTC
Embargoed:



Description David Richards 2002-07-23 13:17:12 UTC
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.1a+) Gecko/20020715

Description of problem:
On very fast systems with fiber network cards, the startup scripts run
so quickly that the fiber optic card has not yet had time to connect to
the network.  Fiber cards seem to take another second or two to
negotiate.  As a result, the system drops through the point where NFS
mounts are performed before the card is ready, and the mounts are not
performed.  It's almost as if the system perceives that no network
exists because the card isn't quite ready.

Version-Release number of selected component (if applicable):


How reproducible:
Always

Steps to Reproduce:
1. Install a fiber optic NIC in a fast system (dual 1.0 in this case)
2. Put NFS mounts into /etc/fstab
3. Reboot the system and wait for the mounts to come up - they never do

Actual Results:  The system comes up, the network finally starts, but the NFS
mounts have not been performed.

Expected Results:  The NFS startup scripts should recognize that eth0 exists
and, if the network is not up yet, wait a few seconds.

Additional info:

Comment 1 Bill Nottingham 2002-07-23 16:48:45 UTC
This is not really an initscripts issue; the initscripts check for the presence
of an address before they continue. What driver are you using?

Comment 2 David Richards 2002-07-23 16:58:25 UTC
This same issue has come up on all of our servers that we have migrated to fiber
optic network cards.  Systems that run on 100Mb copper cards are working fine.

We have 2 drivers in use and they are both doing the same thing.

One is the 'acenic' driver.
The other is the 'e1000' driver.

Every other aspect of the network works just fine, and you can mount the NFS
mounts by hand once the system is up with no problems.

Here is an example of a mount, attempted via fstab, that fails:

oa1:/home       /oa1_home       nfs      exec,dev,suid,rw 1 1
oa1:/a/largo    /home/largo     nfs      exec,dev,suid,rw 1 1

Comment 3 David Richards 2004-09-07 18:48:29 UTC
I'm happy to see some movement on this bug, as it still remains with
us even in RH Ent 3.  No fiber optic cards work correctly with NFS
mounts to remote servers.  

When NFS starts the first time, it needs to sit for another few
seconds while the backbone finds the NIC and the network comes up.  In
our case, the 3Com CoreBuilder takes a few seconds longer to come up
if you have moved the server to another physical port, because it's doing
switching and "remembers" where things are and updates its tables.  This
really is a matter of just starting the network and waiting a few
more seconds.  We ended up adding a few lines in S99local to sleep for
a bit and then do an NFS restart.
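A minimal sketch of that S99local workaround follows. The function name is
hypothetical, and the link check is deliberately left as a placeholder command
(an assumption), e.g. something that greps `ethtool eth0` output:

```shell
# Sketch of the S99local workaround: wait (with a timeout) until the
# link is usable, then restart NFS. wait_for_link is a hypothetical
# helper name; the check command passed to it is an assumption.
wait_for_link() {
    # $1: command that succeeds once the link is usable
    # $2: maximum number of seconds to wait
    check=$1; timeout=$2; waited=0
    until $check; do
        [ "$waited" -ge "$timeout" ] && return 1
        sleep 1
        waited=$((waited + 1))
    done
    return 0
}

# In S99local one might then do something like:
#   wait_for_link "some_link_check" 10 && service nfs restart
```

The timeout keeps a dead link from hanging the boot forever, which a bare
`sleep` would not need but a polling loop does.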

Comment 4 John W. Linville 2004-09-09 19:06:58 UTC
I'm not inclined to just put in a sleep that most people won't need --
why slow down the boot any more?

David, does ethtool show the NIC's link as down (i.e. "Link detected:
no") while the network is unavailable?  If not, is there anything else
one might key off to know when it is safe to continue?

Bill, could we have a variable in ifcfg-* that ifup-post might key off
to put in a user configurable delay?  Just an idea...
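One hedged sketch of keying off the ethtool output: split the parsing from the
ethtool call itself, so a script can test the reported state. The function name
is hypothetical:

```shell
# Sketch: decide link state from ethtool-style output read on stdin.
# Succeeds when the output reports "Link detected: yes".
link_detected() {
    grep -q '^[[:space:]]*Link detected: yes[[:space:]]*$'
}

# Usage (assuming eth0 is the fiber interface):
#   ethtool eth0 | link_detected || echo "link not up yet"
```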

Comment 5 Bill Nottingham 2004-09-09 19:11:42 UTC
Hm, well, it would be a magic parameter, which is somewhat icky.

Comment 6 Bill Nottingham 2004-09-09 19:57:11 UTC
Are you using static IPs?

Comment 7 David Richards 2004-09-10 13:15:01 UTC
To answer your questions, the IPs are all static.  When NFS hits on the
first pass it says "No Route To Host" on the console.  I added the
following piece at the front of K20nfs in /etc/rc5.d; that should
show the status at boot time.  A reboot of that server is scheduled for
the middle of next week, and I'll report back then.

echo "----------------------" >> /tmp/redhat.txt
date >> /tmp/redhat.txt
ethtool eth0 >> /tmp/redhat.txt
echo "----------------------" >> /tmp/redhat.txt

Comment 8 Bill Nottingham 2004-09-10 16:25:10 UTC
Yeah, static interfaces are more-or-less defined to have a behavior of
not waiting for a link.

Comment 9 David Richards 2004-09-14 20:41:38 UTC
Bad news: Red Hat thinks the card is up and running at boot time.  The
media appears up from the OS's point of view, but it's not yet live.
Here is what we got right before NFS tried to start. :(

Fri Sep 10 09:13:10 EDT 2004
Settings for eth0:
        Supported ports: [ FIBRE ]
        Supported link modes:   1000baseT/Full 
        Supports auto-negotiation: Yes
        Advertised link modes:  1000baseT/Full 
        Advertised auto-negotiation: Yes
        Speed: 1000Mb/s
        Duplex: Full
        Port: FIBRE
        PHYAD: 0
        Transceiver: externel
        Auto-negotiation: off
        Supports Wake-on: d
        Wake-on: d
        Link detected: yes

Comment 10 Bill Nottingham 2004-09-14 20:56:42 UTC
In that case, I'm not sure how to handle it - there's not a good way
to code 'assume this particular card isn't up when it says it is.'

Comment 11 John W. Linville 2004-09-17 18:56:06 UTC
Any chance this card is plugged in to a switch running spanning-tree
(i.e. STP)?

Bug 110036 and bug 131475 show the need for a generic post link-up
delay in the ifup scripts...

Comment 12 Bill Nottingham 2004-09-17 19:14:35 UTC
The ifup scripts are completely unrelated to kickstart.

Comment 13 John W. Linville 2004-09-17 19:28:29 UTC
I think that misses the point.  Besides, bug 110036 specifically
mentions NFS mounts failing because the ports are not yet usable due
to STP.

The point is that there are common (even legitimate) networking
configurations that can cause a mere check for link-up to be
insufficient to warrant moving forward with the boot process.  Being
connected directly to a port running STP is one of them.

I think it behooves RH to provide a generic, documented, configurable
means of delaying the boot process on a port-by-port basis.
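A hedged sketch of what such a configurable, port-by-port delay might look
like. The LINKDELAY variable name and the helper function are assumptions for
illustration, not options that exist in these initscripts:

```shell
# Hypothetical sketch: ifcfg-eth0 gains a per-interface delay variable,
# e.g. in /etc/sysconfig/network-scripts/ifcfg-eth0:
#   LINKDELAY=5
#
# and ifup-post could then call something like:
apply_link_delay() {
    delay=${LINKDELAY:-0}
    # Sleep only when a positive delay was configured, so the default
    # (zero) costs nothing for everyone who doesn't need it.
    if [ "$delay" -gt 0 ]; then
        sleep "$delay"
    fi
}
```

With a zero default, the 99% of users who never touch the variable see no
change in boot time.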

Comment 14 David Richards 2004-09-22 13:50:01 UTC
I am in favor of the discussed idea of adding the number of seconds to
wait into a file.  Configure it to zero by default, and 99% of people
won't know about it, or use it.  But those of us that need to wait a
bit for those cards to light, can make use of it.  I mentioned
previously, that it's not just this one fiber card, it's all gigabit
fiber cards.  All of them do the same thing with a 3Com Core Builder.
 And I also mentioned, that if you move a server to another physical
port, that it takes even longer for the network to come up.  The 3Com
seems to add it to some internal routing tables when you move it, and
this takes a few seconds to do.



Comment 15 Bugzilla owner 2004-09-30 15:39:46 UTC
Thanks for the bug report. However, Red Hat no longer maintains this version of
the product. Please upgrade to the latest version and open a new bug if the problem
persists.

The Fedora Legacy project (http://fedoralegacy.org/) maintains some older releases, 
and if you believe this bug is interesting to them, please report the problem in
the bug tracker at: http://bugzilla.fedora.us/


