Description of problem:
When using the ethernet driver e100 (Intel Ethernet Pro 100 chipset), the network shuts down but the interface is not brought down. The interface continues to be recognized: it is visible in ifconfig, and processes such as NFS, if in use at the time of failure, do not recognize the failure. However, new processes started after the failure cannot access the network: bind, nfs and autofs fail, and the machine cannot ping or resolve host names. We cannot produce conditions to reproduce the failure, but it occurs only on machines upgraded to Redhat 9. The network failure is resolved with a restart of the network init script and a reload of the failed services (bind, autofs). The problem ceases when the card is replaced. This does not require the existing interface to be removed, disabled in the bios, or its existing configuration to be removed; all new configuration is moved to eth1 and eth0 is not brought up by the network scripts.

Version-Release number of selected component (if applicable):
Redhat 9 with 2.4.20-18.9smp kernel

How reproducible:
Occurrence seems to be random and not dependent on network load, but we have seen it on several machines with identical hardware.

Steps to Reproduce:
1.
2.
3.

Actual results:
network failure.

Expected results:

Additional info:
This failure only occurs with Asus CUV4X-DLS motherboards with integrated ethernet. Sorry to say, not all of these machines encounter the problem, even though they are otherwise identical, including bios version.
I'm seeing a similar type of failure on a server recently upgraded from rh62. In my case, it's a Supermicro board with a plain-jane e100 pci adapter, rh9, kernel-2.4.20-28.9 (i686), running as a samba/nfs server (with decent, but not astronomical, usage/load). The server will be running along nicely, sometimes for hours or even days, and suddenly eth0 (appears to?) drop *all* packets. No tcp, pings, nada (though I still see link lights on the card). Running ifdown eth0; rmmod e100; ifup eth0 seems to restore normal operation. During the failure, nothing seems to be logged to /var/log/messages. Ideas? Hints/pointers? I wouldn't have thought upgrading rh62->rh9 would have left me with a *less* reliable server... )-:
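Since nothing is logged when the interface stops passing packets, one way to catch the failure is a small check that pings a known host and, if that fails, runs the same ifdown / rmmod / ifup sequence described above. The following is only a rough sketch under assumptions: the function names and the target address (192.0.2.1, a documentation address) are hypothetical, and the commands need root to actually take effect.

```python
import subprocess


def recovery_commands(iface="eth0", module="e100"):
    """Return the manual recovery sequence from the report as argv lists."""
    return [["ifdown", iface], ["rmmod", module], ["ifup", iface]]


def host_reachable(host, run=subprocess.run):
    """Send one ping with a 2 second timeout; True if the host answered."""
    result = run(
        ["ping", "-c", "1", "-W", "2", host],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def check_and_recover(host, iface="eth0", module="e100", run=subprocess.run):
    """Reload the driver if `host` is unreachable.

    Returns True when the recovery sequence was run, False otherwise.
    `run` is injectable so the logic can be exercised without root.
    """
    if host_reachable(host, run=run):
        return False
    for cmd in recovery_commands(iface, module):
        run(cmd, check=True)
    return True
```

Called periodically (for example from cron) as check_and_recover("192.0.2.1") with a real gateway address substituted, this would only work around the hang, not explain it. A failed ping can also have causes unrelated to the driver, so treat a recovery run as a hint rather than proof of this bug.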
I've installed/upgraded-to Intel's e100-2.3.33 driver to see if that helps any.
Did e100-2.3.33 help? There is a newer version of e100 at http://sf.net/projects/e1000, version 3.0.15. I'd like to know if that driver fixes Charles' up/down interface issue and Rex's packet drop issue.
e100-2.3.33 didn't help any (same behavior).
Rex, with 2.3.33, you can dump the nic stats using ethtool -S eth<X>. Would you check to see if stat "rx_tco_packets" is non-zero after the hang? Also, would you attach the output of lspci -n? Thanks.
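If it is hard to be at the console when the hang happens, the counter check can be scripted. This is a minimal sketch that parses the "name: value" lines printed by ethtool -S and pulls out rx_tco_packets. The sample text is made up for illustration (only the rx_tco_packets name comes from this report), and the function names are hypothetical.

```python
import subprocess


def parse_ethtool_stats(text):
    """Parse `ethtool -S` output into a {stat_name: int} dict.

    Lines without a numeric value (such as the "NIC statistics:"
    header) are skipped.
    """
    stats = {}
    for line in text.splitlines():
        name, sep, value = line.partition(":")
        value = value.strip()
        if sep and value.lstrip("-").isdigit():
            stats[name.strip()] = int(value)
    return stats


def read_stats(iface):
    """Run `ethtool -S <iface>` and return the parsed counters."""
    out = subprocess.run(
        ["ethtool", "-S", iface],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_ethtool_stats(out)


# Illustrative sample only, not output from the affected machine.
SAMPLE = """NIC statistics:
     rx_packets: 1024
     tx_packets: 512
     rx_tco_packets: 0
"""
```

On the affected machine, read_stats("eth0").get("rx_tco_packets") after a hang would give the value asked about; logging it to a file at intervals would preserve it even if the console becomes unresponsive.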
# lspci -n
00:00.0 Class 0600: 8086:7190 (rev 03)
00:01.0 Class 0604: 8086:7191 (rev 03)
00:07.0 Class 0601: 8086:7110 (rev 02)
00:07.1 Class 0101: 8086:7111 (rev 01)
00:07.2 Class 0c03: 8086:7112 (rev 01)
00:07.3 Class 0680: 8086:7113 (rev 02)
00:0f.0 Class 0100: 1119:011a
00:10.0 Class 0200: 8086:1229 (rev 08)
00:14.0 Class 0104: 1103:0008 (rev 07)
00:14.1 Class 0104: 1103:0008 (rev 07)
01:00.0 Class 0300: 102b:0521 (rev 01)
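For anyone comparing several machines, the ethernet controller can be picked out of lspci -n output mechanically: PCI class 0200 is "Ethernet controller" and vendor ID 8086 is Intel. The sketch below assumes the older "Class xxxx:" line format shown above; the function names are hypothetical and the sample is a subset of the listing in this report.

```python
import re

# Matches lines such as "00:10.0 Class 0200: 8086:1229 (rev 08)".
# The "(rev xx)" suffix is optional, as some devices omit it.
LINE_RE = re.compile(
    r"^(?P<slot>[0-9a-f]{2}:[0-9a-f]{2}\.[0-7])\s+"
    r"Class\s+(?P<cls>[0-9a-f]{4}):\s+"
    r"(?P<vendor>[0-9a-f]{4}):(?P<device>[0-9a-f]{4})"
    r"(?:\s+\(rev\s+(?P<rev>[0-9a-f]{2})\))?",
    re.IGNORECASE,
)


def parse_lspci_n(text):
    """Return one dict per device line; non-matching lines are ignored."""
    devices = []
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            devices.append(match.groupdict())
    return devices


def ethernet_controllers(devices):
    """Keep only devices whose PCI class is 0200 (Ethernet controller)."""
    return [d for d in devices if d["cls"].lower() == "0200"]


SAMPLE = """# lspci -n
00:00.0 Class 0600: 8086:7190 (rev 03)
00:0f.0 Class 0100: 1119:011a
00:10.0 Class 0200: 8086:1229 (rev 08)
01:00.0 Class 0300: 102b:0521 (rev 01)
"""
```

Applied to the listing above, ethernet_controllers(parse_lspci_n(SAMPLE)) yields the single device at slot 00:10.0 with vendor:device 8086:1229 and revision 08, which is the information needed to identify the controller.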
I'm trying out e100-2.3.38 + kernel-2.4.20-30.9 now. Is it worth trying the developmental e100-3.0.15 driver from sourceforge? Unfortunately, if I'm not at the console within minutes after the hang, the machine quickly becomes completely unresponsive, even on the console. I'll do my best to catch it and do the ethtool -S eth0 when/if it happens again.
Yes, try the e100-3.0.15 driver. It's really our focus right now, so if this driver has a problem on your system, we'd like to fix the problem in that driver. Also, I asked for the lspci dump to see which pro/100 controller you actually had. You have a 82558 part, which is good because it's pretty basic and doesn't have a lot of fancy features. It should NOT stop! Let's see if 3.0.15 likes it better.
Thanks for the bug report. However, Red Hat no longer maintains this version of the product. Please upgrade to the latest version and open a new bug if the problem persists. The Fedora Legacy project (http://fedoralegacy.org/) maintains some older releases, and if you believe this bug is interesting to them, please report the problem in the bug tracker at: http://bugzilla.fedora.us/