Bug 75680 - Intel e1000 driver on 2.4.18-14smp (kernel panic)
Product: Red Hat Linux
Classification: Retired
Component: kernel
Hardware: athlon Linux
Priority: medium, Severity: high
Assigned To: Arjan van de Ven
Brian Brock
Reported: 2002-10-10 19:42 EDT by Marius Hjelle
Modified: 2008-08-01 12:22 EDT (History)
Last Closed: 2004-09-30 11:40:02 EDT

Attachments
trace (1.25 MB, image/jpeg)
2002-10-25 16:13 EDT, Marius Hjelle

Description Marius Hjelle 2002-10-10 19:42:27 EDT
From Bugzilla Helper:
User-Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)

Description of problem:
We have a system with an Intel gigabit network controller PWLA8490T, 
identified by /sbin/lspci as Intel Corp. 82543GC Gigabit Ethernet Controller 
(rev 02). Over the last few days the system has halted several times. The server 
has been running headless, and only today did I manage to get some of the trace 
from the kernel panic.

[<c0218c00>] ip_local_deliver_finish [kernel] 0x0 (0xc0c36fe74))
[<f88ca24d>] e1000_reset [e1000] 0x59 (0xc036fe90))
[<f88ca1da>] e1000_down [e1000] 0x5a (0xc36fec0))

Linux detail:
Linux version 2.4.18-14smp (bhcompile@astest.test.redhat.com) (gcc version 3.2 
20020903 (Red Hat Linux 8.0 3.2-7)) #1 SMP Wed Sep 4 11:55:37 EDT 2002

Uname -a = Linux localhost 2.4.18-14smp #1 SMP Wed Sep 4 11:55:37 EDT 2002 i686 
athlon i386 GNU/Linux

Hardware: this is an Asus A7M266-D motherboard, 2x Athlon 2000+, 1 GB ECC RAM.

Short history: 
This system has been operational since June this year, running Red Hat Linux 7.3 
with both the Red Hat provided drivers and later also the driver provided by 
Intel (e1000-4.2.17). The system ran stable. However we had poor network 

Last Saturday (October 6th) we upgraded Red Hat to the current release; the first 
crash was on Tuesday. After another crash on Wednesday I replaced the NIC 
driver with the one provided by Intel (e1000-4.3.15). Only this afternoon did 
I manage to see the actual trace, and I copied down the first lines before I 
swapped in another NIC to make the server operational.

What I really need to know is whether this is a hardware problem, or whether 
it comes only from the current drivers and/or their interaction with Red Hat 8.0.

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:
1. set up a system with the above mentioned specifications
2. boot up
3. run some network services (and probably wait some hours)

Actual Results:  system crash

Expected Results:  no system crash

Additional info:
Comment 1 Arjan van de Ven 2002-10-11 09:50:50 EDT
I assume you're not running the iANS stuff ?
Comment 2 Marius Hjelle 2002-10-11 10:36:10 EDT
Yes, that is right
Comment 3 Scott Feldman 2002-10-15 15:56:37 EDT
Both the redhat 8.0 and 4.3.15 drivers have the same problem: if we get called 
to run our tx_timeout routine to reset the card, we'll panic because of a bug 
I introduced [BUG() running msec_delay in_interrupt during tx_timeout timer 
call back!]  In a couple of weeks, we'll have an updated driver on the Intel 
web site that has a fix for this; the fix has already been applied to the 2.4 
and 2.5 kernel drivers.

The real question is: why are we getting into tx_timeout????  There is a known 
hang with 82543 when using RxIntDelay, but that's turned off in these 
drivers.  We shouldn't be in tx_timeout.
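
To make the failure mode concrete, here is a minimal user-space sketch in C of the situation described above. It is only a model, not driver code: `in_irq_ctx`, `bug_hits`, `sleeps`, `model_msec_delay` and `model_tx_timeout` are hypothetical names standing in for `in_interrupt()`, `BUG()`, the sleeping path, the driver's `msec_delay` and its tx_timeout callback.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for kernel state; none of this is real kernel API. */
static bool in_irq_ctx;   /* models what in_interrupt() would return */
static int  bug_hits;     /* counts the places where BUG() would fire */
static int  sleeps;       /* counts the places where the code would sleep */

/* Models the unfixed msec_delay: sleeping is only legal outside
 * interrupt context, so the interrupt case is treated as a bug. */
static void model_msec_delay(unsigned int ms)
{
    (void)ms;             /* the length of the delay does not matter here */
    if (in_irq_ctx)
        bug_hits++;       /* the driver calls BUG() here and the box panics */
    else
        sleeps++;         /* the driver sleeps via schedule_timeout() here */
}

/* Models the tx_timeout path: it runs as a timer callback, that is in
 * interrupt context, and resets the card, which needs a delay. */
static void model_tx_timeout(void)
{
    in_irq_ctx = true;
    model_msec_delay(10);
    in_irq_ctx = false;
}
```

Calling `model_msec_delay()` directly takes the sleeping branch, while going through `model_tx_timeout()` takes the branch where the real driver would panic. That matches the observation that the crash only shows up once a transmit timeout occurs.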
Comment 4 Jeff Garzik 2002-10-25 14:12:41 EDT
Assigned to arjan, for integrating fixed e1000 into rawhide/8.0 errata.
And added CC to Scott Feldman @ Intel in case he wants to pursue the further
issue of why tx_timeouts are occurring in the first place.
Comment 5 Marius Hjelle 2002-10-25 16:13:27 EDT
Created attachment 82127
Comment 6 Marius Hjelle 2002-10-25 16:19:20 EDT
This last attachment is from running the latest kernel release 
(2.4.18-17.8.0smp). Reverting to the Intel 4.2.17 driver keeps the system from 
crashing.
Comment 7 Ben Woodard 2002-11-13 12:17:16 EST
We are having the same problem here at LLNL.
Comment 8 Ben Woodard 2002-11-13 12:20:09 EST
Here is another report of the same problem. Might yield some useful information:

From: Jim Garlick <garlick@llnl.gov>
To: bwoodard@llnl.gov
Subject: bug report - e1000 driver
Date: Tue, 12 Nov 2002 09:26:56 -0800 (PST)

Ben -

We've been seeing a BUG() triggered in the e1000 driver.  The call chain is:

  e1000_tx_timeout -> e1000_down -> e1000_reset -> e1000_reset_hw
       -> msec_delay -> BUG()

Under heavy load, this is occasionally triggered and crashes the node.

The attached patch works around the problem by spinning with interrupts
off for longer than probably is sociable, but not long enough to trigger
an NMI watchdog at least (that was enabled during our testing).  It also
may mask other problems that really should trigger a BUG().  Ultimately
I think a better fix is needed...

Could you report this to RH?



RCS file: /chaos/cvs/kernel-rh/linux/drivers/net/e1000/Attic/e1000_osdep.h,v
retrieving revision
retrieving revision
diff -u -r1.1.4.1 -r1.1.4.3
--- e1000_osdep.h       29 Oct 2002 00:34:34 -0000
+++ e1000_osdep.h       12 Nov 2002 00:54:22 -0000
@@ -88,8 +88,8 @@
 #define usec_delay(x) udelay(x)
 #ifndef msec_delay
 #define msec_delay(x)  do { if(in_interrupt()) { \
-                               /* Don't mdelay in interrupt context! */ \
-                               BUG(); \
+                               int i; \
+                               for (i = 0; i < (x); i++) udelay(1000); \
                        } else { \
                                set_current_state(TASK_UNINTERRUPTIBLE); \
                                schedule_timeout((x * HZ)/1000); \
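
For readers who want to see what the patched macro does without building a kernel, the following is a rough user-space sketch in C. Everything in it is an assumption made for illustration: `fake_in_interrupt`, `fake_udelay`, `fake_schedule_timeout` and `patched_msec_delay` are hypothetical stand-ins for the kernel's `in_interrupt()`, `udelay()`, `schedule_timeout()` and the macro itself, and `HZ` is set to 100, which may not match the kernel in this report.

```c
#include <stdbool.h>

#define HZ 100  /* assumption: tick rate; the affected kernel may differ */

/* Hypothetical stand-ins that record what the kernel calls would do. */
static bool fake_in_interrupt;        /* models in_interrupt() */
static unsigned long spun_usecs;      /* microseconds spent busy-waiting */
static unsigned long slept_jiffies;   /* jiffies requested for sleeping */

static void fake_udelay(unsigned long usecs)      { spun_usecs += usecs; }
static void fake_schedule_timeout(unsigned long j) { slept_jiffies += j; }

/* Same shape as the patched msec_delay(): in interrupt context spin in
 * 1000 us steps instead of calling BUG(); otherwise sleep for the
 * equivalent number of jiffies, as the unpatched macro already did. */
static void patched_msec_delay(unsigned long ms)
{
    if (fake_in_interrupt) {
        unsigned long i;
        for (i = 0; i < ms; i++)
            fake_udelay(1000);
    } else {
        fake_schedule_timeout((ms * HZ) / 1000);
    }
}
```

With `fake_in_interrupt` set, a 10 ms delay becomes ten 1000 us busy-waits, which is the "spinning with interrupts off" trade-off mentioned in the mail above. With it clear, the sketch only records the jiffies value that would be handed to the scheduler.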
Comment 9 Ben Woodard 2002-11-13 14:00:22 EST
Arjan, can you also do a 7.x errata kernel for this one?
If not, could you please tell me when this hits rawhide so that I can grab the
changes and merge them with the kernel that we have here?
I'll see how effectively we can reproduce the problem here. Hopefully we can
provide Scott with an easy way to manifest the problem.
Comment 10 Scott Feldman 2002-11-18 21:14:11 EST
The errata kernel needs to be updated to use the 4.4.12-k1 driver from the 
2.4.20-rc2 kernel.  This driver has the fix for this bug.  The files in 
drivers/net/e1000 should be a drop-in replacement for the previous driver.

The 4.4.12 driver is also available from Intel's support web site.
Comment 11 Bugzilla owner 2004-09-30 11:40:02 EDT
Thanks for the bug report. However, Red Hat no longer maintains this version of
the product. Please upgrade to the latest version and open a new bug if the problem

The Fedora Legacy project (http://fedoralegacy.org/) maintains some older releases, 
and if you believe this bug is interesting to them, please report the problem in
the bug tracker at: http://bugzilla.fedora.us/
