Bug 75680
Summary: | Intel e1000 driver on 2.4.18-14smp (kernel panic) | |
---|---|---|---|
Product: | [Retired] Red Hat Linux | Reporter: | Marius Hjelle <marius.hjelle>
Component: | kernel | Assignee: | Arjan van de Ven <arjanv>
Status: | CLOSED CURRENTRELEASE | QA Contact: | Brian Brock <bbrock>
Severity: | high | Priority: | medium
Version: | 8.0 | CC: | scott.feldman, signal, woodard
Hardware: | athlon | OS: | Linux
Doc Type: | Bug Fix | Last Closed: | 2004-09-30 15:40:02 UTC

Description
Marius Hjelle
2002-10-10 23:42:27 UTC
I assume you're not running the iANS stuff?

Yes, that is right.

Both the Red Hat 8.0 and 4.3.15 drivers have the same problem: if we get called to run our tx_timeout routine to reset the card, we'll panic because of a bug I introduced [BUG() running msec_delay in_interrupt during tx_timeout timer callback!]. In a couple of weeks, we'll have an updated driver on the Intel web site that has a fix for this; the fix has already been applied to the 2.4 and 2.5 kernel drivers.

The real question is: why are we getting into tx_timeout? There is a known hang with 82543 when using RxIntDelay, but that's turned off in these drivers. We shouldn't be in tx_timeout.

Assigned to arjan, for integrating fixed e1000 into rawhide/8.0 errata. And added CC to Scott Feldman @ Intel in case he wants to pursue the further issue of why tx_timeouts are occurring in the first place.

Created attachment 82127: trace

This last attachment is from running the latest kernel release (2.4.18-17.8.0smp). Reverting to the Intel 4.2.17 driver keeps the system from crashing.

We are having the same problem here at LLNL. Here is another report of the same problem. Might yield some useful information:

From: Jim Garlick <garlick>
To: bwoodard
Subject: bug report - e1000 driver
Date: Tue, 12 Nov 2002 09:26:56 -0800 (PST)

Ben -

We've been seeing a BUG() triggered in the e1000 driver. The call chain is:

e1000_tx_timeout -> e1000_down -> e1000_reset -> e1000_reset_hw -> msec_delay -> BUG()

Under heavy load, this is occasionally triggered and crashes the node. The attached patch works around the problem by spinning with interrupts off for longer than probably is sociable, but not long enough to trigger an NMI watchdog at least (that was enabled during our testing). It also may mask other problems that really should trigger a BUG(). Ultimately I think a better fix is needed...

Could you report this to RH?

Thanks,
Jim

----------------------
RCS file: /chaos/cvs/kernel-rh/linux/drivers/net/e1000/Attic/e1000_osdep.h,v
retrieving revision 1.1.4.1
retrieving revision 1.1.4.3
diff -u -r1.1.4.1 -r1.1.4.3
--- e1000_osdep.h 29 Oct 2002 00:34:34 -0000 1.1.4.1
+++ e1000_osdep.h 12 Nov 2002 00:54:22 -0000 1.1.4.3
@@ -88,8 +88,8 @@
 #define usec_delay(x) udelay(x)
 #ifndef msec_delay
 #define msec_delay(x) do { if(in_interrupt()) { \
-        /* Don't mdelay in interrupt context! */ \
-        BUG(); \
+        int i; \
+        for (i = 0; i < (x); i++) udelay(1000); \
     } else { \
         set_current_state(TASK_UNINTERRUPTIBLE); \
         schedule_timeout((x * HZ)/1000); \

Arjan, can you also do a 7.x errata kernel for this one? If not, could you please tell me when this hits rawhide so that I can grab the changes and merge them with the kernel that we have here? I'll see how effectively we can reproduce the problem here. Hopefully we can provide Scott with an easy way to manifest the problem.

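For anyone following the patch in the report above, here is a minimal user-space sketch of the branch logic it introduces. It is not kernel code and not the fix that shipped in the updated driver. Every `model_*` name is a hypothetical stand-in invented for this illustration: `model_in_interrupt` plays the role of in_interrupt(), `model_udelay` the role of udelay(), and `model_sleep_msecs` the role of the set_current_state()/schedule_timeout() pair. The counters only record how much time would have been spent spinning versus sleeping.

```c
/* User-space model of the patched msec_delay() branches; not kernel code. */
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for kernel state (assumptions for illustration). */
static bool model_in_interrupt;        /* models the result of in_interrupt() */
static unsigned long busy_wait_usecs;  /* total time "spun" in the busy-wait path */
static unsigned long slept_msecs;      /* total time "slept" in the scheduler path */

/* Models udelay(): a real one would spin the CPU for `usecs` microseconds. */
static void model_udelay(unsigned long usecs)
{
    busy_wait_usecs += usecs;
}

/* Models the sleeping path. Sleeping is only legal in process context,
 * which is the condition the original BUG() was guarding. */
static void model_sleep_msecs(unsigned long msecs)
{
    assert(!model_in_interrupt);
    slept_msecs += msecs;
}

/* Same branch structure as the patched macro: busy-wait in 1000 us steps
 * when in interrupt context, otherwise sleep. */
static void model_msec_delay(unsigned long msecs)
{
    if (model_in_interrupt) {
        unsigned long i;

        for (i = 0; i < msecs; i++)
            model_udelay(1000);
    } else {
        model_sleep_msecs(msecs);
    }
}
```

Calling `model_msec_delay(10)` with `model_in_interrupt` set adds 10000 to `busy_wait_usecs` and never reaches the sleeping path. That is the trade-off the report describes: the node no longer crashes, but the full delay is burned spinning in interrupt context.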
The errata kernel needs to be updated to use the 4.4.12-k1 driver from the 2.4.20-rc2 kernel. This driver has the fix for this bug. The files in drivers/net/e1000 should be a drop-in replacement for the previous driver. The 4.4.12 driver is also available from Intel's support web site.

Thanks for the bug report. However, Red Hat no longer maintains this version of the product. Please upgrade to the latest version and open a new bug if the problem persists. The Fedora Legacy project (http://fedoralegacy.org/) maintains some older releases, and if you believe this bug is interesting to them, please report the problem in the bug tracker at: http://bugzilla.fedora.us/