Bug 142976 - RHEL 3.0 v2.4.21-20.0.1.EL kernel panics after several hours of running raw I/O
Summary: RHEL 3.0 v2.4.21-20.0.1.EL kernel panics after several hours of running raw I/O
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: Red Hat Enterprise Linux 3
Classification: Red Hat
Component: kernel
Version: 3.0
Hardware: x86_64
OS: Linux
Priority: medium
Severity: high
Target Milestone: ---
Assignee: Larry Woodman
QA Contact: Brian Brock
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2004-12-15 15:44 UTC by Heather Conway
Modified: 2007-11-30 22:07 UTC
5 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2007-10-19 19:11:13 UTC
Target Upstream Version:
Embargoed:


Attachments
trace of Oops without PowerPath running on the system (26.50 KB, text/plain)
2004-12-15 15:46 UTC, Heather Conway
trace of the Oops that occurred with PowerPath (34.00 KB, text/plain)
2004-12-15 15:49 UTC, Heather Conway
Oops with no PP in text format (2.36 KB, text/plain)
2005-01-28 20:29 UTC, Heather Conway
Oops with PP in text format (6.04 KB, text/plain)
2005-01-28 20:32 UTC, Heather Conway

Description Heather Conway 2004-12-15 15:44:42 UTC
Description of problem:
On an Opteron-based system, the RHEL 3.0 v2.4.21-20.0.1.EL kernel 
panics after several hours of running raw I/O.  This panic occurs 
both with and without PowerPath.  Per the PowerPath team:
One of the dd processes issues an I/O and calls kiobuf_wait_for_io(), 
which in turn calls schedule(). In schedule(), the kernel attempts to 
perform a context switch and panics because the task struct pointer 
passed in for the new process is NULL. This points to kernel memory 
corruption, more specifically corruption of the CPU runqueues.
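The reasoning above (a NULL next-task pointer in schedule() implicates the runqueue) can be illustrated with a toy user-space model. This is a hypothetical sketch, not RHEL 3 kernel code: it assumes a runqueue laid out like the O(1) scheduler's priority arrays (a bitmap of non-empty priority levels plus one FIFO per level), and the names PrioArray, pick_next and RunqueueCorruption are invented for illustration. The consistency check in pick_next is likewise an addition; it marks the point where a corrupted queue head would otherwise be handed to the context switch.

```python
# Toy user-space model (hypothetical, not RHEL 3 kernel code) of an
# O(1)-scheduler-style runqueue: a bitmap of non-empty priority levels
# plus one FIFO of runnable tasks per level.
from dataclasses import dataclass, field
from typing import List, Optional

MAX_PRIO = 140  # assumed number of priority levels; lower value = higher priority


@dataclass
class Task:
    pid: int
    prio: int


@dataclass
class PrioArray:
    bitmap: int = 0  # bit i set <=> queues[i] is supposed to be non-empty
    queues: List[List[Optional[Task]]] = field(
        default_factory=lambda: [[] for _ in range(MAX_PRIO)]
    )

    def enqueue(self, task: Task) -> None:
        self.queues[task.prio].append(task)
        self.bitmap |= 1 << task.prio


class RunqueueCorruption(Exception):
    """The bitmap and the per-priority queues disagree."""


def pick_next(array: PrioArray) -> Optional[Task]:
    """Select the next task the way an O(1)-style scheduler would.

    Returns None only when nothing is runnable (a kernel would run its
    idle task). The sanity check is added for illustration: an unchecked
    scheduler would pass the queue head straight to the context switch,
    even if a stray write had turned it into None.
    """
    if array.bitmap == 0:
        return None
    # Lowest set bit = highest-priority non-empty level.
    idx = (array.bitmap & -array.bitmap).bit_length() - 1
    queue = array.queues[idx]
    if not queue or queue[0] is None:
        raise RunqueueCorruption(
            f"bitmap marks prio {idx} runnable but its queue head is "
            f"{queue[0] if queue else 'missing'}"
        )
    return queue[0]


if __name__ == "__main__":
    rq = PrioArray()
    rq.enqueue(Task(pid=101, prio=120))
    rq.enqueue(Task(pid=102, prio=115))
    print(pick_next(rq))      # Task(pid=102, prio=115)
    rq.queues[115][0] = None  # simulate memory corruption of the runqueue
    try:
        pick_next(rq)
    except RunqueueCorruption as exc:
        print("caught:", exc)
```

In this model, the process that happened to call into the scheduler (here, dd via kiobuf_wait_for_io()) only trips over the bad state; whoever corrupted the runqueue did so earlier, which is why an Oops trace from the panic alone may not identify the culprit.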

Version-Release number of selected component (if applicable):
kernel-source-2.4.21-20.0.1.EL x86_64

How reproducible:
Run raw I/O for several hours on an AMD64 Opteron-based system

Steps to Reproduce:
1.
2.
3.
  
Actual results:


Expected results:


Additional info:

Comment 1 Heather Conway 2004-12-15 15:46:01 UTC
Created attachment 108621
trace of Oops without PowerPath running on the system

Enclosing a trace of the Oops that occurred without PowerPath running on the
system.

Comment 2 Heather Conway 2004-12-15 15:49:23 UTC
Created attachment 108623
trace of the Oops that occurred with PowerPath 

Enclosing a trace of the Oops that occurred with PowerPath running on the
system.

Comment 3 Heather Conway 2004-12-15 15:49:54 UTC
I neglected to mention that this is PowerPath v4.3.1.

Comment 4 Ernie Petrides 2004-12-15 23:04:07 UTC
Heather, please attach a trace (oops output) from a non-tainted kernel.
Also, please do not attach Microsoft Word documents in the future.


Comment 5 Heather Conway 2005-01-28 14:32:02 UTC
I have not been able to replicate this and have not received any 
feedback from the PowerPath team so I am considering this issue as 
NOTABUG.

Comment 6 Heather Conway 2005-01-28 20:29:30 UTC
Created attachment 110369
Oops with no PP in text format

Comment 7 Heather Conway 2005-01-28 20:30:39 UTC
Oops, I closed the wrong Bugzilla entry.  The Oops output is being 
attached in text format.

Comment 8 Heather Conway 2005-01-28 20:32:05 UTC
Created attachment 110370
Oops with PP in text format

Attaching text document of Oops with PowerPath installed.

Comment 9 AJ Johnson 2005-04-30 00:00:52 UTC
I am seeing a very similar issue, but the system doesn't panic.  What is the
status of this bug?

Comment 13 Larry Woodman 2005-09-16 14:36:39 UTC
Is this still a problem with the latest RHEL3-U6 update?  The reason I ask is
that several generic kernel changes and changes to the x86_64-specific code have
been made to RHEL3 since 2.4.21-20.  Can someone please verify whether this problem
still occurs with the latest kernel?

Thanks, Larry Woodman


Comment 15 RHEL Program Management 2007-10-19 19:11:13 UTC
This bug is filed against RHEL 3, which is in maintenance phase.
During the maintenance phase, only security errata and select mission
critical bug fixes will be released for enterprise products. Since
this bug does not meet these criteria, it is now being closed.
 
For more information on the RHEL errata support policy, please visit:
http://www.redhat.com/security/updates/errata/
 
If you feel this bug is indeed mission critical, please contact your
support representative. You may be asked to provide detailed
information on how this bug is affecting you.

