Bug 436079 - Kernel panic in sched.c when running Java tests
Kernel panic in sched.c when running Java tests
Status: CLOSED CURRENTRELEASE
Product: Red Hat Enterprise MRG
Classification: Red Hat
Component: realtime-kernel (Show other bugs)
beta
x86_64 All
low Severity urgent
: ---
: ---
Assigned To: Red Hat Real Time Maintenance
:
Depends On:
Blocks:
  Show dependency treegraph
 
Reported: 2008-03-05 04:40 EST by IBM Bug Proxy
Modified: 2008-04-23 16:13 EDT (History)
0 users

See Also:
Fixed In Version: 2.6.24.3-29.el5rt
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2008-04-23 16:13:26 EDT
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments (Terms of Use)
Screenshot of rtj-opt6.hursley.ibm.com panicking (698.53 KB, image/jpeg)
2008-03-12 13:41 EDT, IBM Bug Proxy
no flags Details


External Trackers
Tracker ID Priority Status Summary Last Updated
IBM Linux Technology Center 42758 None None None Never

  None (edit)
Description IBM Bug Proxy 2008-03-05 04:40:30 EST
=Comment: #0=================================================
P. N. Stanton <pstanton@uk.ibm.com> - 2008-02-28 10:28 EDT
Problem description:
Kernel panic in sched.c when running Java tests on RHEL 5.1 RT with kernel
2.6.24.1-24ibmrt. The type of Java test doesn't seem to matter - the machine
will hang within minutes (sometimes less than a minute) of starting java. This
problem can be reproduced easily.
We've seen this on two types of machine:
rtj-opt6.hursley.ibm.com:
eServer 326m, model number 7969-76G
2 x 2.4 GHz Opteron 280 (dual core), 5 GB RAM

rtj-opt22.hursley.ibm.com:
eServer x3455, model number 7984-52G
2 x 2.6 GHz Opteron 2218 (dual core), 10 GB RAM

The output below was captured via a null modem cable, since the message were not
written to disk or to the network (using syslog to a remote syslog host)

rtj-opt6:

kernel BUG at kernel/sched.c:818!
invalid opcode: 0000 [1] PREEMPT SMP 

CPU 3 

Modules linked in: autofs4 hidp nfs lockd nfs_acl rfcomm l2cap bluetooth sunrpc
ipv6 dm_multipath video output sbs sbshc dock battery ac parport_pc lp parport
sg ide_cd cdrom tg3 pata_serverworks button pata_acpi serio_raw ata_generic
rtc_cmos rtc_core rtc_lib pcspkr shpchp k8temp hwmon dm_snapshot dm_zero
dm_mirror dm_mod sata_svw libata mptspi mptscsih scsi_transport_spi mptbase
sd_mod scsi_mod ext3 jbd ehci_hcd ohci_hcd ssb uhci_hcd

Pid: 4497, comm: java Not tainted 2.6.24.1-24ibmrt #1
RIP: 0010:[<ffffffff8022e807>]  [<ffffffff8022e807>] resched_task+0x24/0x5e
RSP: 0018:ffff81014d859e98  EFLAGS: 00210002
RAX: 0000000000000001 RBX: ffff81014a3b50c0 RCX: ffff81014b948000
RDX: ffffffff8062c100 RSI: 00000000000000bf RDI: ffff81014a3b50c0
RBP: ffff81014d859e98 R08: 0000000000000004 R09: 000000000000003c
R10: 0000000100000002 R11: 0000000000000000 R12: ffff8100c0070480
R13: ffff8100c0070480 R14: 0000000000000001 R15: 000000000000002e
FS:  0000000040021940(0000) GS:ffff81015fb3e6c0(0063) knlGS:00000000b6928b90
CS:  0010 DS: 002b ES: 002b CR0: 000000008005003b
CR2: 00000000cbf0c00c CR3: 00000000b043d000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400

Process java (pid: 4497, threadinfo ffff81014d858000, task ffff81013f95c140)

Stack:  ffff81014d859eb8 ffffffff8022fed5 0000000000000001 ffff81014a3b50c0
 ffff81014d859f28 ffffffff80234d82 ffff81014d859f28 ffff81014d859f38
 ffff81014a3b57d0 00000000a47a4d9c ffffffff804b59e0 ffff81014d859f08

Call Trace:
 [<ffffffff8022fed5>] prio_changed_rt+0x41/0x46
 [<ffffffff80234d82>] sched_setscheduler+0x2e8/0x349
 [<ffffffff80234e4b>] do_sched_setscheduler+0x68/0x7e
 [<ffffffff80234e88>] sys_sched_setscheduler+0x14/0x18
 [<ffffffff802269f4>] cstar_do_call+0x1b/0x65


Code: 0f 0b eb fe 8b 41 10 a8 08 75 2d f0 0f ba 69 10 03 48 8b 47 

RIP  [<ffffffff8022e807>] resched_task+0x24/0x5e
 RSP <ffff81014d859e98>

Kernel panic - not syncing: Fatal exception

Pid: 4497, comm: java Tainted: G      D  2.6.24.1-24ibmrt #1

Call Trace:

 [<ffffffff8023a3fc>] panic+0xaf/0x169
 [<ffffffff8022df6d>] __wake_up_common+0x41/0x74
 [<ffffffff8023727d>] __wake_up+0x3a/0x5b
 [<ffffffff8023a7af>] wake_up_klogd+0x32/0x34
 [<ffffffff8049e1eb>] oops_end+0x54/0x5d
 [<ffffffff8020d997>] die+0x4a/0x54
 [<ffffffff8049e5ff>] do_trap+0x101/0x110
 [<ffffffff8020de9c>] do_invalid_op+0x93/0x9c
 [<ffffffff8022e807>] resched_task+0x24/0x5e
 [<ffffffff80311c12>] avc_has_perm_noaudit+0x3cd/0x48f
 [<ffffffff8049de89>] error_exit+0x0/0x51
 [<ffffffff8022e807>] resched_task+0x24/0x5e
 [<ffffffff8022fed5>] prio_changed_rt+0x41/0x46
 [<ffffffff80234d82>] sched_setscheduler+0x2e8/0x349
 [<ffffffff80234e4b>] do_sched_setscheduler+0x68/0x7e
 [<ffffffff80234e88>] sys_sched_setscheduler+0x14/0x18
 [<ffffffff802269f4>] cstar_do_call+0x1b/0x65



rtj-opt22:

kernel BUG at kernel/sched.c:818!
invalid opcode: 0000 [1] PREEMPT SMP

CPU 1

Modules linked in: autofs4 hidp nfs lockd nfs_acl rfcomm l2cap bluetooth sunrpc
ipv6 dm_multipath video output sbs sbshc dock battery ac parport_pc lp parport
sg tg3 button serio_raw shpchp pata_acpi pata_serverworks ata_generic pcspkr
rtc_cmos rtc_core rtc_lib k8temp hwmon dm_snapshot dm_zero dm_mirror dm_mod
sata_svw libata sd_mod scsi_mod ext3 jbd ehci_hcd ohci_hcd ssb uhci_hcd

Pid: 4675, comm: java Not tainted 2.6.24.1-24ibmrt #1

RIP: 0010:[<ffffffff8022e807>]  [<ffffffff8022e807>] resched_task+0x24/0x5e
RSP: 0018:ffff8102a0f1da98  EFLAGS: 00210002
RAX: 0000000000000001 RBX: ffff810297f29700 RCX: ffff81029b5c6000
RDX: ffff8102af0e0740 RSI: 00000000000000bf RDI: ffff810297f29700
RBP: ffff8102a0f1da98 R08: 0000000000000004 R09: 000000000000003c
R10: 0000000100000003 R11: 0000000000000000 R12: ffff8101700be480
R13: 0000000000000001 R14: 0000000000000036 R15: ffffffff804b59e0
FS:  00002aca97b49c80(0000) GS:ffff81016fc23cc0(0063) knlGS:00000000bf66db90
CS:  0010 DS: 002b ES: 002b CR0: 0000000080050033
CR2: 00000000cbae4010 CR3: 00000002ac8d1000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400

Process java (pid: 4675, threadinfo ffff8102a0f1c000, task ffff8102a0f0e480)

Stack:  ffff8102a0f1dab8 ffffffff8022fed5 ffff810297f29700 ffff8101700be480
 ffff8102a0f1db08 ffffffff802370f3 0000002e00000001 ffff8102a0f0e480
 0000000000200097 ffff810297f29700 ffff810297f29e10 ffff8102a0f1dbb8

Call Trace:

 [<ffffffff8022fed5>] prio_changed_rt+0x41/0x46
 [<ffffffff802370f3>] task_setprio+0x178/0x1a0
 [<ffffffff80259cac>] __rt_mutex_adjust_prio+0x20/0x24
 [<ffffffff8025a465>] task_blocks_on_rt_mutex+0x15b/0x1bf
 [<ffffffff8049ce7a>] rt_mutex_slowlock+0x184/0x29d
 [<ffffffff8049cb56>] rt_mutex_lock+0x28/0x2a
 [<ffffffff8025a695>] __rt_down_read+0x47/0x4b
 [<ffffffff8025a6af>] rt_down_read+0xb/0xd
 [<ffffffff80257e1e>] futex_lock_pi+0x103/0x9a8
 [<ffffffff80259cac>] __rt_mutex_adjust_prio+0x20/0x24
 [<ffffffff80259e30>] rt_mutex_adjust_prio+0x35/0x3d
 [<ffffffff80231993>] default_wake_function+0x0/0x14
 [<ffffffff80259197>] do_futex+0xad4/0xb1d
 [<ffffffff8020a8e5>] __switch_to+0x291/0x2a0
 [<ffffffff80234f38>] finish_task_switch+0x45/0x98
 [<ffffffff80259714>] compat_sys_futex+0xd8/0xf6
 [<ffffffff8020f65b>] syscall_trace_enter+0x95/0x99
 [<ffffffff802269f4>] cstar_do_call+0x1b/0x65


Code: 0f 0b eb fe 8b 41 10 a8 08 75 2d f0 0f ba 69 10 03 48 8b 47

RIP  [<ffffffff8022e807>] resched_task+0x24/0x5e
 RSP <ffff8102a0f1da98>



Access to both machines can be provided if needed.
=Comment: #1=================================================
Sripathi Kodi <sripathi@in.ibm.com> - 2008-02-28 12:34 EDT
static void resched_task(struct task_struct *p)
{
        int cpu;

        assert_spin_locked(&task_rq(p)->lock);  <==== Panic happens here

        if (unlikely(test_tsk_thread_flag(p, TIF_NEED_RESCHED)))
                return;

=Comment: #2=================================================
Sripathi Kodi <sripathi@in.ibm.com> - 2008-02-29 01:10 EDT
The probelm is because we somehow reach resched_task() without having held the
rq lock on which the  task is. I have looked at the two code paths leading to
the crash, one starting from sys_sched_setscheduler and the other through
task_blocks_on_rt_mutex. On both these paths, there doesn't seem to be any way
of that happening. Also, rq->lock is a raw_spinlock_t, so nothing to suspect there.

I ran calibrate and started running recalibrate on a machine here. So far I have
not seen the problem in more than 15 minutes.
=Comment: #3=================================================
Sripathi Kodi <sripathi@in.ibm.com> - 2008-02-29 01:23 EDT
This shouldn't be hardware dependent, but I noticed that both the machines
mentioned in the report are non-blades (326m and x3455). Just to be sure, I am
abat-provisioning rt-birch, which is our 2xdual-core e326m. I will see if Java
behaves differently on that.
=Comment: #4=================================================
Chirag H. Jog1 <chirag.jog@in.ibm.com> - 2008-02-29 01:33 EDT
I am running specjbb [in a loop] on a x3455 to be reproduce the issue.
=Comment: #5=================================================
Sripathi Kodi <sripathi@in.ibm.com> - 2008-02-29 03:06 EDT
I have run calibrate, recalibrate and specJBB on an e326m, but did not see the
problem even after a couple of hours. I am using 2.6.24.1-24ibmrt kernel and the
java version used is:
java version "1.5.0"
Java(TM) 2 Runtime Environment, Standard Edition (build pxi32rt23-20080118 (SR3))
IBM J9 VM (build 2.3, J2RE 1.5.0 IBM J9 2.3 Linux x86-32 j9vmxi3223-20071005
(JIT enabled)
J9VM - 20071004_14218_lHdSMR
JIT  - 20070820_1846ifx1_r8
GC   - 200708_10)
JCL  - 20080117a

Peter, we will have to understand how your testing is different from ours. Could
you please:
1) Let us know how the OS was installed. Did you install SR3 and then put
2.6.24.1-24ibmrt kernel on top of it?
2) Is the problem seen only when a particular version of Java is used?
3) Which specific test triggers the problem?
4) Could you provide us access to the failing systems? Do they have a serial
console? We may want to get a kdump from it when the problem happens.
=Comment: #6=================================================
Chirag H. Jog1 <chirag.jog@in.ibm.com> - 2008-02-29 03:25 EDT
10 runs of Specjbb couldn't reproduce the problem on a x3455.

=Comment: #11=================================================
Chirag H. Jog1 <chirag.jog@in.ibm.com> - 2008-02-29 09:07 EDT
I can easily reproduce this on rt-ash.
Next Step:Trying to trigger a dump.
=Comment: #15=================================================
Sripathi Kodi <sripathi@in.ibm.com> - 2008-03-03 02:05 EDT
I can easily recreate the kernel panic problem on rt-birch and rt-ash, whereas I
have not been able to recreate it on an LS20. The only difference between these
machines, apart from the hardware, is that rt-birch and rt-ash were provisioned
through ABAT, whereas the LS20 wasn't.
=Comment: #16=================================================
Sripathi Kodi <sripathi@in.ibm.com> - 2008-03-03 03:29 EDT
Saw this on LS20 (not on ABAT) as well, Hence it doesn't depend on hardware.
=Comment: #17=================================================
Sripathi Kodi <sripathi@in.ibm.com> - 2008-03-03 06:02 EDT
This time I recreated it again on LS20 and got a kdump. Will report findings soon.

I was using stock 2.6.24-rt1 kernel this time, so this is not due to something
that RH put in their kernel.

=Comment: #19=================================================
Sripathi Kodi <sripathi@in.ibm.com> - 2008-03-03 12:10 EDT
I have recreated the problem with the following backtrace in the last couple of
attempts:
kernel BUG at kernel/sched.c:824!
invalid opcode: 0000 [1] PREEMPT SMP 
CPU 3 
<snip>
Call Trace:
 [<ffffffff8022fea5>] prio_changed_rt+0x41/0x46
 [<ffffffff802370de>] task_setprio+0x178/0x1a0
 [<ffffffff80259b8c>] __rt_mutex_adjust_prio+0x20/0x24
 [<ffffffff8025a345>] task_blocks_on_rt_mutex+0x15b/0x1bf
 [<ffffffff8049ca3a>] rt_mutex_slowlock+0x184/0x29d
 [<ffffffff8049c716>] rt_mutex_lock+0x28/0x2a
 [<ffffffff8025a577>] __rt_down_read+0x47/0x4b
 [<ffffffff8025a591>] rt_down_read+0xb/0xd
 [<ffffffff8025829b>] futex_lock_pi+0x6a0/0x9a8
 [<ffffffff802514a3>] lock_hrtimer_base+0x25/0x4b
 [<ffffffff8025156b>] hrtimer_try_to_cancel+0x64/0x6f
 [<ffffffff80259077>] do_futex+0xad4/0xb1d
 [<ffffffff8025171a>] hrtimer_nanosleep+0x5c/0xfa
 [<ffffffff8022fc6b>] pick_next_highest_task_rt+0x121/0x157
 [<ffffffff80251042>] hrtimer_wakeup+0x0/0x21
 [<ffffffff802595f4>] compat_sys_futex+0xd8/0xf6
 [<ffffffff8020f65b>] syscall_trace_enter+0x95/0x99
 [<ffffffff802269f4>] cstar_do_call+0x1b/0x65

I will attach the complete oops message to this bug.

Looking at this stack trace, we call task_rq_lock() in task_setprio() to lock
the runqueue. The code in task_rq_lock() locks the rq->lock as well as disables
irqs. I can't see anything wrong with this.

Though I can't figure how, I suspected that the task is being moved between
runqueues after it holds the runqueue lock. That would explain the problem we
are seeing. To verify this, I added a variable to track the cpu number on which
we hold the runqueue lock. Later when the crash happens, I can verify using
kdump whether we are on the same cpu where we took the lock. I recreated the
problem with this patch and looked at the kdump. I saw that the panic happened
on cpu3 and the runqueue lock was also held on cpu3. So it looks clean.

I then looked at the runqueues:
crash> runq
RUNQUEUES[0]: ffff810001011480
 RT PRIO_ARRAY: ffff810001011530
 CFS RB_ROOT: ffff810001011508
RUNQUEUES[1]: ffff810001021480
 RT PRIO_ARRAY: ffff810001021530
 CFS RB_ROOT: ffff810001021508
RUNQUEUES[2]: ffff81011209e480
 RT PRIO_ARRAY: ffff81011209e530
 CFS RB_ROOT: ffff81011209e508
RUNQUEUES[3]: ffff8101120ae480
 RT PRIO_ARRAY: ffff8101120ae530
 CFS RB_ROOT: ffff8101120ae508

Going through each of them, I see that only runqueue 2 is locked, rest are not.
Runqueue 3 should have been locked!

crash> struct rq 0xffff810001011480
struct rq {
  lock = {
    raw_lock = {
      slock = 1
    }, 
    break_lock = 0
<snip>
crash> struct rq 0xffff810001021480
struct rq {
  lock = {
    raw_lock = {
      slock = 4294967295
    }, 
<snip>
crash> struct rq 0xffff81011209e480
struct rq {
  lock = {
    raw_lock = {
      slock = 1
    }, 
<snip>
crash> struct rq 0xffff8101120ae480
struct rq {
  lock = {
    raw_lock = {
      slock = 1
    }, 

I am going to try to track the pids that lock/unlock rqs to gain more insight
into this problem.
=Comment: #20=================================================
Sripathi Kodi <sripathi@in.ibm.com> - 2008-03-05 03:49 EDT
I reported this problem on linux-rt-users. Steven Rostedt sent a patch that
seems to fix the problem. The problem was because we briefly unlock the runqueue
lock of the target task, during which it can move from a runqueue to another.
The patch is:

Signed-off-by: Steven Rostedt <srostedt@redhat.com>

---
 kernel/sched_rt.c |    6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

Index: linux-2.6.24.3-rt3/kernel/sched_rt.c
===================================================================
--- linux-2.6.24.3-rt3.orig/kernel/sched_rt.c   2008-03-04 13:49:53.000000000 -0500
+++ linux-2.6.24.3-rt3/kernel/sched_rt.c        2008-03-04 13:51:27.000000000 -0500
@@ -840,9 +840,11 @@ static void prio_changed_rt(struct rq *r
                        pull_rt_task(rq);
                /*
                 * If there's a higher priority task waiting to run
-                * then reschedule.
+                * then reschedule. Note, the above pull_rt_task
+                * can release the rq lock and p could migrate.
+                * Only reschedule if p is still on the same runqueue.
                 */
-               if (p->prio > rq->rt.highest_prio)
+               if (p->prio > rq->rt.highest_prio && task_rq(p) == rq)
                        resched_task(p);
 #else
                /* For UP simply resched on drop of prio */
=Comment: #21=================================================
Sripathi Kodi <sripathi@in.ibm.com> - 2008-03-05 03:51 EDT
I will get this bug mirrored to RH.
Comment 1 IBM Bug Proxy 2008-03-06 15:56:34 EST
------- Comment From jstultz@us.ibm.com 2008-03-06 15:50 EDT-------
(In reply to comment #10)
> Here is the link provided by Andrew to download the entire jltf and to run the
> perl script.
>
> http://javasvt.hursley.ibm.com/jtc-tests-package-20080222_03.zip

The URL above is giving 404s. Where can I find the package needed to reproduce this?
Comment 2 IBM Bug Proxy 2008-03-06 18:24:41 EST
------- Comment From jstultz@us.ibm.com 2008-03-06 18:21 EDT-------
I reproduced this issue w/ the older 2.6.24.1-24 kernel, then I tried with
today's kernel-rt-2.6.24.3-29.el5rt kernel and could not reproduce the issue.
Comment 3 IBM Bug Proxy 2008-03-06 21:40:34 EST
------- Comment From jstultz@us.ibm.com 2008-03-06 21:32 EDT-------
Marking this Fixed Awaiting Test, and waiting for JTC to confirm.
Comment 4 IBM Bug Proxy 2008-03-07 09:57:32 EST
------- Comment From paul_thwaite@uk.ibm.com 2008-03-07 09:49 EDT-------
Thanks for the quick turnaround of the kernel build.

Unfortunately we're unable to install the new kernel until Monday (due to
staff vacation).  Results should hopefully be available by COB Monday.
Comment 5 IBM Bug Proxy 2008-03-07 14:16:45 EST
------- Comment From paul_thwaite@uk.ibm.com 2008-03-07 14:14 EDT-------
Two machines have been installed with the new kernel (thanks Peter!) and tests
are running at the moment.

Results of the tests will be available on Monday.
Comment 6 IBM Bug Proxy 2008-03-10 10:48:48 EDT
------- Comment From paul_thwaite@uk.ibm.com 2008-03-10 10:41 EDT-------
Testing on kernel level "2.6.24.3-29.el5rt" no longer causes a kernel panic.

This defect has been fixed.  Thank you.

However, although we do not get a kernel panic, we still see 42841 (a hang)
which usually kills the machine and needs a reboot.

So 42841 is a blocker. I'll update that bug now.
Comment 7 IBM Bug Proxy 2008-03-11 19:00:33 EDT
------- Comment From jstultz@us.ibm.com 2008-03-11 18:58 EDT-------
Fix for this is now merged in our R2 kernel.
Comment 8 IBM Bug Proxy 2008-03-12 12:16:35 EDT
------- Comment From pstanton@uk.ibm.com 2008-03-12 12:15 EDT-------
Seems that the machines are not hanging as per 48241 - they're actually
panicking. RIP is pointing to __spin_lock this time. As before, this is easily
reproducible with the same tests that found the first panic.
We didn't spot this at first as the panics are not output over the network or
serial link - it only appears on the screen. So unfortunately the only output we
have is what I've been able to photograph from the screen.
Do you want me to re-open this bug or start a new one?
Comment 9 IBM Bug Proxy 2008-03-12 12:40:32 EDT
------- Comment From sripathi@in.ibm.com 2008-03-12 12:37 EDT-------
(In reply to comment #39)
> Seems that the machines are not hanging as per 48241 - they're actually
> panicking. RIP is pointing to __spin_lock this time. As before, this is easily
> reproducible with the same tests that found the first panic.
> We didn't spot this at first as the panics are not output over the network or
> serial link - it only appears on the screen. So unfortunately the only output we
> have is what I've been able to photograph from the screen.
> Do you want me to re-open this bug or start a new one?

Is it possible that all hangs seen in 42841 could be explained by this? If that
is possible, it will be useful  to carry out analysis in 42841. If it turns out
to be a totally new problem, we can open a new bug later. We surely like to see
the photograph, btw.
Comment 10 IBM Bug Proxy 2008-03-12 13:40:55 EDT
------- Comment From pstanton@uk.ibm.com 2008-03-12 13:34 EDT-------
It could be the same hang as in 42841, but I see kernel messages every time
whereas the comments in 42841 say that no output is produced.
I've attached a photo from rtj-opt6.hursley.ibm.com. Some of the information
looks to have disappeared off the top of the screen, but this is the only one
I've seen that shows a call trace. Apologies for the quality of the photography
- took this with my mobile phone
Comment 11 IBM Bug Proxy 2008-03-12 13:41:00 EDT
Created attachment 297816 [details]
Screenshot of rtj-opt6.hursley.ibm.com panicking
Comment 12 IBM Bug Proxy 2008-03-13 01:40:46 EDT
------- Comment From sripathi@in.ibm.com 2008-03-13 01:33 EDT-------
The screenshot is interesting. The panic seems to be because of an nmi. My guess
is that the system was hung and nmi triggered the panic. This is what I have
been trying to do without success! I still believe we are seeing bug 42841 here,
not the original panic seen with this bug. Hence I will move the discussion and
the screenshot to bug 42841.
Comment 13 Clark Williams 2008-04-08 18:31:35 EDT
Where are we with this bug?
Comment 14 IBM Bug Proxy 2008-04-08 23:41:15 EDT
------- Comment From sripathi@in.ibm.com 2008-04-08 23:36 EDT-------
(In reply to comment #44)
> ------- Comment From williams@redhat.com 2008-04-08 18:31 EST-------
> Where are we with this bug?

Hi Clark,

The fix for this problem has been in MRG kernel since 2.6.24.3-29.el5rt kernel.
Please see comment #6. However, I agree that we did not make it very clear.
Sorry about that.

So, the bug is fixed and can be closed.

Note You need to log in before you can comment on or make changes to this bug.