=Comment: #0=================================================
P. N. Stanton <pstanton.com> - 2008-02-28 10:28 EDT

Problem description: Kernel panic in sched.c when running Java tests on RHEL 5.1 RT with kernel 2.6.24.1-24ibmrt. The type of Java test doesn't seem to matter - the machine will hang within minutes (sometimes less than a minute) of starting java. This problem can be reproduced easily.

We've seen this on two types of machine:

rtj-opt6.hursley.ibm.com: eServer 326m, model number 7969-76G, 2 x 2.4 GHz Opteron 280 (dual core), 5 GB RAM
rtj-opt22.hursley.ibm.com: eServer x3455, model number 7984-52G, 2 x 2.6 GHz Opteron 2218 (dual core), 10 GB RAM

The output below was captured via a null modem cable, since the messages were not written to disk or to the network (using syslog to a remote syslog host).

rtj-opt6:

kernel BUG at kernel/sched.c:818!
invalid opcode: 0000 [1] PREEMPT SMP
CPU 3
Modules linked in: autofs4 hidp nfs lockd nfs_acl rfcomm l2cap bluetooth sunrpc ipv6 dm_multipath video output sbs sbshc dock battery ac parport_pc lp parport sg ide_cd cdrom tg3 pata_serverworks button pata_acpi serio_raw ata_generic rtc_cmos rtc_core rtc_lib pcspkr shpchp k8temp hwmon dm_snapshot dm_zero dm_mirror dm_mod sata_svw libata mptspi mptscsih scsi_transport_spi mptbase sd_mod scsi_mod ext3 jbd ehci_hcd ohci_hcd ssb uhci_hcd
Pid: 4497, comm: java Not tainted 2.6.24.1-24ibmrt #1
RIP: 0010:[<ffffffff8022e807>]  [<ffffffff8022e807>] resched_task+0x24/0x5e
RSP: 0018:ffff81014d859e98  EFLAGS: 00210002
RAX: 0000000000000001 RBX: ffff81014a3b50c0 RCX: ffff81014b948000
RDX: ffffffff8062c100 RSI: 00000000000000bf RDI: ffff81014a3b50c0
RBP: ffff81014d859e98 R08: 0000000000000004 R09: 000000000000003c
R10: 0000000100000002 R11: 0000000000000000 R12: ffff8100c0070480
R13: ffff8100c0070480 R14: 0000000000000001 R15: 000000000000002e
FS:  0000000040021940(0000) GS:ffff81015fb3e6c0(0063) knlGS:00000000b6928b90
CS:  0010 DS: 002b ES: 002b CR0: 000000008005003b
CR2: 00000000cbf0c00c CR3: 00000000b043d000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process java (pid: 4497, threadinfo ffff81014d858000, task ffff81013f95c140)
Stack:  ffff81014d859eb8 ffffffff8022fed5 0000000000000001 ffff81014a3b50c0
 ffff81014d859f28 ffffffff80234d82 ffff81014d859f28 ffff81014d859f38
 ffff81014a3b57d0 00000000a47a4d9c ffffffff804b59e0 ffff81014d859f08
Call Trace:
 [<ffffffff8022fed5>] prio_changed_rt+0x41/0x46
 [<ffffffff80234d82>] sched_setscheduler+0x2e8/0x349
 [<ffffffff80234e4b>] do_sched_setscheduler+0x68/0x7e
 [<ffffffff80234e88>] sys_sched_setscheduler+0x14/0x18
 [<ffffffff802269f4>] cstar_do_call+0x1b/0x65

Code: 0f 0b eb fe 8b 41 10 a8 08 75 2d f0 0f ba 69 10 03 48 8b 47
RIP  [<ffffffff8022e807>] resched_task+0x24/0x5e
 RSP <ffff81014d859e98>
Kernel panic - not syncing: Fatal exception

Pid: 4497, comm: java Tainted: G      D 2.6.24.1-24ibmrt #1
Call Trace:
 [<ffffffff8023a3fc>] panic+0xaf/0x169
 [<ffffffff8022df6d>] __wake_up_common+0x41/0x74
 [<ffffffff8023727d>] __wake_up+0x3a/0x5b
 [<ffffffff8023a7af>] wake_up_klogd+0x32/0x34
 [<ffffffff8049e1eb>] oops_end+0x54/0x5d
 [<ffffffff8020d997>] die+0x4a/0x54
 [<ffffffff8049e5ff>] do_trap+0x101/0x110
 [<ffffffff8020de9c>] do_invalid_op+0x93/0x9c
 [<ffffffff8022e807>] resched_task+0x24/0x5e
 [<ffffffff80311c12>] avc_has_perm_noaudit+0x3cd/0x48f
 [<ffffffff8049de89>] error_exit+0x0/0x51
 [<ffffffff8022e807>] resched_task+0x24/0x5e
 [<ffffffff8022fed5>] prio_changed_rt+0x41/0x46
 [<ffffffff80234d82>] sched_setscheduler+0x2e8/0x349
 [<ffffffff80234e4b>] do_sched_setscheduler+0x68/0x7e
 [<ffffffff80234e88>] sys_sched_setscheduler+0x14/0x18
 [<ffffffff802269f4>] cstar_do_call+0x1b/0x65

rtj-opt22:

kernel BUG at kernel/sched.c:818!
invalid opcode: 0000 [1] PREEMPT SMP
CPU 1
Modules linked in: autofs4 hidp nfs lockd nfs_acl rfcomm l2cap bluetooth sunrpc ipv6 dm_multipath video output sbs sbshc dock battery ac parport_pc lp parport sg tg3 button serio_raw shpchp pata_acpi pata_serverworks ata_generic pcspkr rtc_cmos rtc_core rtc_lib k8temp hwmon dm_snapshot dm_zero dm_mirror dm_mod sata_svw libata sd_mod scsi_mod ext3 jbd ehci_hcd ohci_hcd ssb uhci_hcd
Pid: 4675, comm: java Not tainted 2.6.24.1-24ibmrt #1
RIP: 0010:[<ffffffff8022e807>]  [<ffffffff8022e807>] resched_task+0x24/0x5e
RSP: 0018:ffff8102a0f1da98  EFLAGS: 00210002
RAX: 0000000000000001 RBX: ffff810297f29700 RCX: ffff81029b5c6000
RDX: ffff8102af0e0740 RSI: 00000000000000bf RDI: ffff810297f29700
RBP: ffff8102a0f1da98 R08: 0000000000000004 R09: 000000000000003c
R10: 0000000100000003 R11: 0000000000000000 R12: ffff8101700be480
R13: 0000000000000001 R14: 0000000000000036 R15: ffffffff804b59e0
FS:  00002aca97b49c80(0000) GS:ffff81016fc23cc0(0063) knlGS:00000000bf66db90
CS:  0010 DS: 002b ES: 002b CR0: 0000000080050033
CR2: 00000000cbae4010 CR3: 00000002ac8d1000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process java (pid: 4675, threadinfo ffff8102a0f1c000, task ffff8102a0f0e480)
Stack:  ffff8102a0f1dab8 ffffffff8022fed5 ffff810297f29700 ffff8101700be480
 ffff8102a0f1db08 ffffffff802370f3 0000002e00000001 ffff8102a0f0e480
 0000000000200097 ffff810297f29700 ffff810297f29e10 ffff8102a0f1dbb8
Call Trace:
 [<ffffffff8022fed5>] prio_changed_rt+0x41/0x46
 [<ffffffff802370f3>] task_setprio+0x178/0x1a0
 [<ffffffff80259cac>] __rt_mutex_adjust_prio+0x20/0x24
 [<ffffffff8025a465>] task_blocks_on_rt_mutex+0x15b/0x1bf
 [<ffffffff8049ce7a>] rt_mutex_slowlock+0x184/0x29d
 [<ffffffff8049cb56>] rt_mutex_lock+0x28/0x2a
 [<ffffffff8025a695>] __rt_down_read+0x47/0x4b
 [<ffffffff8025a6af>] rt_down_read+0xb/0xd
 [<ffffffff80257e1e>] futex_lock_pi+0x103/0x9a8
 [<ffffffff80259cac>] __rt_mutex_adjust_prio+0x20/0x24
 [<ffffffff80259e30>] rt_mutex_adjust_prio+0x35/0x3d
 [<ffffffff80231993>] default_wake_function+0x0/0x14
 [<ffffffff80259197>] do_futex+0xad4/0xb1d
 [<ffffffff8020a8e5>] __switch_to+0x291/0x2a0
 [<ffffffff80234f38>] finish_task_switch+0x45/0x98
 [<ffffffff80259714>] compat_sys_futex+0xd8/0xf6
 [<ffffffff8020f65b>] syscall_trace_enter+0x95/0x99
 [<ffffffff802269f4>] cstar_do_call+0x1b/0x65

Code: 0f 0b eb fe 8b 41 10 a8 08 75 2d f0 0f ba 69 10 03 48 8b 47
RIP  [<ffffffff8022e807>] resched_task+0x24/0x5e
 RSP <ffff8102a0f1da98>

Access to both machines can be provided if needed.

=Comment: #1=================================================
Sripathi Kodi <sripathi.com> - 2008-02-28 12:34 EDT

static void resched_task(struct task_struct *p)
{
	int cpu;

	assert_spin_locked(&task_rq(p)->lock);    <==== Panic happens here

	if (unlikely(test_tsk_thread_flag(p, TIF_NEED_RESCHED)))
		return;

=Comment: #2=================================================
Sripathi Kodi <sripathi.com> - 2008-02-29 01:10 EDT

The problem is that we somehow reach resched_task() without holding the lock of the runqueue the task is on. I have looked at the two code paths leading to the crash, one starting from sys_sched_setscheduler and the other through task_blocks_on_rt_mutex. On neither of these paths does there seem to be any way for that to happen. Also, rq->lock is a raw_spinlock_t, so there is nothing to suspect there.

I ran calibrate and started running recalibrate on a machine here. So far I have not seen the problem in more than 15 minutes.

=Comment: #3=================================================
Sripathi Kodi <sripathi.com> - 2008-02-29 01:23 EDT

This shouldn't be hardware dependent, but I noticed that both the machines mentioned in the report are non-blades (326m and x3455). Just to be sure, I am abat-provisioning rt-birch, which is our 2x dual-core e326m. I will see if Java behaves differently on that.
=Comment: #4=================================================
Chirag H. Jog1 <chirag.jog.com> - 2008-02-29 01:33 EDT

I am running SPECjbb [in a loop] on an x3455 to reproduce the issue.

=Comment: #5=================================================
Sripathi Kodi <sripathi.com> - 2008-02-29 03:06 EDT

I have run calibrate, recalibrate and SPECjbb on an e326m, but did not see the problem even after a couple of hours. I am using the 2.6.24.1-24ibmrt kernel, and the Java version used is:

java version "1.5.0"
Java(TM) 2 Runtime Environment, Standard Edition (build pxi32rt23-20080118 (SR3))
IBM J9 VM (build 2.3, J2RE 1.5.0 IBM J9 2.3 Linux x86-32 j9vmxi3223-20071005 (JIT enabled)
J9VM - 20071004_14218_lHdSMR
JIT  - 20070820_1846ifx1_r8
GC   - 200708_10)
JCL  - 20080117a

Peter, we will have to understand how your testing is different from ours. Could you please:
1) Let us know how the OS was installed. Did you install SR3 and then put the 2.6.24.1-24ibmrt kernel on top of it?
2) Is the problem seen only when a particular version of Java is used?
3) Which specific test triggers the problem?
4) Could you provide us access to the failing systems? Do they have a serial console? We may want to get a kdump from one when the problem happens.

=Comment: #6=================================================
Chirag H. Jog1 <chirag.jog.com> - 2008-02-29 03:25 EDT

10 runs of SPECjbb couldn't reproduce the problem on an x3455.

=Comment: #11=================================================
Chirag H. Jog1 <chirag.jog.com> - 2008-02-29 09:07 EDT

I can easily reproduce this on rt-ash. Next step: trying to trigger a dump.

=Comment: #15=================================================
Sripathi Kodi <sripathi.com> - 2008-03-03 02:05 EDT

I can easily recreate the kernel panic problem on rt-birch and rt-ash, whereas I have not been able to recreate it on an LS20.
The only difference between these machines, apart from the hardware, is that rt-birch and rt-ash were provisioned through ABAT, whereas the LS20 wasn't.

=Comment: #16=================================================
Sripathi Kodi <sripathi.com> - 2008-03-03 03:29 EDT

Saw this on the LS20 (not on ABAT) as well, hence it doesn't depend on the hardware.

=Comment: #17=================================================
Sripathi Kodi <sripathi.com> - 2008-03-03 06:02 EDT

This time I recreated it again on the LS20 and got a kdump. Will report findings soon. I was using the stock 2.6.24-rt1 kernel this time, so this is not due to something that RH put in their kernel.

=Comment: #19=================================================
Sripathi Kodi <sripathi.com> - 2008-03-03 12:10 EDT

I have recreated the problem with the following backtrace in the last couple of attempts:

kernel BUG at kernel/sched.c:824!
invalid opcode: 0000 [1] PREEMPT SMP
CPU 3
<snip>
Call Trace:
 [<ffffffff8022fea5>] prio_changed_rt+0x41/0x46
 [<ffffffff802370de>] task_setprio+0x178/0x1a0
 [<ffffffff80259b8c>] __rt_mutex_adjust_prio+0x20/0x24
 [<ffffffff8025a345>] task_blocks_on_rt_mutex+0x15b/0x1bf
 [<ffffffff8049ca3a>] rt_mutex_slowlock+0x184/0x29d
 [<ffffffff8049c716>] rt_mutex_lock+0x28/0x2a
 [<ffffffff8025a577>] __rt_down_read+0x47/0x4b
 [<ffffffff8025a591>] rt_down_read+0xb/0xd
 [<ffffffff8025829b>] futex_lock_pi+0x6a0/0x9a8
 [<ffffffff802514a3>] lock_hrtimer_base+0x25/0x4b
 [<ffffffff8025156b>] hrtimer_try_to_cancel+0x64/0x6f
 [<ffffffff80259077>] do_futex+0xad4/0xb1d
 [<ffffffff8025171a>] hrtimer_nanosleep+0x5c/0xfa
 [<ffffffff8022fc6b>] pick_next_highest_task_rt+0x121/0x157
 [<ffffffff80251042>] hrtimer_wakeup+0x0/0x21
 [<ffffffff802595f4>] compat_sys_futex+0xd8/0xf6
 [<ffffffff8020f65b>] syscall_trace_enter+0x95/0x99
 [<ffffffff802269f4>] cstar_do_call+0x1b/0x65

I will attach the complete oops message to this bug.

Looking at this stack trace, we call task_rq_lock() in task_setprio() to lock the runqueue.
The code in task_rq_lock() takes rq->lock as well as disabling irqs. I can't see anything wrong with this.

Though I couldn't figure out how, I suspected that the task was being moved between runqueues after the caller took the runqueue lock. That would explain the problem we are seeing. To verify this, I added a variable to track the cpu number on which we hold the runqueue lock. Later, when the crash happens, I can verify using kdump whether we are on the same cpu where we took the lock. I recreated the problem with this patch and looked at the kdump. I saw that the panic happened on cpu3 and the runqueue lock was also held on cpu3. So that looks clean.

I then looked at the runqueues:

crash> runq
RUNQUEUES[0]: ffff810001011480
  RT PRIO_ARRAY: ffff810001011530
  CFS RB_ROOT: ffff810001011508
RUNQUEUES[1]: ffff810001021480
  RT PRIO_ARRAY: ffff810001021530
  CFS RB_ROOT: ffff810001021508
RUNQUEUES[2]: ffff81011209e480
  RT PRIO_ARRAY: ffff81011209e530
  CFS RB_ROOT: ffff81011209e508
RUNQUEUES[3]: ffff8101120ae480
  RT PRIO_ARRAY: ffff8101120ae530
  CFS RB_ROOT: ffff8101120ae508

Going through each of them, I see that only runqueue 2 is locked; the rest are not. Runqueue 3 should have been locked!

crash> struct rq 0xffff810001011480
struct rq {
  lock = {
    raw_lock = {
      slock = 1
    },
    break_lock = 0
<snip>
crash> struct rq 0xffff810001021480
struct rq {
  lock = {
    raw_lock = {
      slock = 4294967295
    },
<snip>
crash> struct rq 0xffff81011209e480
struct rq {
  lock = {
    raw_lock = {
      slock = 1
    },
<snip>
crash> struct rq 0xffff8101120ae480
struct rq {
  lock = {
    raw_lock = {
      slock = 1
    },

I am going to try to track the pids that lock/unlock rqs to gain more insight into this problem.

=Comment: #20=================================================
Sripathi Kodi <sripathi.com> - 2008-03-05 03:49 EDT

I reported this problem on linux-rt-users. Steven Rostedt sent a patch that seems to fix the problem.
The problem was that we briefly unlock the runqueue lock of the target task, during which it can move from one runqueue to another. The patch is:

Signed-off-by: Steven Rostedt <srostedt>
---
 kernel/sched_rt.c |    6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

Index: linux-2.6.24.3-rt3/kernel/sched_rt.c
===================================================================
--- linux-2.6.24.3-rt3.orig/kernel/sched_rt.c	2008-03-04 13:49:53.000000000 -0500
+++ linux-2.6.24.3-rt3/kernel/sched_rt.c	2008-03-04 13:51:27.000000000 -0500
@@ -840,9 +840,11 @@ static void prio_changed_rt(struct rq *r
 		pull_rt_task(rq);
 		/*
 		 * If there's a higher priority task waiting to run
-		 * then reschedule.
+		 * then reschedule. Note, the above pull_rt_task
+		 * can release the rq lock and p could migrate.
+		 * Only reschedule if p is still on the same runqueue.
 		 */
-		if (p->prio > rq->rt.highest_prio)
+		if (p->prio > rq->rt.highest_prio && task_rq(p) == rq)
 			resched_task(p);
 #else
 		/* For UP simply resched on drop of prio */

=Comment: #21=================================================
Sripathi Kodi <sripathi.com> - 2008-03-05 03:51 EDT

I will get this bug mirrored to RH.
------- Comment From jstultz.com 2008-03-06 15:50 EDT-------
(In reply to comment #10)
> Here is the link provided by Andrew to download the entire jltf and to run the
> perl script.
>
> http://javasvt.hursley.ibm.com/jtc-tests-package-20080222_03.zip

The URL above is giving 404s. Where can I find the package needed to reproduce this?
------- Comment From jstultz.com 2008-03-06 18:21 EDT------- I reproduced this issue w/ the older 2.6.24.1-24 kernel, then I tried with today's kernel-rt-2.6.24.3-29.el5rt kernel and could not reproduce the issue.
------- Comment From jstultz.com 2008-03-06 21:32 EDT------- Marking this Fixed Awaiting Test, and waiting for JTC to confirm.
------- Comment From paul_thwaite.com 2008-03-07 09:49 EDT------- Thanks for the quick turnaround of the kernel build. Unfortunately we're unable to install the new kernel until Monday (due to staff vacation). Results should hopefully be available by COB Monday.
------- Comment From paul_thwaite.com 2008-03-07 14:14 EDT------- Two machines have been installed with the new kernel (thanks Peter!) and tests are running at the moment. Results of the tests will be available on Monday.
------- Comment From paul_thwaite.com 2008-03-10 10:41 EDT-------
Testing on kernel level "2.6.24.3-29.el5rt" no longer causes a kernel panic. This defect has been fixed. Thank you.

However, although we no longer get a kernel panic, we still see bug 42841 (a hang), which usually kills the machine and requires a reboot. So bug 42841 is a blocker. I'll update that bug now.
------- Comment From jstultz.com 2008-03-11 18:58 EDT------- Fix for this is now merged in our R2 kernel.
------- Comment From pstanton.com 2008-03-12 12:15 EDT-------
It seems that the machines are not hanging as per 48241 - they're actually panicking. The RIP is pointing to __spin_lock this time. As before, this is easily reproducible with the same tests that found the first panic.

We didn't spot this at first because the panics are not output over the network or serial link - they only appear on the screen. So unfortunately the only output we have is what I've been able to photograph from the screen.

Do you want me to re-open this bug or start a new one?
------- Comment From sripathi.com 2008-03-12 12:37 EDT-------
(In reply to comment #39)
> Seems that the machines are not hanging as per 48241 - they're actually
> panicking. RIP is pointing to __spin_lock this time. As before, this is easily
> reproducible with the same tests that found the first panic.
> We didn't spot this at first as the panics are not output over the network or
> serial link - it only appears on the screen. So unfortunately the only output we
> have is what I've been able to photograph from the screen.
> Do you want me to re-open this bug or start a new one?

Is it possible that all the hangs seen in 42841 could be explained by this? If so, it would be useful to carry out the analysis in 42841. If it turns out to be a totally new problem, we can open a new bug later. We would surely like to see the photograph, btw.
------- Comment From pstanton.com 2008-03-12 13:34 EDT-------
It could be the same hang as in 42841, but I see kernel messages every time, whereas the comments in 42841 say that no output is produced.

I've attached a photo from rtj-opt6.hursley.ibm.com. Some of the information looks to have disappeared off the top of the screen, but this is the only one I've seen that shows a call trace. Apologies for the quality of the photography - I took this with my mobile phone.
Created attachment 297816 [details] Screenshot of rtj-opt6.hursley.ibm.com panicking
------- Comment From sripathi.com 2008-03-13 01:33 EDT-------
The screenshot is interesting. The panic seems to be caused by an NMI. My guess is that the system was hung and the NMI triggered the panic. This is what I have been trying to do without success! I still believe we are seeing bug 42841 here, not the original panic seen with this bug. Hence I will move the discussion and the screenshot to bug 42841.
Where are we with this bug?
------- Comment From sripathi.com 2008-04-08 23:36 EDT-------
(In reply to comment #44)
> ------- Comment From williams 2008-04-08 18:31 EST-------
> Where are we with this bug?

Hi Clark,

The fix for this problem has been in the MRG kernel since the 2.6.24.3-29.el5rt kernel. Please see comment #6. However, I agree that we did not make it very clear. Sorry about that. So, the bug is fixed and can be closed.