Bug 437933
| Summary: | crash with 32bit 2.6.24.3-29.el5rt | ||
|---|---|---|---|
| Product: | Red Hat Enterprise MRG | Reporter: | Roland Westrelin <roland.westrelin> |
| Component: | realtime-kernel | Assignee: | Jon Masters <jcm> |
| Status: | CLOSED WORKSFORME | QA Contact: | |
| Severity: | high | Docs Contact: | |
| Priority: | low | ||
| Version: | 1.0 | CC: | austin.zhang, bhu, pzijlstr, srostedt, williams |
| Target Milestone: | --- | ||
| Target Release: | --- | ||
| Hardware: | i386 | ||
| OS: | Linux | ||
| Whiteboard: | |||
| Fixed In Version: | | Doc Type: | Bug Fix |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2008-06-03 14:22:48 UTC | Type: | --- |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
Description
Roland Westrelin
2008-03-18 09:10:50 UTC
Good morning! Can I please get some more information about this crash:

*). How are you booting? Parameters?
*). Test suite. Can you tell us what you were running, what options it uses, and where to get ahold of that testsuite for reproducibility.
*). Command Line for the kernel in question?

Did you try with a debug kernel (yet)? :)

Thanks!

Jon.

The kernel is booted with:

    kernel /vmlinuz-2.6.24.3-29.el5rt ro root=LABEL=/1 console=ttyS0,9600 rhgb quiet

The crash happens overnight during a run of an internal Java test suite. We haven't figured out if one particular test triggers the crash. I will install the debug kernel. I'll let you know what happens.

We don't have a -debug rpm package for 2.6.24.3-29. Can you provide us with one?

Here is what I got with the debug kernel:

    NMI show regs on CPU#0:
    apic_timer_irqs: 31045376
    NMI watchdog running again ...

    Pid: 17705, comm: java Tainted: G D (2.6.24.3-29.el5rtdebug #1)
    EIP: 0060:[<c0322cec>] EFLAGS: 00000086 CPU: 0
    EIP is at delay_tsc+0x1d/0x43
    EAX: aad7ee10 EBX: 00000001 ECX: aad7ee07 EDX: 000093cd
    ESI: 04f4c966 EDI: 00000000 EBP: eddf0c54 ESP: eddf0c50
     DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068 preempt:00010005
    CR0: 8005003b CR2: b7fe4000 CR3: 2bcbd000 CR4: 000006f0
    DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
    DR6: ffff0ff0 DR7: 00000400
     [<c02052f9>] show_trace_log_lvl+0x22/0x3f
     [<c0205b94>] show_trace+0x17/0x19
     [<c02024b0>] show_regs+0x21/0x24
     [<c021ae2c>] irq_show_regs_callback+0x62/0x72
     [<c04722a8>] nmi_watchdog_tick+0xc2/0x20d
     [<c0471d30>] do_nmi+0xde/0x2ac
     [<c04718cb>] nmi_stack_correct+0x26/0x2b
     [<c0322c98>] __delay+0xe/0x10
     [<c0471589>] _raw_spin_lock+0x82/0xe7
     [<c04708f7>] __spin_lock+0x59/0x67
     [<c0224c31>] double_lock_balance+0x54/0x5c
     [<c0224e7d>] pull_rt_task+0x81/0x1ad
     [<c022c32a>] pre_schedule_rt+0x22/0x2b
     [<c046e7ce>] __schedule+0x1bc/0x84b
     [<c046eff8>] schedule+0xea/0x109
     [<c025263f>] futex_wait+0x231/0x309
     [<c0253661>] do_futex+0x60/0x9a4
     [<c0254091>] sys_futex+0xec/0xff
     [<c02042c6>] syscall_call+0x7/0xb
     =======================
    ---------------------------
    | preempt count: 00010005 ]
    | 5-level deep critical section nesting:
    ----------------------------------------
    .. [<c046e641>] .... __schedule+0x2f/0x84b
    .....[<c046eff8>] .. ( <= schedule+0xea/0x109)
    .. [<c04708b7>] .... __spin_lock+0x19/0x67
    .....[<c046e739>] .. ( <= __schedule+0x127/0x84b)
    .. [<c04708b7>] .... __spin_lock+0x19/0x67
    .....[<c0224c31>] .. ( <= double_lock_balance+0x54/0x5c)
    .. [<c0322ce4>] .... delay_tsc+0x15/0x43
    .....[<c0322c98>] .. ( <= __delay+0xe/0x10)
    .. [<c04708b7>] .... __spin_lock+0x19/0x67
    .....[<c021adf5>] .. ( <= irq_show_regs_callback+0x2b/0x72)

    NMI show regs on CPU#2:
    apic_timer_irqs: 30774107

    Pid: 32, comm: softirq-timer/2 Tainted: G D (2.6.24.3-29.el5rtdebug #1)
    EIP: 0060:[<c02299d0>] EFLAGS: 00000006 CPU: 2
    EIP is at add_preempt_count+0x98/0x132
    EAX: 00000003 EBX: c0322c98 ECX: ab8d8863 EDX: f78dc000
    ESI: 00000001 EDI: c0322ce4 EBP: f78dcdfc ESP: f78dcde0
     DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068 preempt:00010004
    CR0: 8005003b CR2: aa7590d8 CR3: 2bcbd000 CR4: 000006f0
    DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
    DR6: ffff0ff0 DR7: 00000400
     [<c02052f9>] show_trace_log_lvl+0x22/0x3f
     [<c0205b94>] show_trace+0x17/0x19
     [<c02024b0>] show_regs+0x21/0x24
     [<c021ae2c>] irq_show_regs_callback+0x62/0x72
     [<c04722a8>] nmi_watchdog_tick+0xc2/0x20d
     [<c0471d30>] do_nmi+0xde/0x2ac
     [<c04718cb>] nmi_stack_correct+0x26/0x2b
     [<c0322ce4>] delay_tsc+0x15/0x43
     [<c0322c98>] __delay+0xe/0x10
     [<c0471589>] _raw_spin_lock+0x82/0xe7
     [<c04708f7>] __spin_lock+0x59/0x67
     [<c0224c31>] double_lock_balance+0x54/0x5c
     [<c02251ad>] push_rt_task+0x95/0x1e8
     [<c0225312>] push_rt_tasks+0x12/0x19
     [<c0225339>] post_schedule_rt+0x20/0x30
     [<c0229810>] finish_task_switch+0x6e/0xb7
     [<c046edd5>] __schedule+0x7c3/0x84b
     [<c046eff8>] schedule+0xea/0x109
     [<c0235587>] ksoftirqd+0xbf/0x26b
     [<c0242bbb>] kthread+0x40/0x69
     [<c0204f13>] kernel_thread_helper+0x7/0x10
     =======================
    ---------------------------
    | preempt count: 00010004 ]
    | 4-level deep critical section nesting:
    ----------------------------------------
    .. [<c046e641>] .... __schedule+0x2f/0x84b
    .....[<c046eff8>] .. ( <= schedule+0xea/0x109)
    .. [<c04708b7>] .... __spin_lock+0x19/0x67
    .....[<c0225332>] .. ( <= post_schedule_rt+0x19/0x30)
    .. [<c04708b7>] .... __spin_lock+0x19/0x67
    .....[<c0224c31>] .. ( <= double_lock_balance+0x54/0x5c)
    .. [<c04708b7>] .... __spin_lock+0x19/0x67
    .....[<c021adf5>] .. ( <= irq_show_regs_callback+0x2b/0x72)

    NMI show regs on CPU#3:
    apic_timer_irqs: 30619233

    Pid: 45, comm: softirq-timer/3 Tainted: G D (2.6.24.3-29.el5rtdebug #1)
    EIP: 0060:[<c0322c92>] EFLAGS: 00000046 CPU: 3
    EIP is at __delay+0x8/0x10
    EAX: 00000001 EBX: d28ced80 ECX: b0208e5d EDX: 000093cd
    ESI: 1c537f1a EDI: 00000000 EBP: f7908cc0 ESP: f7908cc0
     DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068 preempt:00010005
    CR0: 8005003b CR2: ffffffd0 CR3: 2bcbd000 CR4: 000006f0
    DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
    DR6: ffff0ff0 DR7: 00000400
     [<c02052f9>] show_trace_log_lvl+0x22/0x3f
     [<c0205b94>] show_trace+0x17/0x19
     [<c02024b0>] show_regs+0x21/0x24
     [<c021ae2c>] irq_show_regs_callback+0x62/0x72
     [<c04722a8>] nmi_watchdog_tick+0xc2/0x20d
     [<c0471d30>] do_nmi+0xde/0x2ac
     [<c04718cb>] nmi_stack_correct+0x26/0x2b
     [<c0471589>] _raw_spin_lock+0x82/0xe7
     [<c04708f7>] __spin_lock+0x59/0x67
     [<c046e739>] __schedule+0x127/0x84b
     [<c0233696>] do_exit+0x70f/0x768
     [<c020577e>] die+0x1f6/0x1fe
     [<c047356f>] do_page_fault+0x74e/0x834
     [<c0471822>] error_code+0x72/0x78
     [<c046eff8>] schedule+0xea/0x109
     [<c0235587>] ksoftirqd+0xbf/0x26b
     [<c0242bbb>] kthread+0x40/0x69
     [<c0204f13>] kernel_thread_helper+0x7/0x10
     =======================
    ---------------------------
    | preempt count: 00010005 ]
    | 5-level deep critical section nesting:
    ----------------------------------------
    .. [<c046e641>] .... __schedule+0x2f/0x84b
    .....[<c046eff8>] .. ( <= schedule+0xea/0x109)
    .. [<c04708b7>] .... __spin_lock+0x19/0x67
    .....[<c046e739>] .. ( <= __schedule+0x127/0x84b)
    .. [<c046e641>] .... __schedule+0x2f/0x84b
    .....[<c0233696>] .. ( <= do_exit+0x70f/0x768)
    .. [<c04708b7>] .... __spin_lock+0x19/0x67
    .....[<c046e739>] .. ( <= __schedule+0x127/0x84b)
    .. [<c04708b7>] .... __spin_lock+0x19/0x67
    .....[<c021adf5>] .. ( <= irq_show_regs_callback+0x2b/0x72)

*** Bug 442828 has been marked as a duplicate of this bug. ***

Sorry, but is #442828 (F9 beta Installation in santa rosa platform failed) related to this issue?

No, that looks like a mistake. This is a MRG Realtime bug, so I don't see any way that F9 could intersect with it.

Clark

Potential bug in the highmem handling: we implemented kmap_atomic using kmap, and kmap uses flush_tlb_range(), which uses on_each_cpu(), which can deadlock when called with IRQs disabled. However, on -rt it should not be called from such a context, nor do the above NMI traces suggest it is, so this is likely a red herring. Still worth making a note of, hence this message.
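
For anyone reading the NMI dumps in this report, the `preempt:` values (00010005, 00010004) can be split into their counter fields. The sketch below is only an illustration: it assumes the default field widths from mainline 2.6.24 `include/linux/hardirq.h` (8 preempt bits, 8 softirq bits, 12 hardirq bits), which the -rt tree or a different config may not share, and the helper name `decode_preempt_count` is made up for this note.

```python
# Assumed layout (mainline 2.6.24 defaults; NOT verified against the -rt tree):
#   bits  0-7   preemption depth
#   bits  8-15  softirq count
#   bits 16-27  hardirq count
PREEMPT_BITS = 8
SOFTIRQ_BITS = 8
HARDIRQ_BITS = 12

PREEMPT_SHIFT = 0
SOFTIRQ_SHIFT = PREEMPT_SHIFT + PREEMPT_BITS
HARDIRQ_SHIFT = SOFTIRQ_SHIFT + SOFTIRQ_BITS


def decode_preempt_count(value):
    """Split a raw preempt count into its three counters (hypothetical helper)."""

    def field(shift, bits):
        # Shift the field down to bit 0 and mask off everything above it.
        return (value >> shift) & ((1 << bits) - 1)

    return {
        "preempt_depth": field(PREEMPT_SHIFT, PREEMPT_BITS),
        "softirq_count": field(SOFTIRQ_SHIFT, SOFTIRQ_BITS),
        "hardirq_count": field(HARDIRQ_SHIFT, HARDIRQ_BITS),
    }


# The two distinct values that appear in the dumps in this report.
for raw in ("00010005", "00010004"):
    print(raw, decode_preempt_count(int(raw, 16)))
```

Under that assumed layout, the low byte matches the "5-level deep" and "4-level deep" nesting lines printed by the kernel, and the hardirq field is 1 in both cases, which would be consistent with the dump being taken from inside the NMI handler.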