Bug 802234
Summary: | WARNING: possibly bogus exception frame | ||
---|---|---|---|
Product: | Red Hat Enterprise Linux 7 | Reporter: | Cui Chun <ccui> |
Component: | kernel | Assignee: | Dave Anderson <anderson> |
Status: | CLOSED CURRENTRELEASE | QA Contact: | Qiao Zhao <qzhao> |
Severity: | high | Docs Contact: | |
Priority: | high | ||
Version: | 7.0 | CC: | qcai |
Target Milestone: | rc | ||
Target Release: | --- | ||
Hardware: | x86_64 | ||
OS: | Linux | ||
Whiteboard: | |||
Fixed In Version: | crash-6.0.5-1.el7 | Doc Type: | Bug Fix |
Doc Text: | Story Points: | --- | |
Clone Of: | Environment: | ||
Last Closed: | 2014-06-13 12:13:30 UTC | Type: | --- |
Bug Blocks: | 839713 |
Description
Cui Chun
2012-03-12 07:49:06 UTC
The IRQ exception frame address determination is off-by-8 in this vmcore. Presumably something has changed in recent kernels with respect to the process-to-IRQ stack transition.

Can you confirm that you are seeing this in more than one instance, i.e., where you are running your "crasher" module and catching one or more tasks that were running on their IRQ stack at the time of the crash?

*** Bug 803982 has been marked as a duplicate of this bug. ***

crash-6.0.5-1.el7 is now available in brew.

When running "bt -a" on the reporter-supplied vmcore, two of the eight active task backtraces generate "WARNING: possibly bogus exception frame" messages at the point where they transition from their per-cpu IRQ stack back to the task's process stack, because the exception frame register contents are off-by-8.

Running with crash-6.0.4-1.el7, here are the two suspect backtraces:

crash> bt -a
... [ cut ] ...
PID: 0      TASK: ffff8802221e2670  CPU: 1   COMMAND: "swapper/1"
  #0 [ffff880426e07e80] crash_nmi_callback at ffffffff81037470
  #1 [ffff880426e07ea0] nmi_handle at ffffffff8163161e
  #2 [ffff880426e07f00] default_do_nmi at ffffffff81631795
  #3 [ffff880426e07f30] do_nmi at ffffffff816319f8
  #4 [ffff880426e07f50] nmi at ffffffff81630bc0
     [exception RIP: lock_release+57]
     RIP: ffffffff810c3869  RSP: ffff880426e03c98  RFLAGS: 00000046
     RAX: 0000000000000000  RBX: ffff8804275d43d8  RCX: 0000000000000001
     RDX: ffff8802221e2670  RSI: 0000000000000001  RDI: ffff8804275d43d8
     RBP: ffff880426e03ce0   R8: ffff8804275d44b0   R9: 0000000000000000
     R10: 0000000000000002  R11: 0000000000000000  R12: ffffffff81063b7d
     R13: 0000000000000106  R14: ffff8804275d43c0  R15: 0000000000000001
     ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
 --- <NMI exception stack> ---
  #5 [ffff880426e03c98] lock_release at ffffffff810c3869
  #6 [ffff880426e03ce8] _raw_spin_unlock at ffffffff8162fd23
  #7 [ffff880426e03d08] double_rq_unlock at ffffffff81063b7d
  #8 [ffff880426e03d28] load_balance at ffffffff81076e65
  #9 [ffff880426e03db8] rebalance_domains at ffffffff810771ad
 #10 [ffff880426e03e38] nohz_idle_balance at ffffffff810773e5
 #11 [ffff880426e03e88] run_rebalance_domains at ffffffff810774a2
 #12 [ffff880426e03ea8] __do_softirq at ffffffff81089a18
 #13 [ffff880426e03f28] call_softirq at ffffffff8163b57c
 #14 [ffff880426e03f40] do_softirq at ffffffff8101b415
 #15 [ffff880426e03f60] irq_exit at ffffffff8108a04e
 #16 [ffff880426e03f80] scheduler_ipi at ffffffff81077fd9
 #17 [ffff880426e03fa0] smp_reschedule_interrupt at ffffffff8103818a
 #18 [ffff880426e03fb0] reschedule_interrupt at ffffffff8163b0f3
 --- <IRQ stack> ---
 #19 [ffff8802221e9e00] reschedule_interrupt at ffffffff8163b0f3
     RIP: ffffffffffffff02  RSP: 0000000000000246  RFLAGS: 00000010
     RAX: 0000000000000000  RBX: ffff8802221e9eb8  RCX: 0000000000000000
     RDX: 0000000000000001  RSI: 0000000000000000  RDI: ffffffff81c2cb00
     RBP: 0000000000000046   R8: 0000000000000000   R9: ffffffff81c2cb00
     R10: 0000000000000000  R11: ffffffff81c2cb00  R12: ffff8802221e9e68
     R13: 0000000000000000  R14: 000034a90eb72140  R15: 0000000121cfc000
     ORIG_RAX: ffffffff810222a3  CS: ffffffff8104501b  SS: ffff8802221e9eb8
 WARNING: possibly bogus exception frame
... [ cut ] ...
PID: 33     TASK: ffff880421d48000  CPU: 7   COMMAND: "migration/7"
  #0 [ffff880427407cc0] panic at ffffffff816173e7
  #1 [ffff880427407d40] watchdog_overflow_callback at ffffffff81105173
  #2 [ffff880427407d50] __perf_event_overflow at ffffffff81143016
  #3 [ffff880427407de0] perf_event_overflow at ffffffff81143704
  #4 [ffff880427407df0] x86_pmu_handle_irq at ffffffff8102aed7
  #5 [ffff880427407e90] perf_event_nmi_handler at ffffffff81631fc1
  #6 [ffff880427407ea0] nmi_handle at ffffffff8163161e
  #7 [ffff880427407f00] default_do_nmi at ffffffff81631795
  #8 [ffff880427407f30] do_nmi at ffffffff816319f8
  #9 [ffff880427407f50] nmi at ffffffff81630bc0
     [exception RIP: __delay+16]
     RIP: ffffffff8131b740  RSP: ffff880427403ad8  RFLAGS: 00000006
     RAX: ffff880421d45fd8  RBX: ffff8804275d43c0  RCX: 000000004f746767
     RDX: 00000000000000c4  RSI: ffffffff81079593  RDI: 0000000000000001
     RBP: ffff880427403b00   R8: 0000000000000002   R9: 0000000000000001
     R10: ffff8804275d43d8  R11: 0000000000000000  R12: 000000007141ddb0
     R13: 000000000cddcfb2  R14: 0000000000000001  R15: ffff880421d48000
     ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
 --- <NMI exception stack> ---
 #10 [ffff880427403ad8] __delay at ffffffff8131b740
 #11 [ffff880427403ad8] do_raw_spin_lock at ffffffff81322d1e
 #12 [ffff880427403b08] _raw_spin_lock at ffffffff8162f666
 #13 [ffff880427403b38] scheduler_tick at ffffffff81079593
 #14 [ffff880427403b78] update_process_times at ffffffff8109401e
 #15 [ffff880427403ba8] tick_sched_timer at ffffffff810bbe64
 #16 [ffff880427403bd8] __run_hrtimer at ffffffff810ad353
 #17 [ffff880427403c38] hrtimer_interrupt at ffffffff810add83
 #18 [ffff880427403ca8] smp_apic_timer_interrupt at ffffffff8163bf39
 #19 [ffff880427403cc8] apic_timer_interrupt at ffffffff81639df3
 #20 [ffff880427403d78] ehci_watchdog at ffffffff8144d715
 #21 [ffff880427403da8] call_timer_fn at ffffffff81092a7a
 #22 [ffff880427403e38] run_timer_softirq at ffffffff81092e69
 #23 [ffff880427403eb8] __do_softirq at ffffffff81089a18
 #24 [ffff880427403f38] call_softirq at ffffffff8163b57c
 #25 [ffff880427403f50] do_softirq at ffffffff8101b415
 #26 [ffff880427403f70] irq_exit at ffffffff8108a04e
 #27 [ffff880427403f90] smp_apic_timer_interrupt at ffffffff8163bf3e
 #28 [ffff880427403fb0] apic_timer_interrupt at ffffffff81639df3
 --- <IRQ stack> ---
 #29 [ffff880421d45c10] apic_timer_interrupt at ffffffff81639df3
     [exception RIP: unknown or invalid address]
     RIP: ffffffffffffff10  RSP: 0000000000000246  RFLAGS: 00000010
     RAX: 0000000000000002  RBX: ffff880421d45cd0  RCX: 0000000000000007
     RDX: 0000000000000000  RSI: 00000000000042a2  RDI: ffff8804275d43d8
     RBP: ffff880421d45c38   R8: 0000000000000000   R9: ffff8804275d43d8
     R10: 0000000000000000  R11: ffffffff816305f4  R12: ffff88022227a670
     R13: ffff880421d44000  R14: ffff8804275d43c0  R15: 0000000000000002
     ORIG_RAX: ffffffff8162fc60  CS: ffffffff8162fc64  SS: ffff880421d45cc0
 WARNING: possibly bogus exception frame
 #30 [ffff880421d45cd8] finish_task_switch at ffffffff8106586c
 #31 [ffff880421d45d28] __schedule at ffffffff8162beef
 #32 [ffff880421d45da8] schedule at ffffffff8162c65f
 #33 [ffff880421d45db8] cpu_stopper_thread at ffffffff810ed97d
 #34 [ffff880421d45e98] kthread at ffffffff810a8380
 #35 [ffff880421d45f48] kernel_thread_helper at ffffffff8163b484
crash>

With crash-6.0.5-1.el7, the exception frame contents are correct, so there are no "bogus exception frame" warnings, and the stack transition works as expected:

crash> bt -a
... [ cut ] ...
PID: 0      TASK: ffff8802221e2670  CPU: 1   COMMAND: "swapper/1"
  #0 [ffff880426e07e80] crash_nmi_callback at ffffffff81037470
  #1 [ffff880426e07ea0] nmi_handle at ffffffff8163161e
  #2 [ffff880426e07f00] default_do_nmi at ffffffff81631795
  #3 [ffff880426e07f30] do_nmi at ffffffff816319f8
  #4 [ffff880426e07f50] nmi at ffffffff81630bc0
     [exception RIP: lock_release+57]
     RIP: ffffffff810c3869  RSP: ffff880426e03c98  RFLAGS: 00000046
     RAX: 0000000000000000  RBX: ffff8804275d43d8  RCX: 0000000000000001
     RDX: ffff8802221e2670  RSI: 0000000000000001  RDI: ffff8804275d43d8
     RBP: ffff880426e03ce0   R8: ffff8804275d44b0   R9: 0000000000000000
     R10: 0000000000000002  R11: 0000000000000000  R12: ffffffff81063b7d
     R13: 0000000000000106  R14: ffff8804275d43c0  R15: 0000000000000001
     ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
 --- <NMI exception stack> ---
  #5 [ffff880426e03c98] lock_release at ffffffff810c3869
  #6 [ffff880426e03ce8] _raw_spin_unlock at ffffffff8162fd23
  #7 [ffff880426e03d08] double_rq_unlock at ffffffff81063b7d
  #8 [ffff880426e03d28] load_balance at ffffffff81076e65
  #9 [ffff880426e03db8] rebalance_domains at ffffffff810771ad
 #10 [ffff880426e03e38] nohz_idle_balance at ffffffff810773e5
 #11 [ffff880426e03e88] run_rebalance_domains at ffffffff810774a2
 #12 [ffff880426e03ea8] __do_softirq at ffffffff81089a18
 #13 [ffff880426e03f28] call_softirq at ffffffff8163b57c
 #14 [ffff880426e03f40] do_softirq at ffffffff8101b415
 #15 [ffff880426e03f60] irq_exit at ffffffff8108a04e
 #16 [ffff880426e03f80] scheduler_ipi at ffffffff81077fd9
 #17 [ffff880426e03fa0] smp_reschedule_interrupt at ffffffff8103818a
 #18 [ffff880426e03fb0] reschedule_interrupt at ffffffff8163b0f3
 --- <IRQ stack> ---
 #19 [ffff8802221e9e08] reschedule_interrupt at ffffffff8163b0f3
     [exception RIP: native_safe_halt+11]
     RIP: ffffffff8104501b  RSP: ffff8802221e9eb8  RFLAGS: 00000246
     RAX: 0000000000000000  RBX: ffffffff81c2cb00  RCX: 0000000000000001
     RDX: 0000000000000000  RSI: ffffffff81c2cb00  RDI: ffffffff810222a3
     RBP: ffff8802221e9eb8   R8: 0000000000000000   R9: 0000000000000000
     R10: ffffffff81c2cb00  R11: 0000000000000000  R12: 0000000000000046
     R13: ffff8802221e9e68  R14: 0000000000000000  R15: 000034a90eb72140
     ORIG_RAX: ffffffffffffff02  CS: 0010  SS: 0018
 #20 [ffff8802221e9eb0] trace_hardirqs_on at ffffffff810c479d
 #21 [ffff8802221e9ec0] default_idle at ffffffff810222a8
 #22 [ffff8802221e9ef0] cpu_idle at ffffffff8101823f
... [ cut ] ...
PID: 33     TASK: ffff880421d48000  CPU: 7   COMMAND: "migration/7"
  #0 [ffff880427407cc0] panic at ffffffff816173e7
  #1 [ffff880427407d40] watchdog_overflow_callback at ffffffff81105173
  #2 [ffff880427407d50] __perf_event_overflow at ffffffff81143016
  #3 [ffff880427407de0] perf_event_overflow at ffffffff81143704
  #4 [ffff880427407df0] x86_pmu_handle_irq at ffffffff8102aed7
  #5 [ffff880427407e90] perf_event_nmi_handler at ffffffff81631fc1
  #6 [ffff880427407ea0] nmi_handle at ffffffff8163161e
  #7 [ffff880427407f00] default_do_nmi at ffffffff81631795
  #8 [ffff880427407f30] do_nmi at ffffffff816319f8
  #9 [ffff880427407f50] nmi at ffffffff81630bc0
     [exception RIP: __delay+16]
     RIP: ffffffff8131b740  RSP: ffff880427403ad8  RFLAGS: 00000006
     RAX: ffff880421d45fd8  RBX: ffff8804275d43c0  RCX: 000000004f746767
     RDX: 00000000000000c4  RSI: ffffffff81079593  RDI: 0000000000000001
     RBP: ffff880427403b00   R8: 0000000000000002   R9: 0000000000000001
     R10: ffff8804275d43d8  R11: 0000000000000000  R12: 000000007141ddb0
     R13: 000000000cddcfb2  R14: 0000000000000001  R15: ffff880421d48000
     ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
 --- <NMI exception stack> ---
 #10 [ffff880427403ad8] __delay at ffffffff8131b740
 #11 [ffff880427403ad8] do_raw_spin_lock at ffffffff81322d1e
 #12 [ffff880427403b08] _raw_spin_lock at ffffffff8162f666
 #13 [ffff880427403b38] scheduler_tick at ffffffff81079593
 #14 [ffff880427403b78] update_process_times at ffffffff8109401e
 #15 [ffff880427403ba8] tick_sched_timer at ffffffff810bbe64
 #16 [ffff880427403bd8] __run_hrtimer at ffffffff810ad353
 #17 [ffff880427403c38] hrtimer_interrupt at ffffffff810add83
 #18 [ffff880427403ca8] smp_apic_timer_interrupt at ffffffff8163bf39
 #19 [ffff880427403cc8] apic_timer_interrupt at ffffffff81639df3
 #20 [ffff880427403d78] ehci_watchdog at ffffffff8144d715
 #21 [ffff880427403da8] call_timer_fn at ffffffff81092a7a
 #22 [ffff880427403e38] run_timer_softirq at ffffffff81092e69
 #23 [ffff880427403eb8] __do_softirq at ffffffff81089a18
 #24 [ffff880427403f38] call_softirq at ffffffff8163b57c
 #25 [ffff880427403f50] do_softirq at ffffffff8101b415
 #26 [ffff880427403f70] irq_exit at ffffffff8108a04e
 #27 [ffff880427403f90] smp_apic_timer_interrupt at ffffffff8163bf3e
 #28 [ffff880427403fb0] apic_timer_interrupt at ffffffff81639df3
 --- <IRQ stack> ---
 #29 [ffff880421d45c18] apic_timer_interrupt at ffffffff81639df3
     [exception RIP: _raw_spin_unlock_irq+52]
     RIP: ffffffff8162fc64  RSP: ffff880421d45cc0  RFLAGS: 00000246
     RAX: 0000000000000007  RBX: ffffffff816305f4  RCX: 0000000000000000
     RDX: 00000000000042a2  RSI: ffff8804275d43d8  RDI: ffffffff8162fc60
     RBP: ffff880421d45cd0   R8: 0000000000000002   R9: 0000000000000000
     R10: ffff8804275d43d8  R11: 0000000000000000  R12: ffff880421d45c38
     R13: ffff88022227a670  R14: ffff880421d44000  R15: ffff8804275d43c0
     ORIG_RAX: ffffffffffffff10  CS: 0010  SS: 0018
 #30 [ffff880421d45cd8] finish_task_switch at ffffffff8106586c
 #31 [ffff880421d45d28] __schedule at ffffffff8162beef
 #32 [ffff880421d45da8] schedule at ffffffff8162c65f
 #33 [ffff880421d45db8] cpu_stopper_thread at ffffffff810ed97d
 #34 [ffff880421d45e98] kthread at ffffffff810a8380
 #35 [ffff880421d45f48] kernel_thread_helper at ffffffff8163b484
crash>

This request was resolved in Red Hat Enterprise Linux 7.0. Contact your manager or support representative in case you have further questions about the request.
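
For anyone comparing the two CPU 1 dumps by hand, the off-by-8 effect can be reproduced with a small sketch. It is not taken from the crash sources. It packs the register values that crash-6.0.5 prints for the "swapper/1" IRQ frame into a byte buffer in x86_64 struct pt_regs order, then reads a frame from that buffer at the right offset and again 8 bytes too low. The names PT_REGS, read_frame and looks_bogus are made up for this illustration, and the CS/SS test is only an assumed plausibility check for a kernel-mode frame, not necessarily the test crash applies before printing the warning.

```python
import struct

# Field order of struct pt_regs on x86_64, lowest address first.
PT_REGS = ["R15", "R14", "R13", "R12", "RBP", "RBX", "R11", "R10", "R9", "R8",
           "RAX", "RCX", "RDX", "RSI", "RDI", "ORIG_RAX",
           "RIP", "CS", "RFLAGS", "RSP", "SS"]

# CPU 1 IRQ exception frame as printed by crash-6.0.5-1.el7 in this report.
GOOD = {
    "R15": 0x000034a90eb72140, "R14": 0x0, "R13": 0xffff8802221e9e68,
    "R12": 0x46, "RBP": 0xffff8802221e9eb8, "RBX": 0xffffffff81c2cb00,
    "R11": 0x0, "R10": 0xffffffff81c2cb00, "R9": 0x0, "R8": 0x0,
    "RAX": 0x0, "RCX": 0x1, "RDX": 0x0, "RSI": 0xffffffff81c2cb00,
    "RDI": 0xffffffff810222a3, "ORIG_RAX": 0xffffffffffffff02,
    "RIP": 0xffffffff8104501b, "CS": 0x10, "RFLAGS": 0x246,
    "RSP": 0xffff8802221e9eb8, "SS": 0x18,
}

# The 8-byte word sitting just below the frame; it shows up as R15 in the
# crash-6.0.4 output.
WORD_BELOW = 0x0000000121cfc000


def read_frame(mem: bytes, offset: int) -> dict:
    """Interpret 21 little-endian 64-bit words at `offset` as a pt_regs."""
    words = struct.unpack_from("<21Q", mem, offset)
    return dict(zip(PT_REGS, words))


def looks_bogus(frame: dict) -> bool:
    """Hypothetical check: a kernel-mode frame in this vmcore has CS 0x10, SS 0x18."""
    return frame["CS"] != 0x10 or frame["SS"] != 0x18


# Simulated stack memory: one stray word followed by the real frame.
mem = struct.pack("<22Q", WORD_BELOW, *(GOOD[r] for r in PT_REGS))

good = read_frame(mem, 8)  # correct frame address
bad = read_frame(mem, 0)   # frame address 8 bytes too low

for reg in ("RIP", "CS", "RFLAGS", "RSP", "SS"):
    print(f"{reg:>6}: off-by-8 {bad[reg]:016x}   correct {good[reg]:016x}")
print("bogus (off-by-8):", looks_bogus(bad), " bogus (correct):", looks_bogus(good))
```

Every register in the off-by-8 reading takes the value of the field one slot lower in the correct frame (RIP gets ORIG_RAX, CS gets RIP, RFLAGS gets CS, RSP gets RFLAGS, SS gets RSP), which matches the crash-6.0.4 output for CPU 1 above.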