Bug 802234

Summary: WARNING: possibly bogus exception frame
Product: Red Hat Enterprise Linux 7 Reporter: Cui Chun <ccui>
Component: kernel Assignee: Dave Anderson <anderson>
Status: CLOSED CURRENTRELEASE QA Contact: Qiao Zhao <qzhao>
Severity: high Docs Contact:
Priority: high    
Version: 7.0 CC: qcai
Target Milestone: rc   
Target Release: ---   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: crash-6.0.5-1.el7 Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2014-06-13 12:13:30 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 839713    

Description Cui Chun 2012-03-12 07:49:06 UTC
Description of problem:

Encountered the following problem with "bt" during RHEL7 testing.

------------------
...
[exception RIP: unknown or invalid address]
...
bt: WARNING: possibly bogus exception frame
----------------------

The vmcore/vmlinux pair can be found here:
http://lacrosse.corp.redhat.com/~qcai/vmcore/

Comment 1 Dave Anderson 2012-03-12 12:34:26 UTC
The IRQ exception frame address determination is off-by-8 in
this vmcore.

Presumably something has changed in recent kernels with respect
to the process-to-IRQ stack transition.  

Can you confirm that you are seeing this in more than one
instance, i.e., where you are running your "crasher" module
and catching one or more tasks that were running on their 
IRQ stack at the time of the crash?

Comment 2 Dave Anderson 2012-03-16 13:25:17 UTC
*** Bug 803982 has been marked as a duplicate of this bug. ***

Comment 6 Dave Anderson 2012-04-05 19:43:36 UTC
crash-6.0.5-1.el7 is now available in brew.

When running "bt -a" on the reporter-supplied vmcore, two of the
eight active task backtraces generate "WARNING: possibly bogus 
exception frame" messages at the point where they transition 
from their per-cpu IRQ stack back to the task's process stack, 
because the exception frame register contents are off-by-8.
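
The off-by-8 effect can be reproduced with the CPU 1 ("swapper/1") register values from the backtraces in this comment. The sketch below is only an illustration, not crash source code: it lays the crash-6.0.5 values out in the standard x86_64 struct pt_regs slot order and then reads the frame again starting 8 bytes too low. The helper name read_frame and the variable names are hypothetical; WORD_BELOW_FRAME is the value that the crash-6.0.4 output prints as R15, which is assumed here to be the stack word just below the real frame.

```python
import struct

# x86_64 struct pt_regs slot order, one 8-byte slot per register.
PT_REGS = ["R15", "R14", "R13", "R12", "RBP", "RBX", "R11", "R10", "R9", "R8",
           "RAX", "RCX", "RDX", "RSI", "RDI", "ORIG_RAX", "RIP", "CS",
           "RFLAGS", "RSP", "SS"]

# CPU 1 IRQ-to-process-stack exception frame as printed by crash-6.0.5-1.el7.
GOOD = {
    "RIP": 0xffffffff8104501b, "RSP": 0xffff8802221e9eb8, "RFLAGS": 0x246,
    "RAX": 0x0, "RBX": 0xffffffff81c2cb00, "RCX": 0x1,
    "RDX": 0x0, "RSI": 0xffffffff81c2cb00, "RDI": 0xffffffff810222a3,
    "RBP": 0xffff8802221e9eb8, "R8": 0x0, "R9": 0x0,
    "R10": 0xffffffff81c2cb00, "R11": 0x0, "R12": 0x46,
    "R13": 0xffff8802221e9e68, "R14": 0x0, "R15": 0x000034a90eb72140,
    "ORIG_RAX": 0xffffffffffffff02, "CS": 0x10, "SS": 0x18,
}

# Assumption: the word below the frame is what crash-6.0.4 printed as R15.
WORD_BELOW_FRAME = 0x0000000121cfc000


def read_frame(memory: bytes, offset: int) -> dict:
    """Interpret 21 little-endian 64-bit words at offset as a pt_regs frame."""
    words = struct.unpack_from("<21Q", memory, offset)
    return dict(zip(PT_REGS, words))


# Simulated stack: one stray word followed by the real exception frame.
memory = struct.pack("<22Q", WORD_BELOW_FRAME, *(GOOD[r] for r in PT_REGS))

shifted = read_frame(memory, 0)   # frame address 8 bytes too low
correct = read_frame(memory, 8)   # frame address as used after the fix

for reg in ("ORIG_RAX", "RIP", "CS", "RFLAGS", "RSP", "SS"):
    print(f"{reg:>8}: off-by-8 {shifted[reg]:016x}   correct {correct[reg]:016x}")
```

Read 8 bytes too low, every register displays the value that belongs to the slot before it: RIP shows the real ORIG_RAX, CS shows the real RIP, and SS shows the real RSP, which matches the crash-6.0.4 output for this task.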

Running with crash-6.0.4-1.el7, here are the two suspect backtraces:

 crash> bt -a
 ... [ cut ] ...
 
 PID: 0      TASK: ffff8802221e2670  CPU: 1   COMMAND: "swapper/1"
  #0 [ffff880426e07e80] crash_nmi_callback at ffffffff81037470
  #1 [ffff880426e07ea0] nmi_handle at ffffffff8163161e
  #2 [ffff880426e07f00] default_do_nmi at ffffffff81631795
  #3 [ffff880426e07f30] do_nmi at ffffffff816319f8
  #4 [ffff880426e07f50] nmi at ffffffff81630bc0
     [exception RIP: lock_release+57]
     RIP: ffffffff810c3869  RSP: ffff880426e03c98  RFLAGS: 00000046
     RAX: 0000000000000000  RBX: ffff8804275d43d8  RCX: 0000000000000001
     RDX: ffff8802221e2670  RSI: 0000000000000001  RDI: ffff8804275d43d8
     RBP: ffff880426e03ce0   R8: ffff8804275d44b0   R9: 0000000000000000
     R10: 0000000000000002  R11: 0000000000000000  R12: ffffffff81063b7d
     R13: 0000000000000106  R14: ffff8804275d43c0  R15: 0000000000000001
     ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
 --- <NMI exception stack> ---
  #5 [ffff880426e03c98] lock_release at ffffffff810c3869
  #6 [ffff880426e03ce8] _raw_spin_unlock at ffffffff8162fd23
  #7 [ffff880426e03d08] double_rq_unlock at ffffffff81063b7d
  #8 [ffff880426e03d28] load_balance at ffffffff81076e65
  #9 [ffff880426e03db8] rebalance_domains at ffffffff810771ad
 #10 [ffff880426e03e38] nohz_idle_balance at ffffffff810773e5
 #11 [ffff880426e03e88] run_rebalance_domains at ffffffff810774a2
 #12 [ffff880426e03ea8] __do_softirq at ffffffff81089a18
 #13 [ffff880426e03f28] call_softirq at ffffffff8163b57c
 #14 [ffff880426e03f40] do_softirq at ffffffff8101b415
 #15 [ffff880426e03f60] irq_exit at ffffffff8108a04e
 #16 [ffff880426e03f80] scheduler_ipi at ffffffff81077fd9
 #17 [ffff880426e03fa0] smp_reschedule_interrupt at ffffffff8103818a
 #18 [ffff880426e03fb0] reschedule_interrupt at ffffffff8163b0f3
 --- <IRQ stack> ---
 #19 [ffff8802221e9e00] reschedule_interrupt at ffffffff8163b0f3
     RIP: ffffffffffffff02  RSP: 0000000000000246  RFLAGS: 00000010
     RAX: 0000000000000000  RBX: ffff8802221e9eb8  RCX: 0000000000000000
     RDX: 0000000000000001  RSI: 0000000000000000  RDI: ffffffff81c2cb00
     RBP: 0000000000000046   R8: 0000000000000000   R9: ffffffff81c2cb00
     R10: 0000000000000000  R11: ffffffff81c2cb00  R12: ffff8802221e9e68
     R13: 0000000000000000  R14: 000034a90eb72140  R15: 0000000121cfc000
     ORIG_RAX: ffffffff810222a3  CS: ffffffff8104501b  SS: ffff8802221e9eb8
 WARNING: possibly bogus exception frame
 
 ... [ cut ] ...
 
 PID: 33     TASK: ffff880421d48000  CPU: 7   COMMAND: "migration/7"
  #0 [ffff880427407cc0] panic at ffffffff816173e7
  #1 [ffff880427407d40] watchdog_overflow_callback at ffffffff81105173
  #2 [ffff880427407d50] __perf_event_overflow at ffffffff81143016
  #3 [ffff880427407de0] perf_event_overflow at ffffffff81143704
  #4 [ffff880427407df0] x86_pmu_handle_irq at ffffffff8102aed7
  #5 [ffff880427407e90] perf_event_nmi_handler at ffffffff81631fc1
  #6 [ffff880427407ea0] nmi_handle at ffffffff8163161e
  #7 [ffff880427407f00] default_do_nmi at ffffffff81631795
  #8 [ffff880427407f30] do_nmi at ffffffff816319f8
  #9 [ffff880427407f50] nmi at ffffffff81630bc0
     [exception RIP: __delay+16]
     RIP: ffffffff8131b740  RSP: ffff880427403ad8  RFLAGS: 00000006
     RAX: ffff880421d45fd8  RBX: ffff8804275d43c0  RCX: 000000004f746767
     RDX: 00000000000000c4  RSI: ffffffff81079593  RDI: 0000000000000001
     RBP: ffff880427403b00   R8: 0000000000000002   R9: 0000000000000001
     R10: ffff8804275d43d8  R11: 0000000000000000  R12: 000000007141ddb0
     R13: 000000000cddcfb2  R14: 0000000000000001  R15: ffff880421d48000
     ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
 --- <NMI exception stack> ---
 #10 [ffff880427403ad8] __delay at ffffffff8131b740
 #11 [ffff880427403ad8] do_raw_spin_lock at ffffffff81322d1e
 #12 [ffff880427403b08] _raw_spin_lock at ffffffff8162f666
 #13 [ffff880427403b38] scheduler_tick at ffffffff81079593
 #14 [ffff880427403b78] update_process_times at ffffffff8109401e
 #15 [ffff880427403ba8] tick_sched_timer at ffffffff810bbe64
 #16 [ffff880427403bd8] __run_hrtimer at ffffffff810ad353
 #17 [ffff880427403c38] hrtimer_interrupt at ffffffff810add83
 #18 [ffff880427403ca8] smp_apic_timer_interrupt at ffffffff8163bf39
 #19 [ffff880427403cc8] apic_timer_interrupt at ffffffff81639df3
 #20 [ffff880427403d78] ehci_watchdog at ffffffff8144d715
 #21 [ffff880427403da8] call_timer_fn at ffffffff81092a7a
 #22 [ffff880427403e38] run_timer_softirq at ffffffff81092e69
 #23 [ffff880427403eb8] __do_softirq at ffffffff81089a18
 #24 [ffff880427403f38] call_softirq at ffffffff8163b57c
 #25 [ffff880427403f50] do_softirq at ffffffff8101b415
 #26 [ffff880427403f70] irq_exit at ffffffff8108a04e
 #27 [ffff880427403f90] smp_apic_timer_interrupt at ffffffff8163bf3e
 #28 [ffff880427403fb0] apic_timer_interrupt at ffffffff81639df3
 --- <IRQ stack> ---
 #29 [ffff880421d45c10] apic_timer_interrupt at ffffffff81639df3
     [exception RIP: unknown or invalid address]
     RIP: ffffffffffffff10  RSP: 0000000000000246  RFLAGS: 00000010
     RAX: 0000000000000002  RBX: ffff880421d45cd0  RCX: 0000000000000007
     RDX: 0000000000000000  RSI: 00000000000042a2  RDI: ffff8804275d43d8
     RBP: ffff880421d45c38   R8: 0000000000000000   R9: ffff8804275d43d8
     R10: 0000000000000000  R11: ffffffff816305f4  R12: ffff88022227a670
     R13: ffff880421d44000  R14: ffff8804275d43c0  R15: 0000000000000002
     ORIG_RAX: ffffffff8162fc60  CS: ffffffff8162fc64  SS: ffff880421d45cc0
 WARNING: possibly bogus exception frame
 #30 [ffff880421d45cd8] finish_task_switch at ffffffff8106586c
 #31 [ffff880421d45d28] __schedule at ffffffff8162beef
 #32 [ffff880421d45da8] schedule at ffffffff8162c65f
 #33 [ffff880421d45db8] cpu_stopper_thread at ffffffff810ed97d
 #34 [ffff880421d45e98] kthread at ffffffff810a8380
 #35 [ffff880421d45f48] kernel_thread_helper at ffffffff8163b484
 crash>
 
With crash-6.0.5-1.el7, the exception frame contents are correct, so
there are no "bogus exception frame" warnings, and the stack transition
works as expected:
 
 crash> bt -a
 ... [ cut ] ...
 
 PID: 0      TASK: ffff8802221e2670  CPU: 1   COMMAND: "swapper/1"
  #0 [ffff880426e07e80] crash_nmi_callback at ffffffff81037470
  #1 [ffff880426e07ea0] nmi_handle at ffffffff8163161e
  #2 [ffff880426e07f00] default_do_nmi at ffffffff81631795
  #3 [ffff880426e07f30] do_nmi at ffffffff816319f8
  #4 [ffff880426e07f50] nmi at ffffffff81630bc0
     [exception RIP: lock_release+57]
     RIP: ffffffff810c3869  RSP: ffff880426e03c98  RFLAGS: 00000046
     RAX: 0000000000000000  RBX: ffff8804275d43d8  RCX: 0000000000000001
     RDX: ffff8802221e2670  RSI: 0000000000000001  RDI: ffff8804275d43d8
     RBP: ffff880426e03ce0   R8: ffff8804275d44b0   R9: 0000000000000000
     R10: 0000000000000002  R11: 0000000000000000  R12: ffffffff81063b7d
     R13: 0000000000000106  R14: ffff8804275d43c0  R15: 0000000000000001
     ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
 --- <NMI exception stack> ---
  #5 [ffff880426e03c98] lock_release at ffffffff810c3869
  #6 [ffff880426e03ce8] _raw_spin_unlock at ffffffff8162fd23
  #7 [ffff880426e03d08] double_rq_unlock at ffffffff81063b7d
  #8 [ffff880426e03d28] load_balance at ffffffff81076e65
  #9 [ffff880426e03db8] rebalance_domains at ffffffff810771ad
 #10 [ffff880426e03e38] nohz_idle_balance at ffffffff810773e5
 #11 [ffff880426e03e88] run_rebalance_domains at ffffffff810774a2
 #12 [ffff880426e03ea8] __do_softirq at ffffffff81089a18
 #13 [ffff880426e03f28] call_softirq at ffffffff8163b57c
 #14 [ffff880426e03f40] do_softirq at ffffffff8101b415
 #15 [ffff880426e03f60] irq_exit at ffffffff8108a04e
 #16 [ffff880426e03f80] scheduler_ipi at ffffffff81077fd9
 #17 [ffff880426e03fa0] smp_reschedule_interrupt at ffffffff8103818a
 #18 [ffff880426e03fb0] reschedule_interrupt at ffffffff8163b0f3
 --- <IRQ stack> ---
 #19 [ffff8802221e9e08] reschedule_interrupt at ffffffff8163b0f3
     [exception RIP: native_safe_halt+11]
     RIP: ffffffff8104501b  RSP: ffff8802221e9eb8  RFLAGS: 00000246
     RAX: 0000000000000000  RBX: ffffffff81c2cb00  RCX: 0000000000000001
     RDX: 0000000000000000  RSI: ffffffff81c2cb00  RDI: ffffffff810222a3
     RBP: ffff8802221e9eb8   R8: 0000000000000000   R9: 0000000000000000
     R10: ffffffff81c2cb00  R11: 0000000000000000  R12: 0000000000000046
     R13: ffff8802221e9e68  R14: 0000000000000000  R15: 000034a90eb72140
     ORIG_RAX: ffffffffffffff02  CS: 0010  SS: 0018
 #20 [ffff8802221e9eb0] trace_hardirqs_on at ffffffff810c479d
 #21 [ffff8802221e9ec0] default_idle at ffffffff810222a8
 #22 [ffff8802221e9ef0] cpu_idle at ffffffff8101823f
 
 ... [ cut ] ...
 
 PID: 33     TASK: ffff880421d48000  CPU: 7   COMMAND: "migration/7"
  #0 [ffff880427407cc0] panic at ffffffff816173e7
  #1 [ffff880427407d40] watchdog_overflow_callback at ffffffff81105173
  #2 [ffff880427407d50] __perf_event_overflow at ffffffff81143016
  #3 [ffff880427407de0] perf_event_overflow at ffffffff81143704
  #4 [ffff880427407df0] x86_pmu_handle_irq at ffffffff8102aed7
  #5 [ffff880427407e90] perf_event_nmi_handler at ffffffff81631fc1
  #6 [ffff880427407ea0] nmi_handle at ffffffff8163161e
  #7 [ffff880427407f00] default_do_nmi at ffffffff81631795
  #8 [ffff880427407f30] do_nmi at ffffffff816319f8
  #9 [ffff880427407f50] nmi at ffffffff81630bc0
     [exception RIP: __delay+16]
     RIP: ffffffff8131b740  RSP: ffff880427403ad8  RFLAGS: 00000006
     RAX: ffff880421d45fd8  RBX: ffff8804275d43c0  RCX: 000000004f746767
     RDX: 00000000000000c4  RSI: ffffffff81079593  RDI: 0000000000000001
     RBP: ffff880427403b00   R8: 0000000000000002   R9: 0000000000000001
     R10: ffff8804275d43d8  R11: 0000000000000000  R12: 000000007141ddb0
     R13: 000000000cddcfb2  R14: 0000000000000001  R15: ffff880421d48000
     ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
 --- <NMI exception stack> ---
 #10 [ffff880427403ad8] __delay at ffffffff8131b740
 #11 [ffff880427403ad8] do_raw_spin_lock at ffffffff81322d1e
 #12 [ffff880427403b08] _raw_spin_lock at ffffffff8162f666
 #13 [ffff880427403b38] scheduler_tick at ffffffff81079593
 #14 [ffff880427403b78] update_process_times at ffffffff8109401e
 #15 [ffff880427403ba8] tick_sched_timer at ffffffff810bbe64
 #16 [ffff880427403bd8] __run_hrtimer at ffffffff810ad353
 #17 [ffff880427403c38] hrtimer_interrupt at ffffffff810add83
 #18 [ffff880427403ca8] smp_apic_timer_interrupt at ffffffff8163bf39
 #19 [ffff880427403cc8] apic_timer_interrupt at ffffffff81639df3
 #20 [ffff880427403d78] ehci_watchdog at ffffffff8144d715
 #21 [ffff880427403da8] call_timer_fn at ffffffff81092a7a
 #22 [ffff880427403e38] run_timer_softirq at ffffffff81092e69
 #23 [ffff880427403eb8] __do_softirq at ffffffff81089a18
 #24 [ffff880427403f38] call_softirq at ffffffff8163b57c
 #25 [ffff880427403f50] do_softirq at ffffffff8101b415
 #26 [ffff880427403f70] irq_exit at ffffffff8108a04e
 #27 [ffff880427403f90] smp_apic_timer_interrupt at ffffffff8163bf3e
 #28 [ffff880427403fb0] apic_timer_interrupt at ffffffff81639df3
 --- <IRQ stack> ---
 #29 [ffff880421d45c18] apic_timer_interrupt at ffffffff81639df3
     [exception RIP: _raw_spin_unlock_irq+52]
     RIP: ffffffff8162fc64  RSP: ffff880421d45cc0  RFLAGS: 00000246
     RAX: 0000000000000007  RBX: ffffffff816305f4  RCX: 0000000000000000
     RDX: 00000000000042a2  RSI: ffff8804275d43d8  RDI: ffffffff8162fc60
     RBP: ffff880421d45cd0   R8: 0000000000000002   R9: 0000000000000000
     R10: ffff8804275d43d8  R11: 0000000000000000  R12: ffff880421d45c38
     R13: ffff88022227a670  R14: ffff880421d44000  R15: ffff8804275d43c0
     ORIG_RAX: ffffffffffffff10  CS: 0010  SS: 0018
 #30 [ffff880421d45cd8] finish_task_switch at ffffffff8106586c
 #31 [ffff880421d45d28] __schedule at ffffffff8162beef
 #32 [ffff880421d45da8] schedule at ffffffff8162c65f
 #33 [ffff880421d45db8] cpu_stopper_thread at ffffffff810ed97d
 #34 [ffff880421d45e98] kthread at ffffffff810a8380
 #35 [ffff880421d45f48] kernel_thread_helper at ffffffff8163b484
 crash>
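
For a quick cross-check of the two runs above, the following sketch compares the bracketed stack address of the first process-stack frame (#19 and #29) and the CS/SS values that each crash version printed for both tasks. The function plausible_kernel_frame is a hypothetical, simplified test that assumes a kernel-mode frame carries CS 0010 and SS 0018, as the crash-6.0.5 output shows; it is not the check that crash itself performs.

```python
# Kernel-mode segment selectors as shown in the crash-6.0.5 frames above.
KERNEL_CS = 0x10
KERNEL_SS = 0x18


def plausible_kernel_frame(cs: int, ss: int) -> bool:
    """Hypothetical sanity test: do CS and SS look like kernel selectors?"""
    return cs == KERNEL_CS and ss == KERNEL_SS


# Values copied from the frame #19 (swapper/1) and #29 (migration/7) output.
FRAMES = {
    "swapper/1": {
        "6.0.4": {"addr": 0xffff8802221e9e00,
                  "cs": 0xffffffff8104501b, "ss": 0xffff8802221e9eb8},
        "6.0.5": {"addr": 0xffff8802221e9e08, "cs": 0x10, "ss": 0x18},
    },
    "migration/7": {
        "6.0.4": {"addr": 0xffff880421d45c10,
                  "cs": 0xffffffff8162fc64, "ss": 0xffff880421d45cc0},
        "6.0.5": {"addr": 0xffff880421d45c18, "cs": 0x10, "ss": 0x18},
    },
}

results = {}
for task, runs in FRAMES.items():
    old, new = runs["6.0.4"], runs["6.0.5"]
    results[task] = {
        "delta": new["addr"] - old["addr"],
        "old_ok": plausible_kernel_frame(old["cs"], old["ss"]),
        "new_ok": plausible_kernel_frame(new["cs"], new["ss"]),
    }
    print(task, results[task])
```

In both tasks the address moves up by 8 bytes between the two versions, and only the crash-6.0.5 frames pass the simplified selector test.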

Comment 8 Ludek Smid 2014-06-13 12:13:30 UTC
This request was resolved in Red Hat Enterprise Linux 7.0.

Contact your manager or support representative in case you have further questions about the request.