| Summary: | bt: cannot transition from exception stack to process stack | ||
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 6 | Reporter: | Dave Anderson <anderson> |
| Component: | crash | Assignee: | Dave Anderson <anderson> |
| Status: | CLOSED ERRATA | QA Contact: | Kernel Dump QE <kernel-dump-qe> |
| Severity: | medium | Docs Contact: | |
| Priority: | medium | ||
| Version: | 6.1 | CC: | pbunyan, phan |
| Target Milestone: | rc | ||
| Target Release: | --- | ||
| Hardware: | x86_64 | ||
| OS: | Linux | ||
| Whiteboard: | |||
| Fixed In Version: | crash-5.1.7-1.el6 | Doc Type: | Bug Fix |
| Doc Text: | In a rare scenario, a non-crashing CPU received a shutdown NMI (non-maskable interrupt) immediately after receiving an interrupt from another source. Because the IRQ entry-point symbols "IRQ0x00_interrupt" through "IRQ0x##_interrupt" no longer existed, the bt command terminated with the "bt: cannot transition from exception stack to current process stack" error message on AMD64 and Intel 64 architectures. This bug has been fixed, and backtrace now properly transitions from the NMI stack back to the interrupted process stack. | Story Points: | --- |
| Clone Of: | Environment: | ||
| Last Closed: | 2011-12-06 16:30:07 UTC | Type: | --- |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
The shutdown NMI has to be received by a non-crashing CPU
within a couple of instructions after it has received an
interrupt from another source, so it is highly unlikely
that the problem can be reproduced on demand.
I have a fix for it -- with the fix, the backtrace looks like this:
PID: 0 TASK: ffff88012cd74b00 CPU: 3 COMMAND: "swapper"
#0 [ffff880028267e90] crash_nmi_callback at ffffffff81028a96
#1 [ffff880028267ea0] notifier_call_chain at ffffffff814e13e5
#2 [ffff880028267ee0] atomic_notifier_call_chain at ffffffff814e144a
#3 [ffff880028267ef0] notify_die at ffffffff810942fe
#4 [ffff880028267f20] do_nmi at ffffffff814df033
#5 [ffff880028267f50] nmi at ffffffff814de940
[exception RIP: irq_entries_start+296]
RIP: ffffffff8100b728 RSP: ffff88012cd79e38 RFLAGS: 00000006
RAX: 0000000000000000 RBX: 0000000000000004 RCX: 0000000000000000
RDX: 00000000000000eb RSI: 0000000000000000 RDI: 00000000000399dd
RBP: ffff88012cd79ed8 R8: 0000000000000000 R9: 0000000000000320
R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
R13: 00000000000000eb R14: 0000000000000002 R15: 0000000000000003
ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
--- <NMI exception stack> ---
#6 [ffff88012cd79e38] irq_entries_start at ffffffff8100b728
#7 [ffff88012cd79e60] intel_idle at ffffffff812bc2a1
#8 [ffff88012cd79ee0] cpuidle_idle_call at ffffffff813ed4b7
#9 [ffff88012cd79f00] cpu_idle at ffffffff81009de6
The non-crashing cpu was sitting idle, received an interrupt from
some source, but then immediately received a shutdown NMI from the
crashing cpu.
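The exception RIP in the backtrace above is reported as irq_entries_start+296 rather than as a per-vector IRQ symbol. As a small illustration of that symbol+offset notation, the sketch below recovers the start address of the symbol from the values shown in the trace. The helper name symbol_base is hypothetical and is not part of crash.

```python
def symbol_base(rip: int, offset: int) -> int:
    """Return the start address of the symbol that contains rip.

    Hypothetical helper for illustration only; it simply undoes the
    "symbol+offset" notation used in the backtrace.
    """
    return rip - offset


rip = 0xffffffff8100b728  # RIP from the exception frame in the backtrace
offset = 296              # decimal offset from "irq_entries_start+296"

base = symbol_base(rip, offset)
print(f"irq_entries_start begins at {base:#x}")
```

The interrupted instruction therefore sits 296 bytes into the block of IRQ entry stubs, which is consistent with the CPU having just taken an interrupt when the NMI arrived.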
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.
http://rhn.redhat.com/errata/RHBA-2011-1648.html
Description of problem:

kdump testing yielded a vmcore where the following backtrace error
occurred when backtracing the active tasks:

PID: 0 TASK: ffff88012cd74b00 CPU: 3 COMMAND: "swapper"
#0 [ffff880028267e90] crash_nmi_callback at ffffffff81028a96
#1 [ffff880028267ea0] notifier_call_chain at ffffffff814e13e5
#2 [ffff880028267ee0] atomic_notifier_call_chain at ffffffff814e144a
#3 [ffff880028267ef0] notify_die at ffffffff810942fe
#4 [ffff880028267f20] do_nmi at ffffffff814df033
#5 [ffff880028267f50] nmi at ffffffff814de940
[exception RIP: irq_entries_start+296]
RIP: ffffffff8100b728 RSP: ffff88012cd79e38 RFLAGS: 00000006
RAX: 0000000000000000 RBX: 0000000000000004 RCX: 0000000000000000
RDX: 00000000000000eb RSI: 0000000000000000 RDI: 00000000000399dd
RBP: ffff88012cd79ed8 R8: 0000000000000000 R9: 0000000000000320
R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
R13: 00000000000000eb R14: 0000000000000002 R15: 0000000000000003
ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
--- <NMI exception stack> ---
#6 [ffff88012cd79e38] irq_entries_start at ffffffff8100b728
bt: cannot transition from exception stack to current process stack:
exception stack pointer: ffff880028267e90
process stack pointer: ffff88012cd7a048
current stack base: ffff88012cd78000

Version-Release number of selected component (if applicable):
crash-5.1.1-2.el6
kernel-2.6.32-156.el6.x86_64

How reproducible:
Very difficult -- NMI issued to non-crashing cpu must be received in a small window of opportunity.

Steps to Reproduce:
1.
2.
3.

Actual results:
As shown above.

Expected results:
Backtrace should properly transition from the NMI stack back to the interrupted process stack.

Additional info:
Reported by Paul Bunyan while kdump testing on intel-piketon-tpm-01.lab.bos.redhat.com
https://beaker.engineering.redhat.com/jobs/95032
http://beaker-archive.app.eng.bos.redhat.com/beaker-logs/2011/06/950/95032/193998/2097628/9796461//test_log--kernel-kdump-analyse-crash.log
I have a copy of the vmlinux/vmcore pair.
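
The three addresses in the error message can be compared with a simple range check. The sketch below is only an illustration and is not crash's actual logic: the function name in_process_stack is hypothetical, and the 8 KiB THREAD_SIZE is an assumption about this x86_64 kernel rather than something stated in the report.

```python
THREAD_SIZE = 8192  # assumption: 8 KiB kernel stacks on this x86_64 kernel


def in_process_stack(sp: int, stack_base: int, size: int = THREAD_SIZE) -> bool:
    """Return True if sp lies within [stack_base, stack_base + size)."""
    return stack_base <= sp < stack_base + size


stack_base = 0xffff88012cd78000    # "current stack base" from the error
saved_rsp = 0xffff88012cd79e38     # RSP saved in the NMI exception frame
reported_sp = 0xffff88012cd7a048   # "process stack pointer" from the error
exception_sp = 0xffff880028267e90  # "exception stack pointer" from the error

for name, sp in [("saved RSP", saved_rsp),
                 ("reported process stack pointer", reported_sp),
                 ("exception stack pointer", exception_sp)]:
    print(f"{name}: {sp:#x} in process stack: {in_process_stack(sp, stack_base)}")
```

Under that assumption, the RSP saved in the exception frame falls inside the swapper task's stack, while the process stack pointer printed in the error lies just past its top. That matches the expected result: the backtrace should resume at the saved RSP.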