Bug 712214

Summary:	bt: cannot transition from exception stack to process stack
Product:	Red Hat Enterprise Linux 6
Component:	crash
Version:	6.1
Hardware:	x86_64
OS:	Linux
Status:	CLOSED ERRATA
Severity:	medium
Priority:	medium
Reporter:	Dave Anderson <anderson>
Assignee:	Dave Anderson <anderson>
QA Contact:	Kernel Dump QE <kernel-dump-qe>
CC:	pbunyan, phan
Target Milestone:	rc
Fixed In Version:	crash-5.1.7-1.el6
Doc Type:	Bug Fix
Last Closed:	2011-12-06 16:30:07 UTC
Doc Text:	In a rare scenario, a non-crashing CPU received a shutdown NMI (non-maskable interrupt) immediately after receiving an interrupt from another source. Because the IRQ entry-point symbols "IRQ0x00_interrupt" through "IRQ0x##_interrupt" no longer existed, the bt command terminated with the "bt: cannot transition from exception stack to current process stack" error message on AMD64 and Intel 64 architectures. This bug has been fixed, and backtrace now properly transitions from the NMI stack back to the interrupted process stack.

Description Dave Anderson 2011-06-09 19:42:03 UTC

Description of problem:

  kdump testing yielded a vmcore where the following backtrace error
  occurred when backtracing the active tasks:

PID: 0      TASK: ffff88012cd74b00  CPU: 3   COMMAND: "swapper"
 #0 [ffff880028267e90] crash_nmi_callback at ffffffff81028a96
 #1 [ffff880028267ea0] notifier_call_chain at ffffffff814e13e5
 #2 [ffff880028267ee0] atomic_notifier_call_chain at ffffffff814e144a
 #3 [ffff880028267ef0] notify_die at ffffffff810942fe
 #4 [ffff880028267f20] do_nmi at ffffffff814df033
 #5 [ffff880028267f50] nmi at ffffffff814de940
    [exception RIP: irq_entries_start+296]
    RIP: ffffffff8100b728  RSP: ffff88012cd79e38  RFLAGS: 00000006
    RAX: 0000000000000000  RBX: 0000000000000004  RCX: 0000000000000000
    RDX: 00000000000000eb  RSI: 0000000000000000  RDI: 00000000000399dd
    RBP: ffff88012cd79ed8   R8: 0000000000000000   R9: 0000000000000320
    R10: 0000000000000000  R11: 0000000000000000  R12: 0000000000000000
    R13: 00000000000000eb  R14: 0000000000000002  R15: 0000000000000003
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
--- <NMI exception stack> ---
 #6 [ffff88012cd79e38] irq_entries_start at ffffffff8100b728
bt: cannot transition from exception stack to current process stack:
    exception stack pointer: ffff880028267e90
      process stack pointer: ffff88012cd7a048
         current stack base: ffff88012cd78000
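
The numbers in the error message can be checked by hand. The sketch below is only arithmetic on the reported values and does not reproduce crash's internal logic. It assumes 8 KiB process stacks (THREAD_SIZE = 0x2000) for this x86_64 kernel, which is not stated in the report; the helper name in_stack() is hypothetical.

```python
# Assumption (not stated in the report): process stacks are 8 KiB on this kernel.
THREAD_SIZE = 0x2000

def in_stack(addr, base, size=THREAD_SIZE):
    """Return True if addr lies inside the stack region [base, base + size)."""
    return base <= addr < base + size

stack_base = 0xffff88012cd78000     # "current stack base" from the error
reported_sp = 0xffff88012cd7a048    # "process stack pointer" from the error
exception_rsp = 0xffff88012cd79e38  # RSP saved in the NMI exception frame

# The reported process stack pointer falls outside the stack region ...
print(in_stack(reported_sp, stack_base))
# ... while the RSP saved in the exception frame falls inside it.
print(in_stack(exception_rsp, stack_base))
# Distance of the reported pointer past the top of the stack.
print(hex(reported_sp - (stack_base + THREAD_SIZE)))
```

Under that assumption, the process stack pointer that bt computed lies 0x48 bytes beyond the top of the task's stack, whereas the RSP recorded in the exception frame is a valid address within it.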


Version-Release number of selected component (if applicable):

  crash-5.1.1-2.el6
  kernel-2.6.32-156.el6.x86_64

How reproducible:

  Very difficult -- the shutdown NMI issued to a non-crashing cpu must
  be received within a small window of opportunity.

Steps to Reproduce:

  None known; see "How reproducible" above.

Actual results:

  As shown above.

Expected results:

  Backtrace should properly transition from the NMI stack back to
  the interrupted process stack.

Additional info:

  Reported by Paul Bunyan while kdump testing on 
  intel-piketon-tpm-01.lab.bos.redhat.com

https://beaker.engineering.redhat.com/jobs/95032

http://beaker-archive.app.eng.bos.redhat.com/beaker-logs/2011/06/950/95032/193998/2097628/9796461//test_log--kernel-kdump-analyse-crash.log

  I have a copy of the vmlinux/vmcore pair.

Comment 3 Dave Anderson 2011-06-10 18:14:33 UTC

The shutdown NMI has to be received by a non-crashing cpu
within a couple of instructions after it has received an
interrupt from another source, so it is highly unlikely
that the problem can be reproduced on demand.

I have a fix for it -- with the fix applied, the backtrace looks like this:

PID: 0      TASK: ffff88012cd74b00  CPU: 3   COMMAND: "swapper"
 #0 [ffff880028267e90] crash_nmi_callback at ffffffff81028a96
 #1 [ffff880028267ea0] notifier_call_chain at ffffffff814e13e5
 #2 [ffff880028267ee0] atomic_notifier_call_chain at ffffffff814e144a
 #3 [ffff880028267ef0] notify_die at ffffffff810942fe
 #4 [ffff880028267f20] do_nmi at ffffffff814df033
 #5 [ffff880028267f50] nmi at ffffffff814de940
    [exception RIP: irq_entries_start+296]
    RIP: ffffffff8100b728  RSP: ffff88012cd79e38  RFLAGS: 00000006
    RAX: 0000000000000000  RBX: 0000000000000004  RCX: 0000000000000000
    RDX: 00000000000000eb  RSI: 0000000000000000  RDI: 00000000000399dd
    RBP: ffff88012cd79ed8   R8: 0000000000000000   R9: 0000000000000320
    R10: 0000000000000000  R11: 0000000000000000  R12: 0000000000000000
    R13: 00000000000000eb  R14: 0000000000000002  R15: 0000000000000003
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
--- <NMI exception stack> ---
 #6 [ffff88012cd79e38] irq_entries_start at ffffffff8100b728
 #7 [ffff88012cd79e60] intel_idle at ffffffff812bc2a1
 #8 [ffff88012cd79ee0] cpuidle_idle_call at ffffffff813ed4b7
 #9 [ffff88012cd79f00] cpu_idle at ffffffff81009de6
  
The non-crashing cpu was sitting idle, received an interrupt from
some source, but then immediately received a shutdown NMI from the
crashing cpu.
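
As a rough cross-check of the fixed backtrace, the frame lines above can be parsed and each frame's stack address compared against the task's stack region. This is a sketch, not part of crash: it assumes 8 KiB process stacks (THREAD_SIZE = 0x2000), takes the stack base from the original error message, and the names parse_frames() and on_process_stack() are hypothetical.

```python
import re

THREAD_SIZE = 0x2000             # assumption: 8 KiB process stacks
STACK_BASE = 0xffff88012cd78000  # "current stack base" from the error report

# Frame lines copied from the fixed backtrace above.
BT_FRAMES = """\
 #0 [ffff880028267e90] crash_nmi_callback at ffffffff81028a96
 #1 [ffff880028267ea0] notifier_call_chain at ffffffff814e13e5
 #2 [ffff880028267ee0] atomic_notifier_call_chain at ffffffff814e144a
 #3 [ffff880028267ef0] notify_die at ffffffff810942fe
 #4 [ffff880028267f20] do_nmi at ffffffff814df033
 #5 [ffff880028267f50] nmi at ffffffff814de940
 #6 [ffff88012cd79e38] irq_entries_start at ffffffff8100b728
 #7 [ffff88012cd79e60] intel_idle at ffffffff812bc2a1
 #8 [ffff88012cd79ee0] cpuidle_idle_call at ffffffff813ed4b7
 #9 [ffff88012cd79f00] cpu_idle at ffffffff81009de6
"""

# Matches lines of the form " #N [stack-address] symbol at text-address".
FRAME_RE = re.compile(r"^\s*#(\d+) \[([0-9a-f]+)\] (\S+) at ([0-9a-f]+)$")

def parse_frames(text):
    """Return a list of (frame number, stack address, symbol) tuples."""
    frames = []
    for line in text.splitlines():
        m = FRAME_RE.match(line)
        if m:
            frames.append((int(m.group(1)), int(m.group(2), 16), m.group(3)))
    return frames

def on_process_stack(addr):
    """True if addr lies inside [STACK_BASE, STACK_BASE + THREAD_SIZE)."""
    return STACK_BASE <= addr < STACK_BASE + THREAD_SIZE

frames = parse_frames(BT_FRAMES)
for num, addr, sym in frames:
    where = "process stack" if on_process_stack(addr) else "other stack"
    print(f"#{num} {sym}: {where}")
```

Under the stated assumption, frames #0 through #5 lie outside the task's stack region (they are on the NMI exception stack), and frames #6 through #9 lie inside it, which matches the expected transition back to the interrupted process stack.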

Comment 7 Tomas Capek 2011-10-18 15:02:07 UTC

    Technical note added. If any revisions are required, please edit the "Technical Notes" field
    accordingly. All revisions will be proofread by the Engineering Content Services team.
    
    New Contents:
In a rare scenario, a non-crashing CPU received a shutdown NMI (non-maskable interrupt) immediately after receiving an interrupt from another source. Because the IRQ entry-point symbols "IRQ0x00_interrupt" through "IRQ0x##_interrupt" no longer existed, the bt command terminated with the "bt: cannot transition from exception stack to current process stack" error message on AMD64 and Intel 64 architectures. This bug has been fixed, and backtrace now properly transitions from the NMI stack back to the interrupted process stack.

Comment 8 errata-xmlrpc 2011-12-06 16:30:07 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2011-1648.html