Bug 608173 - [RHEL6] possibly bogus exception frame -- NMI while running on thread user stack
Status: CLOSED CURRENTRELEASE
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: crash
Version: 6.0
Hardware: All Linux
Priority: low Severity medium
Target Milestone: rc
Target Release: ---
Assigned To: Dave Anderson
QA Contact: Chao Ye
Docs Contact:
Depends On: 608171
Blocks:
Reported: 2010-06-25 17:14 EDT by Guy Streeter
Modified: 2016-02-09 20:33 EST

See Also:
Fixed In Version: crash-5.0.0-20.el6
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: 608171
Environment:
Last Closed: 2010-11-10 15:04:21 EST
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Description Guy Streeter 2010-06-25 17:14:03 EDT
+++ This bug was initially created as a clone of Bug #608171 +++

crash> bt 992
PID: 992    TASK: ffff810eb00cd7e0  CPU: 27  COMMAND: "java"
 #0 [ffff810e37150f20] crash_nmi_callback at ffffffff8007af27
 #1 [ffff810e37150f40] do_nmi at ffffffff8006588a
 #2 [ffff810e37150f50] nmi at ffffffff80064eef
    [exception RIP: system_call]
    RIP: ffffffff8005d098  RSP: 00000000431dfc58  RFLAGS: 00000003
    RAX: 0000000000000018  RBX: 0000000040dd3b60  RCX: 00000032440baa27
    RDX: 00002aaad007dc88  RSI: 00002aaad007a130  RDI: 0000000000000000
    RBP: 00000000431dfc60   R8: 0000000000000020   R9: 00000000000006af
    R10: 00000000000003e0  R11: 0000000000000203  R12: 000000000000015e
    R13: 0000000040dd3b70  R14: 00000000431dfca8  R15: 00000000431dfcb4
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
bt: WARNING: possibly bogus exception frame
--- <NMI exception stack> ---
 #3 [431dfc58] system_call at ffffffff8005d098
bt: cannot transition from exception stack to current process stack:
    exception stack pointer: ffff810e37150f20
      process stack pointer: 431dfc58
         current stack base: ffff810ecd8ac000


Analysis from Dave Anderson:

That's a new crash bug.  The exception frame shows that the cpu was in
the system_call routine, but had not yet had a chance to switch the RSP
from the user-space stack address to the kernel stack.  I had recently put
a fix into 5.0.5 for just that scenario, but it depended upon this function
returning TRUE:
  
  int
  in_user_stack(ulong task, ulong vaddr)
  {
          ulong vma, vm_flags;
          char *vma_buf;
  
          if ((vma = vm_area_dump(task, UVADDR|VERIFY_ADDR, vaddr, 0))) {
                  vma_buf = fill_vma_cache(vma);
                  vm_flags = SIZE(vm_area_struct_vm_flags) == sizeof(short) ?
                          USHORT(vma_buf+ OFFSET(vm_area_struct_vm_flags)) :
                          ULONG(vma_buf+ OFFSET(vm_area_struct_vm_flags));
  
                  if (vm_flags & (VM_GROWSUP|VM_GROWSDOWN))
                          return TRUE;
          }
          return FALSE;
  }
  
But that java task is apparently running in a per-thread-stack instead of
the "normal" process stack at the top of memory:

  crash> vm 992
  PID: 992    TASK: ffff810eb00cd7e0  CPU: 27  COMMAND: "java"
         MM               PGD          RSS    TOTAL_VM
  ffff810ec6e51c40  ffff810ece3cc000  1130452k  6829724k
        VMA           START       END     FLAGS FILE
  ...
  ffff810ecb9df088   430e1000   431e1000 100077
  ...
  ffff810ecbe73768 7fffec321000 7fffec51e000 100177 
  crash>
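
As a quick cross-check of the listing above, the RSP from the exception frame (431dfc58) can be tested against the START/END values of the two VMAs shown. The helper below is a minimal sketch and is hypothetical: `addr_in_vma` is not a crash function, and it assumes the END column is exclusive, as `vm_end` is in the kernel.

```c
#include <stdio.h>

/*
 * Hypothetical helper (not part of crash): returns 1 if addr lies in the
 * half-open range [start, end), which is how a VMA's vm_start/vm_end
 * bound its mapping.
 */
static int addr_in_vma(unsigned long long addr,
                       unsigned long long start,
                       unsigned long long end)
{
        return addr >= start && addr < end;
}

/* Print which of the two VMAs from the "vm 992" listing contains rsp. */
static void report_rsp(unsigned long long rsp)
{
        printf("430e1000-431e1000 (flags 100077):         %d\n",
               addr_in_vma(rsp, 0x430e1000ULL, 0x431e1000ULL));
        printf("7fffec321000-7fffec51e000 (flags 100177): %d\n",
               addr_in_vma(rsp, 0x7fffec321000ULL, 0x7fffec51e000ULL));
}
```

Calling `report_rsp(0x431dfc58ULL)` reports the address as inside the 430e1000-431e1000 mapping and outside the process stack at the top of memory, which matches the description of a per-thread stack.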
  
And annoyingly enough, those per-thread stacks don't set the
GROWSUP/GROWSDOWN flag in their vm_area_struct's vm_flags:

  crash> vm -f 100077
  100077: (READ|WRITE|EXEC|MAYREAD|MAYWRITE|MAYEXEC|WRITECOMBINED|ACCOUNT)
  crash> vm -f 100177
  100177: (READ|WRITE|EXEC|MAYREAD|MAYWRITE|MAYEXEC|GROWSDOWN|WRITECOMBINED|ACCOUNT)
  crash>
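
To make the failure concrete, here is a minimal standalone sketch of the flag test that in_user_stack() performs, applied to the two vm_flags values decoded above. The function name `has_stack_growth_flag` is hypothetical, and the bit values are an assumption: they are the VM_GROWSDOWN/VM_GROWSUP definitions from the Linux 2.6 include/linux/mm.h, taken here to match what crash uses.

```c
#include <stdio.h>

/* Assumed values, as defined in Linux 2.6 include/linux/mm.h */
#define VM_GROWSDOWN 0x00000100UL
#define VM_GROWSUP   0x00000200UL

/*
 * Hypothetical stand-in for the final test in in_user_stack():
 * returns 1 only if the VMA is marked as a growable stack.
 */
static int has_stack_growth_flag(unsigned long vm_flags)
{
        return (vm_flags & (VM_GROWSUP | VM_GROWSDOWN)) != 0;
}

/* Show the result for the two flag values from the "vm -f" output. */
static void report_flags(void)
{
        printf("100077 (per-thread stack): %d\n",
               has_stack_growth_flag(0x100077UL));
        printf("100177 (process stack):    %d\n",
               has_stack_growth_flag(0x100177UL));
}
```

The two values differ only in bit 0x100, so the test passes for the process stack and fails for the per-thread stack, even though the RSP in the exception frame is a legitimate user-space stack address.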

A vmcore exhibiting the problem is on megatron.gsslab.rdu.redhat.com
in /cores/20100614175236/work
Comment 1 RHEL Product and Program Management 2010-06-28 11:23:01 EDT
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux major release.  Product Management has requested further
review of this request by Red Hat Engineering, for potential inclusion in a Red
Hat Enterprise Linux Major release.  This request is not yet committed for
inclusion.
Comment 5 Chao Ye 2010-09-15 03:50:59 EDT
Reproduced with crash-5.0.0-18.el6:
================================================================================
[root@sgi-xe250-02 work]# rpm -q crash
crash-5.0.0-18.el6.x86_64
[root@sgi-xe250-02 work]# crash anritsu-2023578-vmcore vmlinux 

crash 5.0.0-18.el6
Copyright (C) 2002-2010  Red Hat, Inc.
Copyright (C) 2004, 2005, 2006  IBM Corporation
Copyright (C) 1999-2006  Hewlett-Packard Co
Copyright (C) 2005, 2006  Fujitsu Limited
Copyright (C) 2006, 2007  VA Linux Systems Japan K.K.
Copyright (C) 2005  NEC Corporation
Copyright (C) 1999, 2002, 2007  Silicon Graphics, Inc.
Copyright (C) 1999, 2000, 2001, 2002  Mission Critical Linux, Inc.
This program is free software, covered by the GNU General Public License,
and you are welcome to change it and/or distribute copies of it under
certain conditions.  Enter "help copying" to see the conditions.
This program has absolutely no warranty.  Enter "help warranty" for details.
 
GNU gdb (GDB) 7.0
Copyright (C) 2009 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-unknown-linux-gnu"...

please wait... (determining panic task)         
bt: cannot transition from exception stack to current process stack:
    exception stack pointer: ffff810e37150f20
      process stack pointer: 431dfc58
         current stack base: ffff810ecd8ac000

      KERNEL: vmlinux                           
    DUMPFILE: anritsu-2023578-vmcore  [PARTIAL DUMP]
        CPUS: 32
        DATE: Mon Jun 14 16:25:59 2010
      UPTIME: 2 days, 01:45:24
LOAD AVERAGE: 6.28, 3.21, 1.97
       TASKS: 1497
    NODENAME: qad-prod.us.anritsu.com
     RELEASE: 2.6.18-194.3.1.el5
     VERSION: #1 SMP Sun May 2 04:17:42 EDT 2010
     MACHINE: x86_64  (2400 Mhz)
      MEMORY: 63.1 GB
       PANIC: "Kernel panic - not syncing: softlockup: hung tasks"
         PID: 28942
     COMMAND: "java"
        TASK: ffff810bbd90d7a0  [THREAD_INFO: ffff810be0888000]
         CPU: 6
       STATE: TASK_RUNNING (PANIC)

crash> bt 992
PID: 992    TASK: ffff810eb00cd7e0  CPU: 27  COMMAND: "java"
 #0 [ffff810e37150f20] crash_nmi_callback at ffffffff8007af27
 #1 [ffff810e37150f40] do_nmi at ffffffff8006588a
 #2 [ffff810e37150f50] nmi at ffffffff80064eef
    [exception RIP: system_call]
    RIP: ffffffff8005d098  RSP: 00000000431dfc58  RFLAGS: 00000003
    RAX: 0000000000000018  RBX: 0000000040dd3b60  RCX: 00000032440baa27
    RDX: 00002aaad007dc88  RSI: 00002aaad007a130  RDI: 0000000000000000
    RBP: 00000000431dfc60   R8: 0000000000000020   R9: 00000000000006af
    R10: 00000000000003e0  R11: 0000000000000203  R12: 000000000000015e
    R13: 0000000040dd3b70  R14: 00000000431dfca8  R15: 00000000431dfcb4
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
bt: WARNING: possibly bogus exception frame
--- <NMI exception stack> ---
 #3 [431dfc58] system_call at ffffffff8005d098
bt: cannot transition from exception stack to current process stack:
    exception stack pointer: ffff810e37150f20
      process stack pointer: 431dfc58
         current stack base: ffff810ecd8ac000
crash> vm 992
PID: 992    TASK: ffff810eb00cd7e0  CPU: 27  COMMAND: "java"
       MM               PGD          RSS    TOTAL_VM
ffff810ec6e51c40  ffff810ece3cc000  1130452k  6829724k
      VMA           START       END     FLAGS FILE
ffff810ec7a73ad8   40000000   4000e000   1875 /usr/local/java/jdk1.5.0_19/bin/java
ffff810ec9f57818   4010d000   40110000 101877 /usr/local/java/jdk1.5.0_19/bin/java
......
crash> vm -f 100077
100077: (READ|WRITE|EXEC|MAYREAD|MAYWRITE|MAYEXEC|WRITECOMBINED|ACCOUNT)
crash> vm -f 100177
100177: (READ|WRITE|EXEC|MAYREAD|MAYWRITE|MAYEXEC|GROWSDOWN|WRITECOMBINED|ACCOUNT)



Verified with crash-5.0.0-23.el6:
================================================================================
[root@sgi-xe250-02 work]# rpm -q crash
crash-5.0.0-23.el6.x86_64
[root@sgi-xe250-02 work]# crash anritsu-2023578-vmcore vmlinux 

crash 5.0.0-23.el6
Copyright (C) 2002-2010  Red Hat, Inc.
Copyright (C) 2004, 2005, 2006  IBM Corporation
Copyright (C) 1999-2006  Hewlett-Packard Co
Copyright (C) 2005, 2006  Fujitsu Limited
Copyright (C) 2006, 2007  VA Linux Systems Japan K.K.
Copyright (C) 2005  NEC Corporation
Copyright (C) 1999, 2002, 2007  Silicon Graphics, Inc.
Copyright (C) 1999, 2000, 2001, 2002  Mission Critical Linux, Inc.
This program is free software, covered by the GNU General Public License,
and you are welcome to change it and/or distribute copies of it under
certain conditions.  Enter "help copying" to see the conditions.
This program has absolutely no warranty.  Enter "help warranty" for details.
 
GNU gdb (GDB) 7.0
Copyright (C) 2009 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-unknown-linux-gnu"...

      KERNEL: vmlinux                           
    DUMPFILE: anritsu-2023578-vmcore  [PARTIAL DUMP]
        CPUS: 32
        DATE: Mon Jun 14 16:25:59 2010
      UPTIME: 2 days, 01:45:24
LOAD AVERAGE: 6.28, 3.21, 1.97
       TASKS: 1497
    NODENAME: qad-prod.us.anritsu.com
     RELEASE: 2.6.18-194.3.1.el5
     VERSION: #1 SMP Sun May 2 04:17:42 EDT 2010
     MACHINE: x86_64  (2400 Mhz)
      MEMORY: 63.1 GB
       PANIC: "Kernel panic - not syncing: softlockup: hung tasks"
         PID: 28942
     COMMAND: "java"
        TASK: ffff810bbd90d7a0  [THREAD_INFO: ffff810be0888000]
         CPU: 6
       STATE: TASK_RUNNING (PANIC)

crash> bt 992
PID: 992    TASK: ffff810eb00cd7e0  CPU: 27  COMMAND: "java"
 #0 [ffff810e37150f20] crash_nmi_callback at ffffffff8007af27
 #1 [ffff810e37150f40] do_nmi at ffffffff8006588a
 #2 [ffff810e37150f50] nmi at ffffffff80064eef
    [exception RIP: system_call]
    RIP: ffffffff8005d098  RSP: 00000000431dfc58  RFLAGS: 00000003
    RAX: 0000000000000018  RBX: 0000000040dd3b60  RCX: 00000032440baa27
    RDX: 00002aaad007dc88  RSI: 00002aaad007a130  RDI: 0000000000000000
    RBP: 00000000431dfc60   R8: 0000000000000020   R9: 00000000000006af
    R10: 00000000000003e0  R11: 0000000000000203  R12: 000000000000015e
    R13: 0000000040dd3b70  R14: 00000000431dfca8  R15: 00000000431dfcb4
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
--- <NMI exception stack> ---
 #3 [431dfc58] system_call at ffffffff8005d098
    RIP: 00000032440baa27  RSP: 00000000431dfc58  RFLAGS: 00000203
    RAX: 0000000000000000  RBX: 0000000040dd3b60  RCX: ffffffffffffffff
    RDX: 00002aaad007dc88  RSI: 00002aaad007a130  RDI: 0000000000000000
    RBP: 00000000431dfc60   R8: 0000000000000020   R9: 00000000000006af
    R10: 00000000000003e0  R11: 0000000000000203  R12: 000000000000015d
    R13: 0000000040dd3b70  R14: 00000000431dfca8  R15: 00000000431dfcb4
    ORIG_RAX: 0000000000000018  CS: 0033  SS: 002b
crash> vm 992
PID: 992    TASK: ffff810eb00cd7e0  CPU: 27  COMMAND: "java"
       MM               PGD          RSS    TOTAL_VM
ffff810ec6e51c40  ffff810ece3cc000  1130452k  6829724k
      VMA           START       END     FLAGS FILE
ffff810ec7a73ad8   40000000   4000e000   1875 /usr/local/java/jdk1.5.0_19/bin/java
ffff810ec9f57818   4010d000   40110000 101877 /usr/local/java/jdk1.5.0_19/bin/java
......
crash> vm -f 100077
100077: (READ|WRITE|EXEC|MAYREAD|MAYWRITE|MAYEXEC|WRITECOMBINED|ACCOUNT)
crash> vm -f 100177
100177: (READ|WRITE|EXEC|MAYREAD|MAYWRITE|MAYEXEC|GROWSDOWN|WRITECOMBINED|ACCOUNT)

================================================================================
Change status to VERIFIED.
Comment 6 releng-rhel@redhat.com 2010-11-10 15:04:21 EST
Red Hat Enterprise Linux 6.0 is now available and should resolve
the problem described in this bug report. This report is therefore being closed
with a resolution of CURRENTRELEASE. You may reopen this bug report if the
solution does not work for you.
