crash> bt 992
PID: 992    TASK: ffff810eb00cd7e0  CPU: 27  COMMAND: "java"
 #0 [ffff810e37150f20] crash_nmi_callback at ffffffff8007af27
 #1 [ffff810e37150f40] do_nmi at ffffffff8006588a
 #2 [ffff810e37150f50] nmi at ffffffff80064eef
    [exception RIP: system_call]
    RIP: ffffffff8005d098  RSP: 00000000431dfc58  RFLAGS: 00000003
    RAX: 0000000000000018  RBX: 0000000040dd3b60  RCX: 00000032440baa27
    RDX: 00002aaad007dc88  RSI: 00002aaad007a130  RDI: 0000000000000000
    RBP: 00000000431dfc60   R8: 0000000000000020   R9: 00000000000006af
    R10: 00000000000003e0  R11: 0000000000000203  R12: 000000000000015e
    R13: 0000000040dd3b70  R14: 00000000431dfca8  R15: 00000000431dfcb4
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
bt: WARNING: possibly bogus exception frame
--- <NMI exception stack> ---
 #3 [431dfc58] system_call at ffffffff8005d098
bt: cannot transition from exception stack to current process stack:
    exception stack pointer: ffff810e37150f20
      process stack pointer: 431dfc58
         current stack base: ffff810ecd8ac000

Analysis from Dave Anderson:

That's a new crash bug. The exception frame shows that the CPU was in the system_call routine but had not yet had a chance to switch RSP from the user-space stack to the kernel stack. I recently put a fix into 5.0.5 for just that scenario, but it depends upon this function returning TRUE:

int
in_user_stack(ulong task, ulong vaddr)
{
	ulong vma, vm_flags;
	char *vma_buf;

	if ((vma = vm_area_dump(task, UVADDR|VERIFY_ADDR, vaddr, 0))) {
		vma_buf = fill_vma_cache(vma);
		vm_flags = SIZE(vm_area_struct_vm_flags) == sizeof(short) ?
			USHORT(vma_buf+ OFFSET(vm_area_struct_vm_flags)) :
			ULONG(vma_buf+ OFFSET(vm_area_struct_vm_flags));
		if (vm_flags & (VM_GROWSUP|VM_GROWSDOWN))
			return TRUE;
	}
	return FALSE;
}

But that java task is apparently running on a per-thread stack instead of the "normal" process stack at the top of memory. The RSP value 431dfc58 lies within the 430e1000-431e1000 VMA:

crash> vm 992
PID: 992    TASK: ffff810eb00cd7e0  CPU: 27  COMMAND: "java"
       MM               PGD          RSS    TOTAL_VM
ffff810ec6e51c40  ffff810ece3cc000  1130452k  6829724k
      VMA           START       END     FLAGS FILE
...
ffff810ecb9df088   430e1000   431e1000 100077
...
ffff810ecbe73768 7fffec321000 7fffec51e000 100177
crash>

And annoyingly enough, those per-thread stacks don't have the GROWSUP/GROWSDOWN flag set in their vm_area_struct's vm_flags:

crash> vm -f 100077
100077: (READ|WRITE|EXEC|MAYREAD|MAYWRITE|MAYEXEC|WRITECOMBINED|ACCOUNT)
crash> vm -f 100177
100177: (READ|WRITE|EXEC|MAYREAD|MAYWRITE|MAYEXEC|GROWSDOWN|WRITECOMBINED|ACCOUNT)
crash>

A vmcore exhibiting the problem is on megatron.gsslab.rdu.redhat.com in /cores/20100614175236/work
QA assist: Using the supplied vmlinux/anritsu-2023578-vmcore pair, crash version 4.1.2-5.el5 fails to make the transition from the NMI exception stack to the process stack. This happens when the interrupted, active, multi-threaded task has just entered the kernel but has not yet switched its stack pointer from the user-space per-thread stack to its kernel stack:

# crash vmlinux anritsu-2023578-vmcore

crash 4.1.2-5.el5
Copyright (C) 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009  Red Hat, Inc.
Copyright (C) 2004, 2005, 2006  IBM Corporation
Copyright (C) 1999-2006  Hewlett-Packard Co
Copyright (C) 2005, 2006  Fujitsu Limited
Copyright (C) 2006, 2007  VA Linux Systems Japan K.K.
Copyright (C) 2005  NEC Corporation
Copyright (C) 1999, 2002, 2007  Silicon Graphics, Inc.
Copyright (C) 1999, 2000, 2001, 2002  Mission Critical Linux, Inc.
This program is free software, covered by the GNU General Public License,
and you are welcome to change it and/or distribute copies of it under
certain conditions.  Enter "help copying" to see the conditions.
This program has absolutely no warranty.  Enter "help warranty" for details.

GNU gdb 6.1
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "x86_64-unknown-linux-gnu"...

please wait...
(determining panic task)
bt: cannot transition from exception stack to current process stack:
    exception stack pointer: ffff810e37150f20
      process stack pointer: 431dfc58
         current stack base: ffff810ecd8ac000

      KERNEL: vmlinux
    DUMPFILE: anritsu-2023578-vmcore  [PARTIAL DUMP]
        CPUS: 32
        DATE: Mon Jun 14 16:25:59 2010
      UPTIME: 2 days, 01:45:24
LOAD AVERAGE: 6.28, 3.21, 1.97
       TASKS: 1497
    NODENAME: qad-prod.us.anritsu.com
     RELEASE: 2.6.18-194.3.1.el5
     VERSION: #1 SMP Sun May 2 04:17:42 EDT 2010
     MACHINE: x86_64  (2400 Mhz)
      MEMORY: 63.1 GB
       PANIC: "Kernel panic - not syncing: softlockup: hung tasks"
         PID: 28942
     COMMAND: "java"
        TASK: ffff810bbd90d7a0  [THREAD_INFO: ffff810be0888000]
         CPU: 6
       STATE: TASK_RUNNING (PANIC)

crash> bt 992
PID: 992    TASK: ffff810eb00cd7e0  CPU: 27  COMMAND: "java"
 #0 [ffff810e37150f20] crash_nmi_callback at ffffffff8007af27
 #1 [ffff810e37150f40] do_nmi at ffffffff8006588a
 #2 [ffff810e37150f50] nmi at ffffffff80064eef
    [exception RIP: system_call]
    RIP: ffffffff8005d098  RSP: 00000000431dfc58  RFLAGS: 00000003
    RAX: 0000000000000018  RBX: 0000000040dd3b60  RCX: 00000032440baa27
    RDX: 00002aaad007dc88  RSI: 00002aaad007a130  RDI: 0000000000000000
    RBP: 00000000431dfc60   R8: 0000000000000020   R9: 00000000000006af
    R10: 00000000000003e0  R11: 0000000000000203  R12: 000000000000015e
    R13: 0000000040dd3b70  R14: 00000000431dfca8  R15: 00000000431dfcb4
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
bt: WARNING: possibly bogus exception frame
--- <NMI exception stack> ---
 #3 [431dfc58] system_call at ffffffff8005d098
bt: cannot transition from exception stack to current process stack:
    exception stack pointer: ffff810e37150f20
      process stack pointer: 431dfc58
         current stack base: ffff810ecd8ac000
crash>

Crash version 4.1.2-6.el5 makes the stack transition correctly:

# crash vmlinux anritsu-2023578-vmcore

crash 4.1.2-6.el5
Copyright (C) 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009  Red Hat, Inc.
Copyright (C) 2004, 2005, 2006  IBM Corporation
Copyright (C) 1999-2006  Hewlett-Packard Co
Copyright (C) 2005, 2006  Fujitsu Limited
Copyright (C) 2006, 2007  VA Linux Systems Japan K.K.
Copyright (C) 2005  NEC Corporation
Copyright (C) 1999, 2002, 2007  Silicon Graphics, Inc.
Copyright (C) 1999, 2000, 2001, 2002  Mission Critical Linux, Inc.
This program is free software, covered by the GNU General Public License,
and you are welcome to change it and/or distribute copies of it under
certain conditions.  Enter "help copying" to see the conditions.
This program has absolutely no warranty.  Enter "help warranty" for details.

GNU gdb 6.1
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "x86_64-unknown-linux-gnu"...
      KERNEL: vmlinux
    DUMPFILE: anritsu-2023578-vmcore  [PARTIAL DUMP]
        CPUS: 32
        DATE: Mon Jun 14 16:25:59 2010
      UPTIME: 2 days, 01:45:24
LOAD AVERAGE: 6.28, 3.21, 1.97
       TASKS: 1497
    NODENAME: qad-prod.us.anritsu.com
     RELEASE: 2.6.18-194.3.1.el5
     VERSION: #1 SMP Sun May 2 04:17:42 EDT 2010
     MACHINE: x86_64  (2400 Mhz)
      MEMORY: 63.1 GB
       PANIC: "Kernel panic - not syncing: softlockup: hung tasks"
         PID: 28942
     COMMAND: "java"
        TASK: ffff810bbd90d7a0  [THREAD_INFO: ffff810be0888000]
         CPU: 6
       STATE: TASK_RUNNING (PANIC)

crash> bt 992
PID: 992    TASK: ffff810eb00cd7e0  CPU: 27  COMMAND: "java"
 #0 [ffff810e37150f20] crash_nmi_callback at ffffffff8007af27
 #1 [ffff810e37150f40] do_nmi at ffffffff8006588a
 #2 [ffff810e37150f50] nmi at ffffffff80064eef
    [exception RIP: system_call]
    RIP: ffffffff8005d098  RSP: 00000000431dfc58  RFLAGS: 00000003
    RAX: 0000000000000018  RBX: 0000000040dd3b60  RCX: 00000032440baa27
    RDX: 00002aaad007dc88  RSI: 00002aaad007a130  RDI: 0000000000000000
    RBP: 00000000431dfc60   R8: 0000000000000020   R9: 00000000000006af
    R10: 00000000000003e0  R11: 0000000000000203  R12: 000000000000015e
    R13: 0000000040dd3b70  R14: 00000000431dfca8  R15: 00000000431dfcb4
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
--- <NMI exception stack> ---
 #3 [431dfc58] system_call at ffffffff8005d098
    RIP: 00000032440baa27  RSP: 00000000431dfc58  RFLAGS: 00000203
    RAX: 0000000000000000  RBX: 0000000040dd3b60  RCX: ffffffffffffffff
    RDX: 00002aaad007dc88  RSI: 00002aaad007a130  RDI: 0000000000000000
    RBP: 00000000431dfc60   R8: 0000000000000020   R9: 00000000000006af
    R10: 00000000000003e0  R11: 0000000000000203  R12: 000000000000015d
    R13: 0000000040dd3b70  R14: 00000000431dfca8  R15: 00000000431dfcb4
    ORIG_RAX: 0000000000000018  CS: 0033  SS: 002b
crash>
Technical note added. If any revisions are required, please edit the "Technical Notes" field accordingly. All revisions will be proofread by the Engineering Content Services team.

New Contents:
Prior to this update, the "bt" command failed to make the transition from the NMI exception stack to the process stack when a task had just entered the kernel but had not yet switched its stack pointer from the user-space per-thread stack to its kernel stack. This has been fixed, and the transition is now made as expected.
An advisory has been issued which should help resolve the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2011-0059.html