Description of problem:

This panic was caused by systemd running off the top of the user stack. Unfortunately, by the time the OS does the kill on pid 1, a lot of the task's state has been released and variables overwritten. But the segfault message for systemd logged to the ring buffer shows that the faulting address was just below the current RSP and that the error was a write from user mode from code in libc-2.17.so.

Version-Release number of selected component (if applicable):

RELEASE: 3.10.0-1160.62.1.el7.x86_64
systemd-219-78.el7_9.5.x86_64

Additional info:

systemd[1]: segfault at 7ffd0a6a9fc8 ip 00007f563537f68c sp 00007ffd0a6a9fd0 error 6 in libc-2.17.so[7f5635337000+1c4000]

And from the dump we can see the bottom of the stack (bos) was 0x7ffd0aeaa000 and the stack limit was 8388608 (8 MB).

crash> vm
PID: 1      TASK: ffff91e199b40000  CPU: 41   COMMAND: "systemd"
       MM               PGD          RSS    TOTAL_VM
        0                 0           0k        0k

crash> task -R active_mm
PID: 1      TASK: ffff91e199b40000  CPU: 41   COMMAND: "systemd"
  active_mm = 0xffff91fc5f3addc0,

crash> vm -M 0xffff91fc5f3addc0 | grep 7ffd0
ffff92007a48e360                0 7ffd0aeaa000 100173    <-------
ffff91f0800aeaf8 ffff91f0800aea20 7ffd0aee9000 8040075
crash>

crash> ps 1 -r | grep STACK
     STACK  8388608 (unlimited)
crash>

Subtracting the 8 MB stack limit from the bos gives us a tos (top of stack) limit of 0x7ffd0a6aa000.

Mapping:

7ffd0a6a9fc8  <- fault address
7ffd0a6a9fd0  <- current rsp
7ffd0a6aa000  <- tos limit
7ffd0aeaa000  <- bos

And you can see that the faulting address is 1 qword (8 bytes) below the current RSP.

crash> pd 0x7ffd0a6a9fd0-0x7ffd0a6a9fc8
$27 = 8

So the most likely thing that happened is that we called into a routine while still within the tos limit; the routine's prologue allocated its stack frame, which moved RSP past the tos limit, and the routine then made a call, which tried to push its return address one qword below the already out-of-bounds RSP, tripping the page fault. Chasing the object code for the library shows exactly that scenario. From a test system, using gdb:

(gdb) disass 0x7fab4bb9e600
Dump of assembler code for function _IO_vfprintf_internal:

The pushes happen while still under the limit.

   0x00007fab4bb9e600 <+0>:     push   %rbp
   0x00007fab4bb9e601 <+1>:     mov    %rsp,%rbp
   0x00007fab4bb9e604 <+4>:     push   %r15
   0x00007fab4bb9e606 <+6>:     mov    %rdi,%r15
   0x00007fab4bb9e609 <+9>:     push   %r14
   0x00007fab4bb9e60b <+11>:    mov    %rsi,%r14
   0x00007fab4bb9e60e <+14>:    push   %r13
   0x00007fab4bb9e610 <+16>:    mov    %rdx,%r13
   0x00007fab4bb9e613 <+19>:    push   %r12
   0x00007fab4bb9e615 <+21>:    push   %rbx
   0x00007fab4bb9e616 <+22>:    sub    $0x5c8,%rsp    <-- pushed us 0x30 bytes past the tos limit

All of the stack accesses that follow are within the 0x598 bytes of the frame that are still inside the tos limit.
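As a sanity check of the arithmetic, here is an illustrative sketch (not from the dump; the constants are hard-coded from the crash output and register dump above, and the entry RSP is reconstructed by assuming the standard push %rbp / mov %rsp,%rbp prologue shown in the disassembly):

    #include <assert.h>
    #include <stdio.h>

    int main(void)
    {
        /* Values taken from the crash output above. */
        unsigned long bos       = 0x7ffd0aeaa000UL;   /* bottom of stack (highest address) */
        unsigned long rlimit    = 8388608UL;          /* RLIMIT_STACK, 8 MB */
        unsigned long tos_limit = bos - rlimit;       /* top-of-stack limit */

        /* RBP from the register dump is set right after the first push,
         * so the RSP on entry to _IO_vfprintf_internal was RBP + 8. */
        unsigned long rsp_entry = 0x7ffd0a6aa5c0UL + 8;

        /* Prologue: five more pushes, then sub $0x5c8,%rsp. */
        unsigned long rsp_after_prologue = rsp_entry - 6 * 8 - 0x5c8;

        /* A call instruction writes its return address at RSP - 8. */
        unsigned long call_write = rsp_after_prologue - 8;

        printf("tos limit          = %#lx\n", tos_limit);
        printf("rsp after prologue = %#lx (%#lx past the limit)\n",
               rsp_after_prologue, tos_limit - rsp_after_prologue);
        printf("call writes at     = %#lx\n", call_write);

        assert(tos_limit == 0x7ffd0a6aa000UL);
        assert(rsp_after_prologue == 0x7ffd0a6a9fd0UL); /* matches SP in the segfault message */
        assert(call_write == 0x7ffd0a6a9fc8UL);         /* matches the faulting address */
        return 0;
    }

The derived RSP matches the SP in the segfault message, and the address a call would write to matches the faulting address exactly. The listing continues up to the faulting call: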
   0x00007fab4bb9e61d <+29>:    mov    0x37e82c(%rip),%rax        # 0x7fab4bf1ce50
   0x00007fab4bb9e624 <+36>:    mov    %fs:(%rax),%eax
   0x00007fab4bb9e627 <+39>:    mov    %eax,-0x4d8(%rbp)
   0x00007fab4bb9e62d <+45>:    mov    0xc0(%rdi),%eax
   0x00007fab4bb9e633 <+51>:    test   %eax,%eax
   0x00007fab4bb9e635 <+53>:    jne    0x7fab4bb9e780 <_IO_vfprintf_internal+384>
   0x00007fab4bb9e63b <+59>:    movl   $0xffffffff,0xc0(%rdi)
   0x00007fab4bb9e645 <+69>:    mov    (%r15),%ebx
   0x00007fab4bb9e648 <+72>:    test   $0x8,%bl
   0x00007fab4bb9e64b <+75>:    jne    0x7fab4bb9ea60 <_IO_vfprintf_internal+1120>
   0x00007fab4bb9e651 <+81>:    test   %r14,%r14
   0x00007fab4bb9e654 <+84>:    je     0x7fab4bb9ea7e <_IO_vfprintf_internal+1150>
   0x00007fab4bb9e65a <+90>:    test   $0x2,%bl
   0x00007fab4bb9e65d <+93>:    jne    0x7fab4bb9e810 <_IO_vfprintf_internal+528>
   0x00007fab4bb9e663 <+99>:    mov    0x0(%r13),%rax
   0x00007fab4bb9e667 <+103>:   mov    $0x25,%esi
   0x00007fab4bb9e66c <+108>:   mov    %r14,%rdi
   0x00007fab4bb9e66f <+111>:   mov    %rax,-0x460(%rbp)
   0x00007fab4bb9e676 <+118>:   mov    0x8(%r13),%rax
   0x00007fab4bb9e67a <+122>:   mov    %rax,-0x458(%rbp)
   0x00007fab4bb9e681 <+129>:   mov    0x10(%r13),%rax
   0x00007fab4bb9e685 <+133>:   mov    %rax,-0x450(%rbp)

This call is the first stack access outside of the tos limit, and it is what triggers the page fault.

   0x00007fab4bb9e68c <+140>:   callq  0x7fab4bbec4b0 <strchrnul>    <--------
   0x00007fab4bb9e691 <+145>:   and    $0x80,%bh

crash> bt
PID: 1      TASK: ffff91e199b40000  CPU: 41   COMMAND: "systemd"
 #0 [ffff91e199b4bb30] machine_kexec at ffffffffba8662f4
 #1 [ffff91e199b4bb90] __crash_kexec at ffffffffba922b82
 #2 [ffff91e199b4bc60] panic at ffffffffbaf802bc
 #3 [ffff91e199b4bce0] do_exit at ffffffffba8a223f
 #4 [ffff91e199b4bd78] do_group_exit at ffffffffba8a22bf
 #5 [ffff91e199b4bda8] get_signal_to_deliver at ffffffffba8b341e
 #6 [ffff91e199b4be40] do_signal at ffffffffba82c527
 #7 [ffff91e199b4bf30] do_notify_resume at ffffffffba82cc32
 #8 [ffff91e199b4bf50] retint_signal at ffffffffbaf9057c
    RIP: 00007f563537f68c  RSP: 00007ffd0a6a9fd0  RFLAGS: 00010246
    RAX: 00007ffd0a6aa760  RBX: 00000000fbad8001  RCX: 0000000000000000
    RDX: 00007ffd0a6aa748  RSI: 0000000000000025  RDI: 000055d811decd88
    RBP: 00007ffd0a6aa5c0   R8: 0000000000000000   R9: 00007ffd0a6aa748
    R10: 000055d811dee910  R11: 000055d811e0461b  R12: 000055d811decd88
    R13: 00007ffd0a6aa748  R14: 000055d811decd88  R15: 00007ffd0a6aa5d0
    ORIG_RAX: ffffffffffffffff  CS: 0033  SS: 002b
crash>

We can see a leftover page fault marker in the stack.

crash> bt -T | grep -e panic -e page -C 2
bt: get_cpus_online: online: 48
  [ffff91e199b4bb90] __crash_kexec at ffffffffba922b82
  [ffff91e199b4bc18] __crash_kexec at ffffffffba922c1f
  [ffff91e199b4bc60] panic at ffffffffbaf802bc
  [ffff91e199b4bce0] do_exit at ffffffffba8a223f
  [ffff91e199b4bd78] do_group_exit at ffffffffba8a22bf
--
  [ffff91e199b4be40] do_signal at ffffffffba82c527
  [ffff91e199b4be98] vprintk_default at ffffffffba89e669
  [ffff91e199b4beb0] __do_page_fault at ffffffffbaf948b7    <--
  [ffff91e199b4bf30] do_notify_resume at ffffffffba82cc32
  [ffff91e199b4bf50] retint_signal at ffffffffbaf9057c
crash>

Granted, that could just be really old data, but it is the page fault code path that leads to the segfault message being logged.

1362 do_page_fault(struct pt_regs *regs, unsigned long error_code)
1363 {
1364         enum ctx_state prev_state;
1365         /* Get the faulting address: */
1366         unsigned long address = read_cr2();
1367
1368         prev_state = exception_enter();
1369         __do_page_fault(regs, error_code, address);    <--

1141 static void __kprobes
1142 __do_page_fault(struct pt_regs *regs, unsigned long error_code,
1143                 unsigned long address)
1144 {
. . .
1294                 if (unlikely(expand_stack(vma, address))) {    <-----
1295                         bad_area(regs, error_code, address);    <---
1296                         return;

 876 static noinline void
 877 bad_area(struct pt_regs *regs, unsigned long error_code, unsigned long address)
 878 {
 879         __bad_area(regs, error_code, address, NULL, SEGV_MAPERR);    <--
 880 }

 862 __bad_area(struct pt_regs *regs, unsigned long error_code,
 863            unsigned long address, struct vm_area_struct *vma, int si_code)
 864 {
 865         struct mm_struct *mm = current->mm;
 866
 867         /*
 868          * Something tried to access memory that isn't in our memory map..
 869          * Fix it, but check if it's kernel or user first..
 870          */
 871         up_read(&mm->mmap_sem);
 872
 873         __bad_area_nosemaphore(regs, error_code, address, vma, si_code);    <---

 798 __bad_area_nosemaphore(struct pt_regs *regs, unsigned long error_code,
 799                        unsigned long address, struct vm_area_struct *vma,
 800                        int si_code)
 801 {
. . .
 836         if (likely(show_unhandled_signals))
 837                 show_signal_msg(regs, error_code, address, tsk);    <---
 838
 839         tsk->thread.cr2 = address;
 840         tsk->thread.error_code = error_code;
 841         tsk->thread.trap_nr = X86_TRAP_PF;
 842

 778 show_signal_msg(struct pt_regs *regs, unsigned long error_code,
 779                 unsigned long address, struct task_struct *tsk)
 780 {
 781         if (!unhandled_signal(tsk, SIGSEGV))
 782                 return;
 783
 784         if (!printk_ratelimit())
 785                 return;
 786
 787         printk("%s%s[%d]: segfault at %lx ip %p sp %p error %lx",    <----
 788                task_pid_nr(tsk) > 1 ? KERN_INFO : KERN_EMERG,
 789                tsk->comm, task_pid_nr(tsk), address,
 790                (void *)regs->ip, (void *)regs->sp, error_code);

And to confirm it:

crash> task | grep cr2 -A 2
  cr2 = 0x7ffd0a6a9fc8,    <- matches the faulting address in the message
  trap_nr = 0xe,
  error_code = 0x6,         <- matches the error
crash>

As user memory was not dumped, we cannot see the user stack.
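For reference, error 6 decodes as a user-mode write to a not-present page, which is exactly what a failed stack-growth fault looks like. A minimal sketch of the decode (illustrative only; the macro names are local to this example, but the bit meanings follow the x86 page fault error code layout used by the kernel's fault handler):

    #include <stdio.h>

    /* x86 page fault error code bits (local names for illustration). */
    #define PF_PROT  (1UL << 0)   /* 0: page not present, 1: protection violation */
    #define PF_WRITE (1UL << 1)   /* 0: read access,      1: write access         */
    #define PF_USER  (1UL << 2)   /* 0: kernel mode,      1: user mode            */
    #define PF_RSVD  (1UL << 3)   /* reserved bit set in a page table entry       */
    #define PF_INSTR (1UL << 4)   /* instruction fetch                            */

    static void decode_pf_error(unsigned long ec)
    {
        printf("error %lx: %s %s to a %s page%s%s\n", ec,
               ec & PF_USER  ? "user-mode" : "kernel-mode",
               ec & PF_WRITE ? "write"     : "read",
               ec & PF_PROT  ? "protected" : "not-present",
               ec & PF_RSVD  ? ", reserved bit set"  : "",
               ec & PF_INSTR ? ", instruction fetch" : "");
    }

    int main(void)
    {
        decode_pf_error(0x6);   /* -> user-mode write to a not-present page */
        return 0;
    }

That matches the description at the top: the fault was a write from user mode (the call pushing its return address) to an address the kernel could not map, because expanding the stack any further would exceed the 8 MB stack limit. bad_area() then queued SIGSEGV for pid 1, and killing init produced the "Attempted to kill init!" panic seen below.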
The dump is located on galvatron-x86.cee.ral3.lab.eng.rdu2.redhat.com

retrace-server-interact 547813400 crash

crash> sys
      KERNEL: /cores/retrace/repos/kernel/x86_64/usr/lib/debug/lib/modules/3.10.0-1160.62.1.el7.x86_64/vmlinux  [TAINTED]
    DUMPFILE: /cores/retrace/tasks/547813400/crash/vmcore  [PARTIAL DUMP]
        CPUS: 48
        DATE: Sun Apr 30 15:54:51 EDT 2023
      UPTIME: 80 days, 20:40:47
LOAD AVERAGE: 13.76, 11.69, 11.62
       TASKS: 16977
    NODENAME:
     RELEASE: 3.10.0-1160.62.1.el7.x86_64
     VERSION: #1 SMP Wed Mar 23 09:04:02 UTC 2022
     MACHINE: x86_64  (2294 Mhz)
      MEMORY: 127.7 GB
       PANIC: "Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b"

crash> sys -i
        DMI_BIOS_VENDOR: Cisco Systems, Inc.
       DMI_BIOS_VERSION: B200M4.3.1.3k.0.022620181855
          DMI_BIOS_DATE: 02/26/2018
         DMI_SYS_VENDOR: Cisco Systems Inc
       DMI_PRODUCT_NAME: UCSB-B200-M4
    DMI_PRODUCT_VERSION: 1
     DMI_PRODUCT_SERIAL:
       DMI_PRODUCT_UUID:
       DMI_BOARD_VENDOR: Cisco Systems Inc
         DMI_BOARD_NAME: UCSB-B200-M4
      DMI_BOARD_VERSION: 73-15862-03
       DMI_BOARD_SERIAL:
    DMI_BOARD_ASSET_TAG:
     DMI_CHASSIS_VENDOR: Cisco Systems Inc
       DMI_CHASSIS_TYPE: 18
    DMI_CHASSIS_VERSION: 68-5091-04
     DMI_CHASSIS_SERIAL:
  DMI_CHASSIS_ASSET_TAG:
crash>