Description of problem:

It looks like the backtracer is potentially mishandling part of the stack.

crash> bt 17813
PID: 17813  TASK: ffff9f8c438f9080  CPU: 3  COMMAND: "jsivm"
 #0 [ffff9f8e3f2c8e48] crash_nmi_callback at ffffffffa4858567
 #1 [ffff9f8e3f2c8e58] nmi_handle at ffffffffa4f8d93c
 #2 [ffff9f8e3f2c8eb0] do_nmi at ffffffffa4f8db5d
 #3 [ffff9f8e3f2c8ef0] end_repeat_nmi at ffffffffa4f8cd9c
    [exception RIP: prune_super+9]
    RIP: ffffffffa4a50869  RSP: ffff9f70347a3868  RFLAGS: 00200206
    RAX: ffffffffa4a50860  RBX: 0000000000000400  RCX: 0000000000000000
    RDX: 0000000000000000  RSI: ffff9f70347a3998  RDI: ffff9f6fc5fe4bb0
    RBP: ffff9f70347a3868   R8: 0000000000000000   R9: 0000000000000000
    R10: 0000000000000000  R11: fff00000f0000000  R12: 000000000000001d
    R13: 0000000000000000  R14: ffff9f6fc5fe4bb0  R15: ffff9f70347a3998
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0000
--- <NMI exception stack> ---
 #4 [ffff9f70347a3868] prune_super at ffffffffa4a50869
 #5 [ffff9f70347a3890] vmpressure at ffffffffa4a42b71
 #6 [ffff9f70347a3918] do_try_to_free_pages at ffffffffa49d4d8a
 #7 [ffff9f70347a3990] try_to_free_pages at ffffffffa49d4fdc
 #8 [ffff9f70347a3a28] free_more_memory at ffffffffa4a85d5e
 #9 [ffff9f70347a3a60] __getblk at ffffffffa4a870cb
#10 [ffff9f70347a3ac0] __breadahead at ffffffffa4a88b71
#11 [ffff9f70347a3ae0] __ext4_get_inode_loc at ffffffffc04b63fa [ext4]
#12 [ffff9f70347a3b48] ext4_get_inode_loc at ffffffffc04b894d [ext4]
#13 [ffff9f70347a3b58] ext4_reserve_inode_write at ffffffffc04ba716 [ext4]
#14 [ffff9f70347a3b88] ext4_mark_inode_dirty at ffffffffc04ba7e3 [ext4]
#15 [ffff9f70347a3be8] ext4_dirty_inode at ffffffffc04be0d0 [ext4]
#16 [ffff9f70347a3c08] __mark_inode_dirty at ffffffffa4a7f2bd
#17 [ffff9f70347a3c40] update_time at ffffffffa4a6bb99
#18 [ffff9f70347a3c70] file_update_time at ffffffffa4a6bc80
#19 [ffff9f70347a3cb0] ext4_page_mkwrite at ffffffffc04be30d [ext4]
#20 [ffff9f70347a3d08] do_page_mkwrite at ffffffffa49ee18a
#21 [ffff9f70347a3d88] do_shared_fault at ffffffffa49ee826
#22 [ffff9f70347a3de8] handle_mm_fault at ffffffffa49f58b3
#23 [ffff9f70347a3eb0] __do_page_fault at ffffffffa4f90653
#24 [ffff9f70347a3f20] do_page_fault at ffffffffa4f90975
#25 [ffff9f70347a3f50] page_fault at ffffffffa4f8c778
    RIP: 00000000ecaf1e7d  RSP: 00000000a9bfe290  RFLAGS: 00210286
    RAX: 0000000002eeec30  RBX: 00000000ebf05a10  RCX: 00000000aaa00000
    RDX: 0000000000000000  RSI: 000000000e2b8188  RDI: 0000000000000000
    RBP: 00000000a9bfe298   R8: 0000000000000000   R9: 0000000000000000
    R10: 0000000000000000  R11: 0000000000000000  R12: 0000000000000000
    R13: 0000000000000000  R14: 0000000000000000  R15: 0000000000000000
    ORIG_RAX: ffffffffffffffff  CS: 0023  SS: 002b

In this stack, there shouldn't be a call to vmpressure. It's a stale value.
It should be a call to shrink_slab, which does have a value on the stack.
(Maybe crash was confused by catching prune_super while it was still doing its
initial pushes to the stack?)

So this part of the backtrace should have looked like:

    [exception RIP: prune_super+9]
    RIP: ffffffffa4a50869  RSP: ffff9f70347a3868  RFLAGS: 00200206
    RAX: ffffffffa4a50860  RBX: 0000000000000400  RCX: 0000000000000000
    RDX: 0000000000000000  RSI: ffff9f70347a3998  RDI: ffff9f6fc5fe4bb0
    RBP: ffff9f70347a3868   R8: 0000000000000000   R9: 0000000000000000
    R10: 0000000000000000  R11: fff00000f0000000  R12: 000000000000001d
    R13: 0000000000000000  R14: ffff9f6fc5fe4bb0  R15: ffff9f70347a3998
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0000
--- <NMI exception stack> ---
 #4 [ffff9f70347a3868] prune_super at ffffffffa4a50869
 #5 [ffff9f70347a3870] shrink_slab at ffffffffa49d19de    <---- shrink_slab, not vmpressure
 #6 [ffff9f70347a3918] do_try_to_free_pages at ffffffffa49d4d8a
 #7 [ffff9f70347a3990] try_to_free_pages at ffffffffa49d4fdc
 #8 [ffff9f70347a3a28] free_more_memory at ffffffffa4a85d5e
...

Above data provided by djeffery.

Version-Release number of selected component (if applicable):
crash-7.2.8-7.el8.x86_64

Issue also reproducible with latest upstream.
The expectation is a correct translation of the backtrace.
> In this stack, there shouldn't be a call to vmpressure. It's a stale value.
> It should be a call to shrink_slab, which does have a value on the stack.
> (maybe crash was confused by catching prune_super while still doing initial
> pushes to stack?) So this part of the backtrace should have looked like:

vmpressure() can be invoked from the reclaim path. How did you confirm that
this shouldn't be a call to vmpressure()? Could you please provide more
details? Thanks.
Hi Lijiang, we can see the correct sequence in the assembly:

crash> dis -rl ffffffffa49d4d8a
0xffffffffa49d49c0 <do_try_to_free_pages>:      nopl   0x0(%rax,%rax,1) [FTRACE NOP]
0xffffffffa49d49c5 <do_try_to_free_pages+5>:    push   %rbp
0xffffffffa49d49c6 <do_try_to_free_pages+6>:    mov    %rsp,%rbp
0xffffffffa49d49c9 <do_try_to_free_pages+9>:    push   %r15
0xffffffffa49d49cb <do_try_to_free_pages+11>:   mov    %rsi,%r15
0xffffffffa49d49ce <do_try_to_free_pages+14>:   push   %r14
....
/usr/src/debug/kernel-3.10.0-1160.24.1.el7/linux-3.10.0-1160.24.1.el7.x86_64/mm/vmscan.c: 2501
0xffffffffa49d4d63 <do_try_to_free_pages+931>:  callq  0xffffffffa49de760 <next_zones_zonelist>
0xffffffffa49d4d68 <do_try_to_free_pages+936>:  mov    %rax,%r13
0xffffffffa49d4d6b <do_try_to_free_pages+939>:  mov    -0x48(%rbp),%rax
0xffffffffa49d4d6f <do_try_to_free_pages+943>:  test   %rax,%rax
0xffffffffa49d4d72 <do_try_to_free_pages+946>:  jne    0xffffffffa49d4d20 <do_try_to_free_pages+864>
0xffffffffa49d4d74 <do_try_to_free_pages+948>:  mov    %r12,%rax
0xffffffffa49d4d77 <do_try_to_free_pages+951>:  mov    -0x58(%rbp),%r12
/usr/src/debug/kernel-3.10.0-1160.24.1.el7/linux-3.10.0-1160.24.1.el7.x86_64/mm/vmscan.c: 2509
0xffffffffa49d4d7b <do_try_to_free_pages+955>:  mov    (%r15),%rsi
0xffffffffa49d4d7e <do_try_to_free_pages+958>:  mov    -0x68(%rbp),%rdi
0xffffffffa49d4d82 <do_try_to_free_pages+962>:  mov    %rax,%rdx
0xffffffffa49d4d85 <do_try_to_free_pages+965>:  callq  0xffffffffa49d1930 <shrink_slab>   <=======
/usr/src/debug/kernel-3.10.0-1160.24.1.el7/linux-3.10.0-1160.24.1.el7.x86_64/mm/vmscan.c: 2510
0xffffffffa49d4d8a <do_try_to_free_pages+970>:  mov    -0x60(%rbp),%rdx

2509                 shrink_slab(shrink, sc->nr_scanned, lru_pages);
2510                 if (reclaim_state) {
2511                         sc->nr_reclaimed += reclaim_state->reclaimed_slab;
2512                         reclaim_state->reclaimed_slab = 0;
2513                 }
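As a quick sanity check of the disassembly above (plain Python arithmetic, not crash internals): the return address recorded in frame #6, do_try_to_free_pages+970, is exactly the instruction following the 5-byte direct callq to shrink_slab at +965, so do_try_to_free_pages was in the middle of a shrink_slab call, not a vmpressure call:

```python
# Addresses taken from the "dis -rl" output above.
call_insn = 0xffffffffa49d4d85   # callq 0xffffffffa49d1930 <shrink_slab>
call_len  = 5                    # a direct near call (e8 rel32) is 5 bytes
ret_addr  = 0xffffffffa49d4d8a   # return address shown in frame #6 of bt

# The address pushed by "callq" is the next instruction after the call:
assert call_insn + call_len == ret_addr
print(hex(call_insn + call_len))  # -> 0xffffffffa49d4d8a
```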
In addition to the return address for do_try_to_free_pages showing shrink_slab
should have been called, vmpressure can't end up in a shrink callback like
prune_super without something like a hardware error. The return address was
for calling memcg_to_vmpressure, which is a very small function which doesn't
call anything else.

0xffffffffa4a42b6c <vmpressure+28>:     callq  0xffffffffa4a3ca50 <memcg_to_vmpressure>
0xffffffffa4a42b71 <vmpressure+33>:     and    $0xca,%r12d
0xffffffffa4a42b78 <vmpressure+40>:     je     0xffffffffa4a42b7f <vmpressure+47>

Looking at the raw stack relative from the exception frame's RSP to where
prune_super was interrupted:

0xffffffffa4a50860 <prune_super>:       nopl   0x0(%rax,%rax,1) [FTRACE NOP]
0xffffffffa4a50865 <prune_super+5>:     push   %rbp
0xffffffffa4a50866 <prune_super+6>:     mov    %rsp,%rbp
0xffffffffa4a50869 <prune_super+9>:     push   %r14

crash> rd ffff9f70347a3868 -s 16
ffff9f70347a3868:  ffff9f70347a3910 shrink_slab+174
ffff9f70347a3878:  ffff9f703879b800 00000050017a3878
ffff9f70347a3888:  ffff9f70347a38b8 vmpressure+33
ffff9f70347a3898:  000000000000101e 0000000000000080
ffff9f70347a38a8:  ffff9f6fc53df0f0 000000000000003a
ffff9f70347a38b8:  0000000000000000 0000000000000000
ffff9f70347a38c8:  000000000000101d 0000000000000020
ffff9f70347a38d8:  000000000000001d 000000009e5dda76

prune_super had only executed a single push to modify the stack pointer by 8
bytes. This aligns with the location of the shrink_slab return address. The
vmpressure return address would need 40 bytes worth of pushes or subtractions
to RSP to be the return point for prune_super.

And the values saved on the stack match shrink_slab(). E.g. the scan_control
address ffff9f70347a39a8 was used by do_try_to_free_pages() in its register
R15. It is preserved on the stack at a location as expected with a call to
shrink_slab() using "push %r15". If the vmpressure backtrace was correct, the
scan_control address should be in the exception frame's R15, which it is not.
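The frame arithmetic can be checked directly (plain Python, addresses taken from the exception frame and the rd output above):

```python
# RSP in the prune_super exception frame.
RSP = 0xffff9f70347a3868

# What the "rd" output shows at the relevant stack slots.
slots = {
    0xffff9f70347a3868: "saved rbp",        # pushed by prune_super's "push %rbp"
    0xffff9f70347a3870: "shrink_slab+174",  # live return address
    0xffff9f70347a3890: "vmpressure+33",    # stale leftover value
}

# prune_super had executed exactly one 8-byte push, so its caller's return
# address must sit one slot above RSP:
print(slots[RSP + 8])                 # -> shrink_slab+174

# vmpressure+33 sits 0x28 (40) bytes above RSP; prune_super would have had
# to move RSP by 40 bytes for that to be its return point, and it had not:
print(hex(0xffff9f70347a3890 - RSP))  # -> 0x28
```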
So there are multiple parts of the backtrace which align and produce sane results if shrink_slab() is part of the backtrace instead of vmpressure(). vmpressure() was a stale value and does not appear to belong.
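For what it's worth, a toy model of the suspected failure mode (this is NOT crash's actual unwinder; the symbol table and scan logic here are invented purely for illustration): a backtracer that scans stack words for text-looking values can latch onto a stale return address if it starts scanning past the live one:

```python
# Hypothetical symbol lookup for the two text addresses seen in this stack.
TEXT = {
    0xffffffffa49d19de: "shrink_slab+174",
    0xffffffffa4a42b71: "vmpressure+33",
}

# Words at/above the exception RSP, copied from the rd output in comment 4.
stack = [
    0xffff9f70347a3910,   # saved rbp pushed by prune_super
    0xffffffffa49d19de,   # live return address into shrink_slab
    0xffff9f703879b800,
    0x00000050017a3878,
    0xffff9f70347a38b8,
    0xffffffffa4a42b71,   # stale vmpressure+33 left over from earlier activity
]

def first_text_word(words, start):
    """Return the first kernel-text-looking word at or after index start."""
    for w in words[start:]:
        if w in TEXT:
            return TEXT[w]
    return None

print(first_text_word(stack, 1))   # correct frame: shrink_slab+174
print(first_text_word(stack, 2))   # skipping it finds stale vmpressure+33
```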
Thank you for the detailed explanation, John and David.

But I don't think that this is a bug, because some stack data can be contained
within a frame. For this case, the "bt -f/-F" command can display all stack
data contained in a frame. Would you mind trying it?

crash> bt -f 17813
PID: 17813  TASK: ffff9f8c438f9080  CPU: 3  COMMAND: "jsivm"
 #0 [ffff9f8e3f2c8e48] crash_nmi_callback at ffffffffa4858567
    ffff9f8e3f2c8e50: ffff9f8e3f2c8ea8 ffffffffa4f8d93c
 #1 [ffff9f8e3f2c8e58] nmi_handle at ffffffffa4f8d93c
    ffff9f8e3f2c8e60: 0000000000000000 ffff9f8e3f2c8ef8
    ffff9f8e3f2c8e70: ffffffffa54332e0 000000009e5dda76
    ffff9f8e3f2c8e80: ffff9f8e3f2c8ef8 ffff9f70347a3fd8
    ffff9f8e3f2c8e90: 0000000000000000 ffff9f70347a3fd8
    ffff9f8e3f2c8ea0: 00000000ffffffff ffff9f8e3f2c8ee8
    ffff9f8e3f2c8eb0: ffffffffa4f8db5d
 #2 [ffff9f8e3f2c8eb0] do_nmi at ffffffffa4f8db5d
    ffff9f8e3f2c8eb8: 0000000000000000 0000000000000001
    ffff9f8e3f2c8ec8: 00007f445e85d198 00000000ffffffff
    ffff9f8e3f2c8ed8: 00000016ed1d4000 ffff9f70347a3998
    ffff9f8e3f2c8ee8: ffff9f70347a3868 ffffffffa4f8cd9c
 #3 [ffff9f8e3f2c8ef0] end_repeat_nmi at ffffffffa4f8cd9c
    [exception RIP: prune_super+9]
    RIP: ffffffffa4a50869  RSP: ffff9f70347a3868  RFLAGS: 00200206
    RAX: ffffffffa4a50860  RBX: 0000000000000400  RCX: 0000000000000000
    RDX: 0000000000000000  RSI: ffff9f70347a3998  RDI: ffff9f6fc5fe4bb0
    RBP: ffff9f70347a3868   R8: 0000000000000000   R9: 0000000000000000
    R10: 0000000000000000  R11: fff00000f0000000  R12: 000000000000001d
    R13: 0000000000000000  R14: ffff9f6fc5fe4bb0  R15: ffff9f70347a3998
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0000
--- <NMI exception stack> ---
 #4 [ffff9f70347a3868] prune_super at ffffffffa4a50869
    ffff9f70347a3870: ffffffffa49d19de ffff9f703879b800
    ffff9f70347a3880: 00000050017a3878 ffff9f70347a38b8
    ffff9f70347a3890: ffffffffa4a42b71
...
crash> bt -F 17813
PID: 17813  TASK: ffff9f8c438f9080  CPU: 3  COMMAND: "jsivm"
 #0 [ffff9f8e3f2c8e48] crash_nmi_callback at ffffffffa4858567
    ffff9f8e3f2c8e50: ffff9f8e3f2c8ea8 nmi_handle+140
 #1 [ffff9f8e3f2c8e58] nmi_handle at ffffffffa4f8d93c
    ffff9f8e3f2c8e60: 0000000000000000 ffff9f8e3f2c8ef8
    ffff9f8e3f2c8e70: crash_nmi_callback_na.35510 000000009e5dda76
    ffff9f8e3f2c8e80: ffff9f8e3f2c8ef8 ffff9f70347a3fd8
    ffff9f8e3f2c8e90: 0000000000000000 ffff9f70347a3fd8
    ffff9f8e3f2c8ea0: 00000000ffffffff ffff9f8e3f2c8ee8
    ffff9f8e3f2c8eb0: do_nmi+349
 #2 [ffff9f8e3f2c8eb0] do_nmi at ffffffffa4f8db5d
    ffff9f8e3f2c8eb8: 0000000000000000 0000000000000001
    ffff9f8e3f2c8ec8: 00007f445e85d198 00000000ffffffff
    ffff9f8e3f2c8ed8: 00000016ed1d4000 ffff9f70347a3998
    ffff9f8e3f2c8ee8: ffff9f70347a3868 end_repeat_nmi+30
 #3 [ffff9f8e3f2c8ef0] end_repeat_nmi at ffffffffa4f8cd9c
    [exception RIP: prune_super+9]
    RIP: ffffffffa4a50869  RSP: ffff9f70347a3868  RFLAGS: 00200206
    RAX: ffffffffa4a50860  RBX: 0000000000000400  RCX: 0000000000000000
    RDX: 0000000000000000  RSI: ffff9f70347a3998  RDI: ffff9f6fc5fe4bb0
    RBP: ffff9f70347a3868   R8: 0000000000000000   R9: 0000000000000000
    R10: 0000000000000000  R11: fff00000f0000000  R12: 000000000000001d
    R13: 0000000000000000  R14: ffff9f6fc5fe4bb0  R15: ffff9f70347a3998
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0000
--- <NMI exception stack> ---
 #4 [ffff9f70347a3868] prune_super at ffffffffa4a50869
    ffff9f70347a3870: shrink_slab+174 [kmalloc-1024]
    ffff9f70347a3880: 00000050017a3878 ffff9f70347a38b8
    ffff9f70347a3890: vmpressure+33
 #5 [ffff9f70347a3890] vmpressure at ffffffffa4a42b71
...
Thanks again for looking, Lijiang. I see the same thing as you do, but I don't
understand why it is not a bug. Based on comment 4, the vmpressure frame is
incorrect, yet we still see it in both bt -f and bt -F. Could you comment
specifically on the data provided in comment 4?