Bug 2000221 - crash produces incorrect backtrace by using stale value
Summary: crash produces incorrect backtrace by using stale value
Keywords:
Status: ASSIGNED
Alias: None
Product: Red Hat Enterprise Linux 8
Classification: Red Hat
Component: crash
Version: 8.4
Hardware: x86_64
OS: Linux
Priority: medium
Severity: medium
Target Milestone: rc
Assignee: lijiang
QA Contact: xiaoying yan
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2021-09-01 15:32 UTC by John Pittman
Modified: 2023-03-28 08:23 UTC (History)
CC List: 8 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:
Type: Bug
Target Upstream Version:
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
Red Hat Issue Tracker RHELPLAN-95905 0 None None None 2021-09-01 15:34:12 UTC

Description John Pittman 2021-09-01 15:32:45 UTC
Description of problem:

It looks like the backtracer is potentially mishandling part of the stack.

crash> bt 17813
PID: 17813  TASK: ffff9f8c438f9080  CPU: 3   COMMAND: "jsivm"
 #0 [ffff9f8e3f2c8e48] crash_nmi_callback at ffffffffa4858567
 #1 [ffff9f8e3f2c8e58] nmi_handle at ffffffffa4f8d93c
 #2 [ffff9f8e3f2c8eb0] do_nmi at ffffffffa4f8db5d
 #3 [ffff9f8e3f2c8ef0] end_repeat_nmi at ffffffffa4f8cd9c
    [exception RIP: prune_super+9]
    RIP: ffffffffa4a50869  RSP: ffff9f70347a3868  RFLAGS: 00200206
    RAX: ffffffffa4a50860  RBX: 0000000000000400  RCX: 0000000000000000
    RDX: 0000000000000000  RSI: ffff9f70347a3998  RDI: ffff9f6fc5fe4bb0
    RBP: ffff9f70347a3868   R8: 0000000000000000   R9: 0000000000000000
    R10: 0000000000000000  R11: fff00000f0000000  R12: 000000000000001d
    R13: 0000000000000000  R14: ffff9f6fc5fe4bb0  R15: ffff9f70347a3998
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0000
--- <NMI exception stack> ---
 #4 [ffff9f70347a3868] prune_super at ffffffffa4a50869
 #5 [ffff9f70347a3890] vmpressure at ffffffffa4a42b71
 #6 [ffff9f70347a3918] do_try_to_free_pages at ffffffffa49d4d8a
 #7 [ffff9f70347a3990] try_to_free_pages at ffffffffa49d4fdc
 #8 [ffff9f70347a3a28] free_more_memory at ffffffffa4a85d5e
 #9 [ffff9f70347a3a60] __getblk at ffffffffa4a870cb
#10 [ffff9f70347a3ac0] __breadahead at ffffffffa4a88b71
#11 [ffff9f70347a3ae0] __ext4_get_inode_loc at ffffffffc04b63fa [ext4]
#12 [ffff9f70347a3b48] ext4_get_inode_loc at ffffffffc04b894d [ext4]
#13 [ffff9f70347a3b58] ext4_reserve_inode_write at ffffffffc04ba716 [ext4]
#14 [ffff9f70347a3b88] ext4_mark_inode_dirty at ffffffffc04ba7e3 [ext4]
#15 [ffff9f70347a3be8] ext4_dirty_inode at ffffffffc04be0d0 [ext4]
#16 [ffff9f70347a3c08] __mark_inode_dirty at ffffffffa4a7f2bd
#17 [ffff9f70347a3c40] update_time at ffffffffa4a6bb99
#18 [ffff9f70347a3c70] file_update_time at ffffffffa4a6bc80
#19 [ffff9f70347a3cb0] ext4_page_mkwrite at ffffffffc04be30d [ext4]
#20 [ffff9f70347a3d08] do_page_mkwrite at ffffffffa49ee18a
#21 [ffff9f70347a3d88] do_shared_fault at ffffffffa49ee826
#22 [ffff9f70347a3de8] handle_mm_fault at ffffffffa49f58b3
#23 [ffff9f70347a3eb0] __do_page_fault at ffffffffa4f90653
#24 [ffff9f70347a3f20] do_page_fault at ffffffffa4f90975
#25 [ffff9f70347a3f50] page_fault at ffffffffa4f8c778
    RIP: 00000000ecaf1e7d  RSP: 00000000a9bfe290  RFLAGS: 00210286
    RAX: 0000000002eeec30  RBX: 00000000ebf05a10  RCX: 00000000aaa00000
    RDX: 0000000000000000  RSI: 000000000e2b8188  RDI: 0000000000000000
    RBP: 00000000a9bfe298   R8: 0000000000000000   R9: 0000000000000000
    R10: 0000000000000000  R11: 0000000000000000  R12: 0000000000000000
    R13: 0000000000000000  R14: 0000000000000000  R15: 0000000000000000
    ORIG_RAX: ffffffffffffffff  CS: 0023  SS: 002b

In this stack, there shouldn't be a call to vmpressure; it's a stale value. It should be a call to shrink_slab, which does have a return address on the stack. (Maybe crash was confused by catching prune_super while it was still doing its initial pushes to the stack?) So this part of the backtrace should have looked like:

    [exception RIP: prune_super+9]
    RIP: ffffffffa4a50869  RSP: ffff9f70347a3868  RFLAGS: 00200206
    RAX: ffffffffa4a50860  RBX: 0000000000000400  RCX: 0000000000000000
    RDX: 0000000000000000  RSI: ffff9f70347a3998  RDI: ffff9f6fc5fe4bb0
    RBP: ffff9f70347a3868   R8: 0000000000000000   R9: 0000000000000000
    R10: 0000000000000000  R11: fff00000f0000000  R12: 000000000000001d
    R13: 0000000000000000  R14: ffff9f6fc5fe4bb0  R15: ffff9f70347a3998
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0000
--- <NMI exception stack> ---
 #4 [ffff9f70347a3868] prune_super at ffffffffa4a50869
 #5 [ffff9f70347a3870] shrink_slab at ffffffffa49d19de   #<---- shrink_slab, not vmpressure
 #6 [ffff9f70347a3918] do_try_to_free_pages at ffffffffa49d4d8a
 #7 [ffff9f70347a3990] try_to_free_pages at ffffffffa49d4fdc
 #8 [ffff9f70347a3a28] free_more_memory at ffffffffa4a85d5e
...

Above data provided by djeffery.

Version-Release number of selected component (if applicable):

crash-7.2.8-7.el8.x86_64

The issue is also reproducible with the latest upstream. The expectation is a correct backtrace.
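As an illustration of the suspected failure mode (this is a toy, not crash's actual unwinding algorithm, and the text-address range below is made up): a naive scanner that treats every stack word landing in kernel text as a return address will happily report a stale leftover such as vmpressure+33 as a frame.

```python
# Toy stack scanner (NOT crash's real algorithm): it treats every stack
# word that falls inside an assumed kernel text range as a return
# address. A stale address left over from an earlier, deeper call chain
# is then wrongly reported as a frame -- the failure mode described above.

TEXT_START, TEXT_END = 0xffffffffa4800000, 0xffffffffa5000000  # assumed range

def naive_backtrace(stack_words):
    """Return every word that looks like a kernel text address."""
    return [w for w in stack_words if TEXT_START <= w < TEXT_END]

# Hypothetical stack contents: one live return address plus one stale word.
stack = [
    0xffff9f70347a3910,   # saved rbp (data, correctly ignored)
    0xffffffffa49d19de,   # live return address (shrink_slab+174)
    0xffffffffa4a42b71,   # stale leftover (vmpressure+33)
]
print([hex(w) for w in naive_backtrace(stack)])
```

Both text addresses are reported, even though only the first is a live return address; distinguishing them requires the frame-size reasoning shown in the comments below.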

Comment 2 lijiang 2021-09-08 07:44:51 UTC
> In this stack, there shouldn't be a call to vmpressure; it's a stale value.
> It should be a call to shrink_slab, which does have a return address on the
> stack. (Maybe crash was confused by catching prune_super while it was still
> doing its initial pushes to the stack?) So this part of the backtrace should
> have looked like:
> 

The vmpressure() could be invoked from the reclaim path.

How did you confirm that it shouldn't be a call to vmpressure()? Could you please provide more details?

Thanks.

Comment 3 John Pittman 2021-09-08 14:09:17 UTC
Hi Lijiang, we can see the correct sequence in the assembly:

crash> dis -rl ffffffffa49d4d8a
0xffffffffa49d49c0 <do_try_to_free_pages>:	nopl   0x0(%rax,%rax,1) [FTRACE NOP]
0xffffffffa49d49c5 <do_try_to_free_pages+5>:	push   %rbp
0xffffffffa49d49c6 <do_try_to_free_pages+6>:	mov    %rsp,%rbp
0xffffffffa49d49c9 <do_try_to_free_pages+9>:	push   %r15
0xffffffffa49d49cb <do_try_to_free_pages+11>:	mov    %rsi,%r15
0xffffffffa49d49ce <do_try_to_free_pages+14>:	push   %r14
....
/usr/src/debug/kernel-3.10.0-1160.24.1.el7/linux-3.10.0-1160.24.1.el7.x86_64/mm/vmscan.c: 2501
0xffffffffa49d4d63 <do_try_to_free_pages+931>:	callq  0xffffffffa49de760 <next_zones_zonelist>
0xffffffffa49d4d68 <do_try_to_free_pages+936>:	mov    %rax,%r13
0xffffffffa49d4d6b <do_try_to_free_pages+939>:	mov    -0x48(%rbp),%rax
0xffffffffa49d4d6f <do_try_to_free_pages+943>:	test   %rax,%rax
0xffffffffa49d4d72 <do_try_to_free_pages+946>:	jne    0xffffffffa49d4d20 <do_try_to_free_pages+864>
0xffffffffa49d4d74 <do_try_to_free_pages+948>:	mov    %r12,%rax
0xffffffffa49d4d77 <do_try_to_free_pages+951>:	mov    -0x58(%rbp),%r12
/usr/src/debug/kernel-3.10.0-1160.24.1.el7/linux-3.10.0-1160.24.1.el7.x86_64/mm/vmscan.c: 2509
0xffffffffa49d4d7b <do_try_to_free_pages+955>:	mov    (%r15),%rsi
0xffffffffa49d4d7e <do_try_to_free_pages+958>:	mov    -0x68(%rbp),%rdi
0xffffffffa49d4d82 <do_try_to_free_pages+962>:	mov    %rax,%rdx
0xffffffffa49d4d85 <do_try_to_free_pages+965>:	callq  0xffffffffa49d1930 <shrink_slab>   <=======
/usr/src/debug/kernel-3.10.0-1160.24.1.el7/linux-3.10.0-1160.24.1.el7.x86_64/mm/vmscan.c: 2510
0xffffffffa49d4d8a <do_try_to_free_pages+970>:	mov    -0x60(%rbp),%rdx

2509             shrink_slab(shrink, sc->nr_scanned, lru_pages);
2510             if (reclaim_state) {
2511                 sc->nr_reclaimed += reclaim_state->reclaimed_slab;
2512                 reclaim_state->reclaimed_slab = 0;
2513             }
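The arithmetic behind this can be checked directly (a minimal sketch; the addresses are taken from the disassembly above, and the 5-byte callq length follows from the listed instruction addresses):

```python
# Addresses from the disassembly above.
call_insn    = 0xffffffffa49d4d85  # do_try_to_free_pages+965: callq shrink_slab
next_insn    = 0xffffffffa49d4d8a  # do_try_to_free_pages+970
ret_on_stack = 0xffffffffa49d4d8a  # return address crash resolved for frame #6

# A call pushes the address of the following instruction, so the saved
# return address must equal next_insn -- which identifies shrink_slab
# (not vmpressure) as the function this frame returned from.
assert ret_on_stack == next_insn == call_insn + 5  # callq rel32 is 5 bytes
print(hex(ret_on_stack))
```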

Comment 4 David Jeffery 2021-09-08 15:06:53 UTC
In addition to the return address in do_try_to_free_pages showing that shrink_slab should have been called, vmpressure can't end up in a shrink callback like prune_super without something like a hardware error.  The stale return address corresponds to a call to memcg_to_vmpressure, which is a very small leaf function that doesn't call anything else.

0xffffffffa4a42b6c <vmpressure+28>:     callq  0xffffffffa4a3ca50 <memcg_to_vmpressure>
0xffffffffa4a42b71 <vmpressure+33>:     and    $0xca,%r12d
0xffffffffa4a42b78 <vmpressure+40>:     je     0xffffffffa4a42b7f <vmpressure+47>
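A quick check of that claim (addresses taken from the disassembly above): the stale value vmpressure+33 is exactly the return address of the callq to memcg_to_vmpressure at vmpressure+28, i.e. leftover state from an earlier call into the leaf function, not a live frame.

```python
call_insn = 0xffffffffa4a42b6c  # vmpressure+28: callq memcg_to_vmpressure
stale_val = 0xffffffffa4a42b71  # vmpressure+33, the stale word on the stack

# A call pushes the address of the next instruction; the stale word is
# precisely that address, so it is an old memcg_to_vmpressure return
# address left behind on the stack, not a return point for prune_super.
assert stale_val == call_insn + 5  # callq rel32 encodes in 5 bytes
print(hex(stale_val))
```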


Looking at the raw stack, relative to the exception frame's RSP, where prune_super was interrupted:

0xffffffffa4a50860 <prune_super>:	nopl   0x0(%rax,%rax,1) [FTRACE NOP]
0xffffffffa4a50865 <prune_super+5>:	push   %rbp
0xffffffffa4a50866 <prune_super+6>:	mov    %rsp,%rbp
0xffffffffa4a50869 <prune_super+9>:	push   %r14

crash> rd ffff9f70347a3868 -s 16
ffff9f70347a3868:  ffff9f70347a3910 shrink_slab+174  
ffff9f70347a3878:  ffff9f703879b800 00000050017a3878 
ffff9f70347a3888:  ffff9f70347a38b8 vmpressure+33    
ffff9f70347a3898:  000000000000101e 0000000000000080 
ffff9f70347a38a8:  ffff9f6fc53df0f0 000000000000003a 
ffff9f70347a38b8:  0000000000000000 0000000000000000 
ffff9f70347a38c8:  000000000000101d 0000000000000020 
ffff9f70347a38d8:  000000000000001d 000000009e5dda76 

prune_super had only executed a single push to modify the stack pointer by 8 bytes.  This aligns with the location of the shrink_slab return address.  The vmpressure return address would need 40 bytes worth of pushes or subtractions to RSP to be the return point for prune_super.
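The offsets can be checked with the addresses from the rd output above (a small sketch of the arithmetic):

```python
# Exception-frame RSP when prune_super+9 was interrupted.
rsp = 0xffff9f70347a3868

# prune_super had executed exactly one push (push %rbp), so the caller's
# return address sits one 8-byte slot above RSP.
ret_slot = rsp + 8               # 0xffff9f70347a3870 -> holds shrink_slab+174

# The slot holding vmpressure+33 in the raw stack dump.
vmpressure_slot = 0xffff9f70347a3890

# vmpressure+33 sits 40 bytes above RSP -- prune_super would have needed
# 40 bytes of pushes/subtractions for that to be its return address.
assert ret_slot == 0xffff9f70347a3870
assert vmpressure_slot - rsp == 40
print(vmpressure_slot - rsp)
```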


And the values saved on the stack match shrink_slab(). For example, the scan_control address ffff9f70347a39a8 was used by do_try_to_free_pages() in its register R15.  It is preserved on the stack at the location expected for a call to shrink_slab(), whose prologue begins with "push %r15".  If the vmpressure backtrace were correct, the scan_control address should be in the exception frame's R15, which it is not.


So there are multiple parts of the backtrace which align and produce sane results if shrink_slab() is part of the backtrace instead of vmpressure().  vmpressure() was a stale value and does not appear to belong.

Comment 5 lijiang 2021-09-14 11:07:16 UTC
Thank you for the explanation in detail, John and David.

But anyway, I don't think that this is a bug, because a frame can contain extra stack data; in this case, the "bt -f/-F" commands can display all of the stack data contained in a frame. Would you mind trying them?

crash> bt -f 17813
PID: 17813  TASK: ffff9f8c438f9080  CPU: 3   COMMAND: "jsivm"
 #0 [ffff9f8e3f2c8e48] crash_nmi_callback at ffffffffa4858567
    ffff9f8e3f2c8e50: ffff9f8e3f2c8ea8 ffffffffa4f8d93c 
 #1 [ffff9f8e3f2c8e58] nmi_handle at ffffffffa4f8d93c
    ffff9f8e3f2c8e60: 0000000000000000 ffff9f8e3f2c8ef8 
    ffff9f8e3f2c8e70: ffffffffa54332e0 000000009e5dda76 
    ffff9f8e3f2c8e80: ffff9f8e3f2c8ef8 ffff9f70347a3fd8 
    ffff9f8e3f2c8e90: 0000000000000000 ffff9f70347a3fd8 
    ffff9f8e3f2c8ea0: 00000000ffffffff ffff9f8e3f2c8ee8 
    ffff9f8e3f2c8eb0: ffffffffa4f8db5d 
 #2 [ffff9f8e3f2c8eb0] do_nmi at ffffffffa4f8db5d
    ffff9f8e3f2c8eb8: 0000000000000000 0000000000000001 
    ffff9f8e3f2c8ec8: 00007f445e85d198 00000000ffffffff 
    ffff9f8e3f2c8ed8: 00000016ed1d4000 ffff9f70347a3998 
    ffff9f8e3f2c8ee8: ffff9f70347a3868 ffffffffa4f8cd9c 
 #3 [ffff9f8e3f2c8ef0] end_repeat_nmi at ffffffffa4f8cd9c
    [exception RIP: prune_super+9]
    RIP: ffffffffa4a50869  RSP: ffff9f70347a3868  RFLAGS: 00200206
    RAX: ffffffffa4a50860  RBX: 0000000000000400  RCX: 0000000000000000
    RDX: 0000000000000000  RSI: ffff9f70347a3998  RDI: ffff9f6fc5fe4bb0
    RBP: ffff9f70347a3868   R8: 0000000000000000   R9: 0000000000000000
    R10: 0000000000000000  R11: fff00000f0000000  R12: 000000000000001d
    R13: 0000000000000000  R14: ffff9f6fc5fe4bb0  R15: ffff9f70347a3998
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0000
--- <NMI exception stack> ---
 #4 [ffff9f70347a3868] prune_super at ffffffffa4a50869
    ffff9f70347a3870: ffffffffa49d19de ffff9f703879b800 
    ffff9f70347a3880: 00000050017a3878 ffff9f70347a38b8 
    ffff9f70347a3890: ffffffffa4a42b71 
...

crash> bt -F 17813
PID: 17813  TASK: ffff9f8c438f9080  CPU: 3   COMMAND: "jsivm"
 #0 [ffff9f8e3f2c8e48] crash_nmi_callback at ffffffffa4858567
    ffff9f8e3f2c8e50: ffff9f8e3f2c8ea8 nmi_handle+140   
 #1 [ffff9f8e3f2c8e58] nmi_handle at ffffffffa4f8d93c
    ffff9f8e3f2c8e60: 0000000000000000 ffff9f8e3f2c8ef8 
    ffff9f8e3f2c8e70: crash_nmi_callback_na.35510 000000009e5dda76 
    ffff9f8e3f2c8e80: ffff9f8e3f2c8ef8 ffff9f70347a3fd8 
    ffff9f8e3f2c8e90: 0000000000000000 ffff9f70347a3fd8 
    ffff9f8e3f2c8ea0: 00000000ffffffff ffff9f8e3f2c8ee8 
    ffff9f8e3f2c8eb0: do_nmi+349       
 #2 [ffff9f8e3f2c8eb0] do_nmi at ffffffffa4f8db5d
    ffff9f8e3f2c8eb8: 0000000000000000 0000000000000001 
    ffff9f8e3f2c8ec8: 00007f445e85d198 00000000ffffffff 
    ffff9f8e3f2c8ed8: 00000016ed1d4000 ffff9f70347a3998 
    ffff9f8e3f2c8ee8: ffff9f70347a3868 end_repeat_nmi+30 
 #3 [ffff9f8e3f2c8ef0] end_repeat_nmi at ffffffffa4f8cd9c
    [exception RIP: prune_super+9]
    RIP: ffffffffa4a50869  RSP: ffff9f70347a3868  RFLAGS: 00200206
    RAX: ffffffffa4a50860  RBX: 0000000000000400  RCX: 0000000000000000
    RDX: 0000000000000000  RSI: ffff9f70347a3998  RDI: ffff9f6fc5fe4bb0
    RBP: ffff9f70347a3868   R8: 0000000000000000   R9: 0000000000000000
    R10: 0000000000000000  R11: fff00000f0000000  R12: 000000000000001d
    R13: 0000000000000000  R14: ffff9f6fc5fe4bb0  R15: ffff9f70347a3998
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0000
--- <NMI exception stack> ---
 #4 [ffff9f70347a3868] prune_super at ffffffffa4a50869
    ffff9f70347a3870: shrink_slab+174  [kmalloc-1024]   
    ffff9f70347a3880: 00000050017a3878 ffff9f70347a38b8 
    ffff9f70347a3890: vmpressure+33    
 #5 [ffff9f70347a3890] vmpressure at ffffffffa4a42b71
...

Comment 6 John Pittman 2021-09-22 19:07:53 UTC
Thanks again for looking, lijiang.  I see the same output as you do, but I don't understand why it is not a bug.  Based on comment 4, vmpressure is incorrect, yet we still see it in both bt -f and bt -F.  Could you comment specifically on the data provided in comment 4?

