Bug 2058369

Summary: WARNING due to invalid error code from smb2_get_enc_key, followed by crash
Product: Red Hat Enterprise Linux 8
Component: kernel
kernel sub component: CIFS
Version: 8.5
Status: CLOSED ERRATA
Severity: high
Priority: unspecified
Reporter: Frank Sorenson <fsorenso>
Assignee: Ronnie Sahlberg <lsahlber>
QA Contact: xiaoli feng <xifeng>
CC: ddouwsma, dostwal, dwysocha, lsahlber, mmilgram, xzhou
Keywords: Patch, Triaged, ZStream
Target Milestone: rc
Target Release: 8.7
Hardware: Unspecified
OS: Unspecified
Fixed In Version: kernel-4.18.0-381.el8
Type: Bug
Last Closed: 2022-11-08 10:22:27 UTC
Bug Blocks: 2344658

Description Frank Sorenson 2022-02-24 18:28:06 UTC
Description of problem:

A kernel WARNING due to invalid error code returned by smb2_get_enc_key, followed quickly by a NULL pointer dereference.

The kernel warning matches a warning found and resolved upstream in the following commit:

commit 83728cbf366e334301091d5b808add468ab46b27
Author: Paul Aurich <paul>
Date:   2021-04-13 14:25:27 -0700

    cifs: Return correct error code from smb2_get_enc_key
    
    Avoid a warning if the error percolates back up:
    
    [440700.376476] CIFS VFS: \\otters.example.com crypt_message: Could not get encryption key
    [440700.386947] ------------[ cut here ]------------
    [440700.386948] err = 1
    [440700.386977] WARNING: CPU: 11 PID: 2733 at /build/linux-hwe-5.4-p6lk6L/linux-hwe-5.4-5.4.0/lib/errseq.c:74 errseq_set+0x5c/0x70
    ...
    [440700.397304] CPU: 11 PID: 2733 Comm: tar Tainted: G           OE     5.4.0-70-generic #78~18.04.1-Ubuntu
    ...
    [440700.397334] Call Trace:
    [440700.397346]  __filemap_set_wb_err+0x1a/0x70
    [440700.397419]  cifs_writepages+0x9c7/0xb30 [cifs]
    [440700.397426]  do_writepages+0x4b/0xe0
    [440700.397444]  __filemap_fdatawrite_range+0xcb/0x100
    [440700.397455]  filemap_write_and_wait+0x42/0xa0
    [440700.397486]  cifs_setattr+0x68b/0xf30 [cifs]
    [440700.397493]  notify_change+0x358/0x4a0
    [440700.397500]  utimes_common+0xe9/0x1c0
    [440700.397510]  do_utimes+0xc5/0x150
    [440700.397520]  __x64_sys_utimensat+0x88/0xd0
    
    Fixes: 61cfac6f267d ("CIFS: Fix possible use after free in demultiplex thread")
    Signed-off-by: Paul Aurich <paul>
    CC: stable.org
    Signed-off-by: Steve French <stfrench>
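
The "err = 1" line in the warning suggests that errseq_set() was handed a positive value rather than a negative errno. The following user-space sketch models that range check and the effect of the fix. It is not the kernel code: errseq_would_accept(), model_get_enc_key_old() and model_get_enc_key_fixed() are hypothetical names, and the assumption that the fixed function returns -EAGAIN is based on my recollection of the upstream change, not on anything quoted in this report.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Same value as the kernel's MAX_ERRNO; redefined here for user space. */
#define MODEL_MAX_ERRNO 4095

/*
 * Hypothetical model of the sanity check in errseq_set(): only a
 * negative errno in [-MAX_ERRNO, -1] is accepted without a WARNING.
 */
static bool errseq_would_accept(int err)
{
    return err < 0 && err >= -MODEL_MAX_ERRNO;
}

/* Old behaviour as implied by "err = 1" in the log (assumption). */
static int model_get_enc_key_old(void)
{
    return 1;
}

/* Fixed behaviour, assuming the upstream patch returns -EAGAIN. */
static int model_get_enc_key_fixed(void)
{
    return -EAGAIN;
}

/* Small self-check tying the two models to the range check. */
static void check_examples(void)
{
    assert(!errseq_would_accept(model_get_enc_key_old()));
    assert(errseq_would_accept(model_get_enc_key_fixed()));
    assert(!errseq_would_accept(0));
}
```

With this model, the positive return value is rejected (which corresponds to the WARNING at lib/errseq.c:74), while a negative errno passes through silently.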


It is unclear whether the crash is directly related to the earlier warning. However, the crash followed the warning very closely, so a connection cannot be ruled out.

[735580.840999] ---[ end trace 06621dc5d043e510 ]---
[735581.250444] BUG: unable to handle kernel NULL pointer dereference at 0000000000000098
[735581.252976] PGD 0 P4D 0 
[735581.255018] Oops: 0000 [#1] SMP PTI
[735581.257608] CPU: 5 PID: 1567 Comm: cifsd Kdump: loaded Tainted: G        W        --------- -  - 4.18.0-348.7.1.el8_5.x86_64 #1

[735581.270029] RIP: 0010:smb2_writev_callback+0x49/0x3a0 [cifs]

[735581.328007]  ? kmem_cache_free+0x385/0x3b0
[735581.330245]  cifs_reconnect+0x324/0xe00 [cifs]
[735581.333315]  cifs_readv_from_socket+0x1ad/0x260 [cifs]
[735581.336015]  cifs_read_from_socket+0x4a/0x70 [cifs]
[735581.339023]  ? smb3_receive_transform+0x292/0x880 [cifs]
[735581.341024]  ? cifs_small_buf_get+0x16/0x20 [cifs]
[735581.344949]  ? allocate_buffers+0x66/0x120 [cifs]
[735581.346285]  cifs_demultiplex_thread+0xf6/0xc40 [cifs]
[735581.349015]  ? finish_task_switch+0xaa/0x2e0
[735581.353017]  ? cifs_handle_standard+0x190/0x190 [cifs]
[735581.355877]  kthread+0x116/0x130
[735581.357386]  ? kthread_flush_work_fn+0x10/0x10
[735581.360132]  ret_from_fork+0x35/0x40


Version-Release number of selected component (if applicable):

kernel 4.18.0-348.7.1.el8_5.x86_64


How reproducible:

Unknown, but the customer reports that the crash has occurred twice.

Steps to Reproduce:

unknown


Actual results:

kernel warning and crash

Expected results:

no kernel warning or crash


Additional info:

Comment 2 Frank Sorenson 2022-02-24 19:50:02 UTC
The customer provided vmcores from two kernel versions:

kernel 4.18.0-348.7.1.el8_5.x86_64
kernel 4.18.0-348.12.2.el8_5.x86_64

The WARNINGs are the same in both vmcores:

	CIFS: VFS: \\server.example.com crypt_message: Could not get encryption key
	err = 1
	WARNING: CPU: 6 PID: 54199 at lib/errseq.c:74 errseq_set+0x5b/0x70

In both cases, the crash occurred very shortly after the warning (~0.5 seconds).  The RIPs in the two vmcores are only one instruction apart in smb2_writev_callback:

<smb2_writev_callback+0x49>: mov    0x98(%rax),%rax  << 4.18.0-348.7.1.el8_5.x86_64
<smb2_writev_callback+0x50>: mov    0x38(%rax),%r14  << 4.18.0-348.12.2.el8_5.x86_64

smb2_writev_callback(struct mid_q_entry *mid)
{
        struct cifs_writedata *wdata = mid->callback_data;
        struct cifs_tcon *tcon = tlink_tcon(wdata->cfile->tlink);
        ...


4.18.0-348.7.1.el8_5.x86_64 kernel
0xffffffffc0a89d4b <smb2_writev_callback+0x3b>: mov    0x80(%rbx),%rax
0xffffffffc0a89d52 <smb2_writev_callback+0x42>: mov    0xa8(%rbx),%r12
0xffffffffc0a89d59 <smb2_writev_callback+0x49>: mov    0x98(%rax),%rax

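((struct cifs_writedata *)mid->callback_data)->cfile was zero, so the load from 0x98(%rax) at +0x49 faulted. This matches the reported fault address of 0000000000000098.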


4.18.0-348.12.2.el8_5.x86_64 kernel
0xffffffffc088dd4b <smb2_writev_callback+0x3b>: mov    0x80(%rbx),%rax
0xffffffffc088dd52 <smb2_writev_callback+0x42>: mov    0xa8(%rbx),%r12
0xffffffffc088dd59 <smb2_writev_callback+0x49>: mov    0x98(%rax),%rax
0xffffffffc088dd60 <smb2_writev_callback+0x50>: mov    0x38(%rax),%r14

((struct cifs_writedata *)(mid->callback_data))->cfile->tlink was zero, which means that ->cfile was non-zero when the crash occurred.

However, examination of the vmcore indicates that ->cfile is now 0 as well, so it has apparently been modified by another task after the crashing task read it.
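
The relationship between the two faulting instructions and the pointer chain wdata->cfile->tlink->tl_tcon can be sketched in user space as follows. The struct layouts are hypothetical stand-ins, padded only so that the member offsets match the displacements seen in the disassembly (0x80, 0x98, 0x38); they are not the real kernel definitions, and classify_fault() is an illustrative helper rather than anything in the cifs module.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for struct tcon_link: tl_tcon placed at offset 0x38. */
struct tcon_link_model {
    char pad[0x38];
    void *tl_tcon;
};

/* Stand-in for struct cifsFileInfo: tlink placed at offset 0x98. */
struct cifs_file_model {
    char pad[0x98];
    struct tcon_link_model *tlink;
};

/* Stand-in for struct cifs_writedata: cfile placed at offset 0x80. */
struct cifs_writedata_model {
    char pad[0x80];
    struct cifs_file_model *cfile;
};

enum fault_site {
    FAULT_NONE,        /* both pointers valid, no fault expected     */
    FAULT_CFILE_NULL,  /* load at +0x49 faults: mov 0x98(%rax),%rax  */
    FAULT_TLINK_NULL   /* load at +0x50 faults: mov 0x38(%rax),%r14  */
};

/*
 * Report which load would fault for a given wdata, and the address
 * the CPU would try to read (NULL plus the member offset).
 */
static enum fault_site classify_fault(const struct cifs_writedata_model *wdata,
                                      uintptr_t *fault_addr)
{
    if (wdata->cfile == NULL) {
        *fault_addr = offsetof(struct cifs_file_model, tlink);
        return FAULT_CFILE_NULL;
    }
    if (wdata->cfile->tlink == NULL) {
        *fault_addr = offsetof(struct tcon_link_model, tl_tcon);
        return FAULT_TLINK_NULL;
    }
    *fault_addr = 0;
    return FAULT_NONE;
}
```

Under these assumed layouts, a NULL ->cfile yields a fault address of 0x98 (the 4.18.0-348.7.1 case), and a NULL ->tlink yields 0x38 (the 4.18.0-348.12.2 case). The second case requires ->cfile to have been non-NULL when it was loaded.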

Comment 13 errata-xmlrpc 2022-11-08 10:22:27 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: kernel security, bug fix, and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:7683