Bug 1702264
| Summary: | panic handing smb2_reconnect due to a use after free | |||
|---|---|---|---|---|
| Product: | Red Hat Enterprise Linux 7 | Reporter: | Gonzalo Siero <gsierohu> | |
| Component: | kernel | Assignee: | cifs-maint | |
| kernel sub component: | CIFS | QA Contact: | JianHong Yin <jiyin> | |
| Status: | CLOSED ERRATA | Docs Contact: | ||
| Severity: | high | |||
| Priority: | urgent | CC: | dhoward, dwysocha, fsorenso, jaeshin, jshivers, jstancek, lsahlber, nmurray, plambri, rbergant, swhiteho, xifeng, xzhou | |
| Version: | 7.6 | Keywords: | Regression, Reproducer, ZStream | |
| Target Milestone: | rc | |||
| Target Release: | --- | |||
| Hardware: | x86_64 | |||
| OS: | Linux | |||
| Whiteboard: | ||||
| Fixed In Version: | kernel-3.10.0-1063.el7 | Doc Type: | If docs needed, set a value | |
| Doc Text: | Story Points: | --- | ||
| Clone Of: | ||||
| : | 1737381 1737382 (view as bug list) | Environment: | ||
| Last Closed: | 2020-03-31 19:17:07 UTC | Type: | Bug | |
| Regression: | --- | Mount Type: | --- | |
| Documentation: | --- | CRM: | ||
| Verified Versions: | Category: | --- | ||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | ||
| Cloudforms Team: | --- | Target Upstream Version: | ||
| Embargoed: | ||||
| Bug Depends On: | ||||
| Bug Blocks: | 1711360, 1737381, 1737382 | |||
| Attachments: | ||||
Created attachment 1566219 [details]
stap script to reproduce the panic
Indeed... This does appear to hit the bug. May 9 17:35:59 vm6 kernel: __smb_send_rqst: 1 callbacks suppressed May 9 17:35:59 vm6 kernel: cifs_demultiplex_thread: 1 callbacks suppressed May 9 17:35:59 vm6 kernel: smb2_check_message: 1 callbacks suppressed May 9 17:35:59 vm6 kernel: smb2_calc_size: 3 callbacks suppressed May 9 17:35:59 vm6 kernel: cifs_sync_mid_result: 1 callbacks suppressed May 9 17:35:59 vm6 kernel: CIFS VFS: Free previous auth_key.response = ffff88005a99c668 May 9 17:35:59 vm6 kernel: ================================================================== May 9 17:35:59 vm6 kernel: BUG: KASAN: use-after-free in SMB2_sess_auth_rawntlmssp_negotiate+0x9bc/0x9d0 [cifs] May 9 17:35:59 vm6 kernel: Write of size 8 at addr ffff88005d658e98 by task kworker/0:2/20118 May 9 17:35:59 vm6 kernel: May 9 17:35:59 vm6 kernel: CPU: 0 PID: 20118 Comm: kworker/0:2 Kdump: loaded Tainted: G OE ------------ 3.10.0-1006.el7.kasan.x86_64 #1 May 9 17:35:59 vm6 kernel: Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.9.1-1.fc24 04/01/2014 May 9 17:35:59 vm6 kernel: Workqueue: cifsiod smb2_reconnect_server [cifs] May 9 17:35:59 vm6 kernel: Call Trace: May 9 17:35:59 vm6 kernel: [<ffffffffa4902043>] dump_stack+0x19/0x1b May 9 17:35:59 vm6 kernel: [<ffffffffa3697b8c>] print_address_description+0xfc/0x290 May 9 17:35:59 vm6 kernel: [<ffffffffc13a4d0c>] ? SMB2_sess_auth_rawntlmssp_negotiate+0x9bc/0x9d0 [cifs] May 9 17:35:59 vm6 kernel: [<ffffffffa3697f62>] kasan_report.part.3+0x242/0x330 May 9 17:35:59 vm6 kernel: [<ffffffffa3698287>] __asan_report_store8_noabort+0x37/0x40 May 9 17:35:59 vm6 kernel: [<ffffffffc13a4d0c>] SMB2_sess_auth_rawntlmssp_negotiate+0x9bc/0x9d0 [cifs] May 9 17:35:59 vm6 kernel: [<ffffffffc139eb73>] SMB2_sess_setup+0x273/0x5a0 [cifs] May 9 17:35:59 vm6 kernel: [<ffffffffc1301e87>] cifs_setup_session+0x2a7/0x4a0 [cifs] May 9 17:35:59 vm6 kernel: [<ffffffffc1301b0c>] ? cifs_negotiate_protocol+0x1ac/0x280 [cifs] May 9 17:35:59 vm6 kernel: [<ffffffffc13a092c>] smb2_reconnect+0x79c/0xbb0 [cifs] May 9 17:35:59 vm6 kernel: [<ffffffffa493a1e4>] ? ftrace_regs_call+0x5/0x81 May 9 17:36:00 vm6 kernel: [<ffffffffc13a0190>] ? SMB2_tcon+0x12f0/0x12f0 [cifs] May 9 17:36:00 vm6 kernel: [<ffffffffa491cdc9>] kretprobe_trampoline_holder+0x9/0x9 May 9 17:36:00 vm6 kernel: [<ffffffffc13b09d0>] ? SMB2_get_srv_num+0x50/0x50 [cifs] May 9 17:36:00 vm6 kernel: [<ffffffffc13b09d0>] ? SMB2_get_srv_num+0x50/0x50 [cifs] May 9 17:36:00 vm6 kernel: [<ffffffffa491cdc9>] kretprobe_trampoline_holder+0x9/0x9 May 9 17:36:00 vm6 kernel: [<ffffffffa323b3d0>] worker_thread+0x3c0/0xb60 May 9 17:36:00 vm6 kernel: [<ffffffffa490f58a>] ? __schedule+0x87a/0x1a70 May 9 17:36:00 vm6 kernel: [<ffffffffa323b010>] ? manage_workers.isra.19+0x960/0x960 May 9 17:36:00 vm6 kernel: [<ffffffffa324fb32>] kthread+0x262/0x300 May 9 17:36:00 vm6 kernel: [<ffffffffa324f8d0>] ? flush_kthread_worker+0x2d0/0x2d0 May 9 17:36:00 vm6 kernel: [<ffffffffa324f8d0>] ? flush_kthread_worker+0x2d0/0x2d0 May 9 17:36:00 vm6 kernel: [<ffffffffa4936577>] ret_from_fork_nospec_begin+0x21/0x21 May 9 17:36:00 vm6 kernel: [<ffffffffa324f8d0>] ? flush_kthread_worker+0x2d0/0x2d0 May 9 17:36:00 vm6 kernel: May 9 17:36:00 vm6 kernel: Allocated by task 259532492: May 9 17:36:00 vm6 kernel: [<ffffffffa3696ad3>] save_stack+0x43/0xe0 May 9 17:36:00 vm6 kernel: [<ffffffffa3696d7a>] kasan_kmalloc+0xaa/0xe0 May 9 17:36:00 vm6 kernel: [<ffffffffa369338e>] __kmalloc+0xee/0x270 May 9 17:36:00 vm6 kernel: [<ffffffffa3e4d4df>] kzalloc+0xf/0x20 May 9 17:36:00 vm6 kernel: [<ffffffffa55c5b63>] con_init+0x238/0x556 May 9 17:36:00 vm6 kernel: [<ffffffffa55c4c24>] console_init+0x48/0x4f May 9 17:36:00 vm6 kernel: [<ffffffffa5513495>] start_kernel+0x3ef/0x709 May 9 17:36:00 vm6 kernel: [<ffffffffa55125cf>] x86_64_start_reservations+0x24/0x26 May 9 17:36:00 vm6 kernel: [<ffffffffa551271f>] x86_64_start_kernel+0x14e/0x171 May 9 17:36:00 vm6 kernel: [<ffffffffa30000d5>] start_cpu+0x5/0x14 May 9 17:36:00 vm6 kernel: May 9 17:36:00 vm6 kernel: Freed by task 21499: May 9 17:36:00 vm6 kernel: [<ffffffffa3696ad3>] save_stack+0x43/0xe0 May 9 17:36:00 vm6 kernel: [<ffffffffa3696d7a>] kasan_kmalloc+0xaa/0xe0 May 9 17:36:00 vm6 kernel: [<ffffffffa3691f1c>] kmem_cache_alloc_trace+0xcc/0x220 May 9 17:36:00 vm6 kernel: [<ffffffffc1349726>] sesInfoAlloc+0x46/0x1e0 [cifs] May 9 17:36:00 vm6 kernel: [<ffffffffc1302472>] cifs_get_smb_ses+0x3f2/0x16c0 [cifs] May 9 17:36:00 vm6 kernel: [<ffffffffc1306425>] cifs_construct_tcon+0x765/0x1970 [cifs] May 9 17:36:00 vm6 kernel: [<ffffffffc13079aa>] cifs_sb_tlink+0x31a/0x9a0 [cifs] May 9 17:36:00 vm6 kernel: [<ffffffffc1339335>] cifs_get_inode_info+0x165/0x2390 [cifs] May 9 17:36:00 vm6 kernel: [<ffffffffc1341d82>] cifs_revalidate_dentry_attr+0x412/0x670 [cifs] May 9 17:36:00 vm6 kernel: [<ffffffffc13421b1>] cifs_getattr+0x131/0x520 [cifs] May 9 17:36:00 vm6 kernel: [<ffffffffa370dab9>] vfs_getattr+0x119/0x1c0 May 9 17:36:00 vm6 kernel: [<ffffffffa370dd31>] vfs_fstatat+0xb1/0x120 May 9 17:36:00 vm6 kernel: [<ffffffffa370f53c>] SYSC_newstat+0x7c/0xe0 May 9 17:36:00 vm6 kernel: [<ffffffffa370f85e>] SyS_newstat+0xe/0x10 May 9 17:36:00 vm6 kernel: [<ffffffffa493671e>] system_call_fastpath+0x25/0x2a May 9 17:36:00 vm6 kernel: May 9 17:36:00 vm6 kernel: The buggy address belongs to the object at ffff88005d658d80#012 which belongs to the cache kmalloc-512 of size 512 May 9 17:36:00 vm6 kernel: The buggy address is located 280 bytes inside of#012 512-byte region [ffff88005d658d80, ffff88005d658f80) May 9 17:36:00 vm6 kernel: The buggy address belongs to the page: May 9 17:36:00 vm6 kernel: page:ffffea0001759600 count:1 mapcount:1179667 mapping: (null) index:0x0 May 9 17:36:00 vm6 kernel: page flags: 0x1fffff00004080(slab|head) May 9 17:36:00 vm6 kernel: page dumped because: kasan: bad access detected May 9 17:36:00 vm6 kernel: May 9 17:36:00 vm6 kernel: Memory state around the buggy address: May 9 17:36:00 vm6 kernel: ffff88005d658d80: fc fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb May 9 17:36:00 vm6 kernel: ffff88005d658e00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb May 9 17:36:00 vm6 kernel: >ffff88005d658e80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb May 9 17:36:00 vm6 kernel: ^ May 9 17:36:00 vm6 kernel: ffff88005d658f00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb May 9 17:36:00 vm6 kernel: ffff88005d658f80: fb fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc May 9 17:36:00 vm6 kernel: ================================================================== May 9 17:36:00 vm6 kernel: Disabling lock debugging due to kernel taint May 9 17:36:00 vm6 kernel: ================================================================== May 9 17:36:00 vm6 kernel: BUG: KASAN: use-after-free in SMB2_sess_alloc_buffer+0x6d1/0x750 [cifs] May 9 17:36:00 vm6 kernel: Read of size 8 at addr ffff88005d658dd8 by task kworker/0:2/20118 May 9 17:36:00 vm6 kernel: May 9 17:36:00 vm6 kernel: CPU: 0 PID: 20118 Comm: kworker/0:2 Kdump: loaded Tainted: G B OE ------------ 3.10.0-1006.el7.kasan.x86_64 #1 May 9 17:36:00 vm6 kernel: Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.9.1-1.fc24 04/01/2014 May 9 17:36:00 vm6 kernel: Workqueue: cifsiod smb2_reconnect_server [cifs] May 9 17:36:00 vm6 kernel: Call Trace: May 9 17:36:00 vm6 kernel: [<ffffffffa4902043>] dump_stack+0x19/0x1b May 9 17:36:00 vm6 kernel: [<ffffffffa3697b8c>] print_address_description+0xfc/0x290 May 9 17:36:00 vm6 kernel: [<ffffffffc13a42d1>] ? SMB2_sess_alloc_buffer+0x6d1/0x750 [cifs] May 9 17:36:00 vm6 kernel: [<ffffffffa3697f62>] kasan_report.part.3+0x242/0x330 May 9 17:36:00 vm6 kernel: [<ffffffffa3698144>] __asan_report_load8_noabort+0x34/0x40 May 9 17:36:00 vm6 kernel: [<ffffffffc13a42d1>] SMB2_sess_alloc_buffer+0x6d1/0x750 [cifs] May 9 17:36:00 vm6 kernel: [<ffffffffc13a3c00>] ? SMB2_negotiate+0x2130/0x2130 [cifs] May 9 17:36:00 vm6 kernel: [<ffffffffa4917b35>] ? _raw_spin_unlock_irqrestore+0x15/0x20 May 9 17:36:00 vm6 kernel: [<ffffffffa3697a59>] ? kasan_end_report+0x39/0x70 May 9 17:36:00 vm6 kernel: [<ffffffffa3697e3b>] ? kasan_report.part.3+0x11b/0x330 May 9 17:36:00 vm6 kernel: [<ffffffffc13a442d>] SMB2_sess_auth_rawntlmssp_negotiate+0xdd/0x9d0 [cifs] May 9 17:36:00 vm6 kernel: [<ffffffffc139eb73>] SMB2_sess_setup+0x273/0x5a0 [cifs] May 9 17:36:00 vm6 kernel: [<ffffffffc1301e87>] cifs_setup_session+0x2a7/0x4a0 [cifs] May 9 17:36:00 vm6 kernel: [<ffffffffc1301b0c>] ? cifs_negotiate_protocol+0x1ac/0x280 [cifs] May 9 17:36:00 vm6 kernel: [<ffffffffc13a092c>] smb2_reconnect+0x79c/0xbb0 [cifs] May 9 17:36:00 vm6 kernel: [<ffffffffa493a1e4>] ? ftrace_regs_call+0x5/0x81 May 9 17:36:00 vm6 kernel: [<ffffffffc13a0190>] ? SMB2_tcon+0x12f0/0x12f0 [cifs] May 9 17:36:00 vm6 kernel: [<ffffffffa491cdc9>] kretprobe_trampoline_holder+0x9/0x9 May 9 17:36:00 vm6 kernel: [<ffffffffc13b09d0>] ? SMB2_get_srv_num+0x50/0x50 [cifs] May 9 17:36:00 vm6 kernel: [<ffffffffc13b09d0>] ? SMB2_get_srv_num+0x50/0x50 [cifs] May 9 17:36:00 vm6 kernel: [<ffffffffa491cdc9>] kretprobe_trampoline_holder+0x9/0x9 May 9 17:36:00 vm6 kernel: [<ffffffffa323b3d0>] worker_thread+0x3c0/0xb60 May 9 17:36:00 vm6 kernel: [<ffffffffa490f58a>] ? __schedule+0x87a/0x1a70 May 9 17:36:00 vm6 kernel: [<ffffffffa323b010>] ? manage_workers.isra.19+0x960/0x960 May 9 17:36:00 vm6 kernel: [<ffffffffa324fb32>] kthread+0x262/0x300 May 9 17:36:00 vm6 kernel: [<ffffffffa324f8d0>] ? flush_kthread_worker+0x2d0/0x2d0 May 9 17:36:00 vm6 kernel: [<ffffffffa324f8d0>] ? flush_kthread_worker+0x2d0/0x2d0 May 9 17:36:00 vm6 kernel: [<ffffffffa4936577>] ret_from_fork_nospec_begin+0x21/0x21 May 9 17:36:00 vm6 kernel: [<ffffffffa324f8d0>] ? flush_kthread_worker+0x2d0/0x2d0 May 9 17:36:00 vm6 kernel: May 9 17:36:00 vm6 kernel: Allocated by task 259532492: May 9 17:36:00 vm6 kernel: [<ffffffffa3696ad3>] save_stack+0x43/0xe0 May 9 17:36:00 vm6 kernel: [<ffffffffa3696d7a>] kasan_kmalloc+0xaa/0xe0 May 9 17:36:00 vm6 kernel: [<ffffffffa369338e>] __kmalloc+0xee/0x270 May 9 17:36:00 vm6 kernel: [<ffffffffa3e4d4df>] kzalloc+0xf/0x20 May 9 17:36:00 vm6 kernel: [<ffffffffa55c5b63>] con_init+0x238/0x556 May 9 17:36:00 vm6 kernel: [<ffffffffa55c4c24>] console_init+0x48/0x4f May 9 17:36:00 vm6 kernel: [<ffffffffa5513495>] start_kernel+0x3ef/0x709 May 9 17:36:00 vm6 kernel: [<ffffffffa55125cf>] x86_64_start_reservations+0x24/0x26 May 9 17:36:00 vm6 kernel: [<ffffffffa551271f>] x86_64_start_kernel+0x14e/0x171 May 9 17:36:00 vm6 kernel: [<ffffffffa30000d5>] start_cpu+0x5/0x14 May 9 17:36:00 vm6 kernel: May 9 17:36:00 vm6 kernel: Freed by task 21499: May 9 17:36:00 vm6 kernel: [<ffffffffa3696ad3>] save_stack+0x43/0xe0 May 9 17:36:00 vm6 kernel: [<ffffffffa3696d7a>] kasan_kmalloc+0xaa/0xe0 May 9 17:36:00 vm6 kernel: [<ffffffffa3691f1c>] kmem_cache_alloc_trace+0xcc/0x220 May 9 17:36:00 vm6 kernel: [<ffffffffc1349726>] sesInfoAlloc+0x46/0x1e0 [cifs] May 9 17:36:00 vm6 kernel: [<ffffffffc1302472>] cifs_get_smb_ses+0x3f2/0x16c0 [cifs] May 9 17:36:00 vm6 kernel: [<ffffffffc1306425>] cifs_construct_tcon+0x765/0x1970 [cifs] May 9 17:36:00 vm6 kernel: [<ffffffffc13079aa>] cifs_sb_tlink+0x31a/0x9a0 [cifs] May 9 17:36:00 vm6 kernel: [<ffffffffc1339335>] cifs_get_inode_info+0x165/0x2390 [cifs] May 9 17:36:00 vm6 kernel: [<ffffffffc1341d82>] cifs_revalidate_dentry_attr+0x412/0x670 [cifs] May 9 17:36:00 vm6 kernel: [<ffffffffc13421b1>] cifs_getattr+0x131/0x520 [cifs] May 9 17:36:00 vm6 kernel: [<ffffffffa370dab9>] vfs_getattr+0x119/0x1c0 ... it gets worse from there. Created attachment 1567702 [details]
stap to remove/modify ratelimiting for cifs module
This stap will remove ratelimiting for all locations in the cifs module, so cifs debugging will not be lost by the high rate of messages.
Modifies the ratelimit intervals in a currently-loaded cifs module, and immediately exits.
# stap -g mod_cifs_ratelimit.stp
765 ratelimit_states modified in module 'cifs' to interval: 0, burst: 1
Obviously this removes potentially important safety provided by the ratelimiter...
Created attachment 1569583 [details]
script to modify TLINK_IDLE_EXPIRE in the cifs.ko module file
This script will modify the TLINK_IDLE_EXPIRE from the default of 600 seconds to 6 (change the script, if needed) by patching the cifs kernel module binary itself. The modified kernel module will be the file cifs-modified.ko in the current directory.
To take effect, cifs mounts will need to be unmounted and the cifs module will need to be unloaded (if mounts exist/the module is loaded), and the new, modified kernel module loaded (insmod cifs-modified.ko)
The original module can also be replaced by copying the modified file on top of the original under /lib/modules/$(uname -r)/kernel/fs/cifs (see the output of 'modinfo -n cifs'). If the original is .xz compressed, don't forget to compress the modified file first.
Created attachment 1570161 [details]
script to modify TLINK_IDLE_EXPIRE in the cifs.ko module file - v2
updated script:
if old TLINK_IDLE_EXPIRE not found, do nothing
objcopy the modified module to remove now-invalid signing
I have reproduced this bug using the 5.1.0 kernel, so this bug exists upstream as well. May 22 15:54:13 vm1 kernel: kworker/0:0(3292) cifs_prune_tlinks() May 22 15:54:13 vm1 kernel: kworker/0:0(3292) cifs_prune_tlinks()return... May 22 15:54:19 vm1 kernel: kworker/0:0(3292) cifs_prune_tlinks() May 22 15:54:19 vm1 kernel: kworker/0:0(3292) cifs_put_tcon() - tcon: 0xffff8880a1fbaa80 IPC_tcon: 0xffff8880a1fb8880 May 22 15:54:19 vm1 kernel: kworker/0:0(3292) cifs_send_recv() (cifs_ses: 0xffff88803b297c00) May 22 15:54:19 vm1 kernel: cifsd(3161) smb2_is_session_expired() - expiring session; shdr->Status was: 3221226332 May 22 15:54:19 vm1 kernel: kworker/0:0(3292) cifs_send_recv() (cifs_ses: 0xffff88803b297c00) returns -11 May 22 15:54:19 vm1 kernel: kworker/0:0(3292) tconInfoFree() - 0xffff8880a1fbaa80 May 22 15:54:19 vm1 kernel: kworker/0:0(3292) tconInfoFree()Delay in tconInfoFree... May 22 15:54:19 vm1 kernel: kworker/1:1(28614) smb2_reconnect_server() (TCP_Server_Info: 0xffff8880638f4c80) May 22 15:54:19 vm1 kernel: kworker/1:1(28614) smb2_reconnect() (tcon: 0xdffffc0000000000) - unable to dereference tcon->ses May 22 15:54:19 vm1 kernel: kworker/1:1(28614) smb2_reconnect() (tcon: 0xffff8880a1fb8880) tcon->ses: 0xffff88803b297c00, tcon_ipc: 0xffff8880a1fb8880 May 22 15:54:19 vm1 kernel: kworker/1:1(28614) smb2_reconnect()recon tcon set to 1! May 22 15:54:19 vm1 kernel: kworker/1:1(28614) cifs_send_recv() (cifs_ses: 0xffff88803b297c00) May 22 15:54:19 vm1 kernel: kworker/1:1(28614) cifs_send_recv() (cifs_ses: 0xffff88803b297c00) returns 0 May 22 15:54:19 vm1 kernel: kworker/1:1(28614) SMB2_sess_setup() (ses: 0xffff88803b297c00) - reconnect: 1 May 22 15:54:19 vm1 kernel: kworker/1:1(28614) smb2_select_sectype() (struct TCP_Server_Info)0xffff8880638f4c80 May 22 15:54:19 vm1 kernel: kworker/1:1(28614) smb2_select_sectype()delay in smb2_select_sectype... May 22 15:54:19 vm1 kernel: kworker/0:0(3292) tconInfoFree() - 0xffff8880a1fb8880 May 22 15:54:19 vm1 kernel: kworker/0:0(3292) sesInfoFree() - (struct cifs_ses *)0xffff88803b297c00 May 22 15:54:19 vm1 kernel: kworker/0:0(3292) cifs_prune_tlinks()return... May 22 15:54:19 vm1 kernel: ================================================================== May 22 15:54:19 vm1 kernel: BUG: KASAN: use-after-free in SMB2_sess_setup+0x136/0x340 [cifs] May 22 15:54:19 vm1 kernel: Read of size 8 at addr ffff88803b297c78 by task kworker/1:1/28614 May 22 15:54:19 vm1 kernel: May 22 15:54:19 vm1 kernel: CPU: 1 PID: 28614 Comm: kworker/1:1 Tainted: GF W OEL 5.1.0-fs1+ #2 May 22 15:54:19 vm1 kernel: Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.9.1-1.fc24 04/01/2014 May 22 15:54:19 vm1 kernel: Workqueue: cifsiod smb2_reconnect_server [cifs] May 22 15:54:19 vm1 kernel: Call Trace: May 22 15:54:19 vm1 kernel: dump_stack+0x71/0xab May 22 15:54:19 vm1 kernel: ? SMB2_sess_setup+0x136/0x340 [cifs] May 22 15:54:19 vm1 kernel: print_address_description+0x6a/0x270 May 22 15:54:19 vm1 kernel: ? SMB2_sess_setup+0x136/0x340 [cifs] May 22 15:54:19 vm1 kernel: ? SMB2_sess_setup+0x136/0x340 [cifs] May 22 15:54:19 vm1 kernel: __kasan_report+0x14e/0x192 May 22 15:54:19 vm1 kernel: ? SMB2_sess_setup+0x136/0x340 [cifs] May 22 15:54:19 vm1 kernel: kasan_report+0xe/0x20 May 22 15:54:19 vm1 kernel: SMB2_sess_setup+0x136/0x340 [cifs] May 22 15:54:19 vm1 kernel: ? SMB2_sess_alloc_buffer+0x210/0x210 [cifs] May 22 15:54:19 vm1 kernel: copy_oldmem_page_encrypted+0x20/0x20 May 22 15:54:19 vm1 kernel: ? cifs_negotiate_protocol+0xc7/0x110 [cifs] May 22 15:54:19 vm1 kernel: ? smb2_reconnect.part.19+0x6a5/0xb90 [cifs] May 22 15:54:19 vm1 kernel: ? _raw_spin_lock_irqsave+0x8d/0xf0 May 22 15:54:19 vm1 kernel: ? enter_kprobe_probe+0x1c0/0x6f0 [stap_22c8762413e02871898f46ec829700ba__6949] May 22 15:54:19 vm1 kernel: ? SMB2_tcon+0xa70/0xa70 [cifs] May 22 15:54:19 vm1 kernel: ? pre_handler_kretprobe+0x1ee/0x290 May 22 15:54:19 vm1 kernel: ? smb2_reconnect.part.19+0x5/0xb90 [cifs] May 22 15:54:19 vm1 kernel: ? aggr_pre_handler+0x74/0xa0 May 22 15:54:19 vm1 kernel: ? smb2_reconnect.part.19+0x1/0xb90 [cifs] May 22 15:54:19 vm1 kernel: ? kprobe_ftrace_handler+0xf5/0x130 May 22 15:54:19 vm1 kernel: ? SMB2_tcon+0xa70/0xa70 [cifs] May 22 15:54:19 vm1 kernel: ? smb2_reconnect_server+0x303/0x4c0 [cifs] May 22 15:54:19 vm1 kernel: ? ftrace_ops_assist_func+0xd3/0x150 May 22 15:54:19 vm1 kernel: ? 0xffffffffc0ff80bf May 22 15:54:19 vm1 kernel: ? smb2_reconnect_server+0x22e/0x4c0 [cifs] May 22 15:54:19 vm1 kernel: ? copy_oldmem_page_encrypted+0x20/0x20 May 22 15:54:19 vm1 kernel: ? 0xffffffffc0ff80bf May 22 15:54:19 vm1 kernel: ? SMB2_get_srv_num+0x40/0x40 [cifs] May 22 15:54:19 vm1 kernel: ? process_one_work+0x379/0x6d0 May 22 15:54:19 vm1 kernel: ? copy_oldmem_page_encrypted+0x20/0x20 May 22 15:54:19 vm1 kernel: ? worker_thread+0x57/0x5a0 May 22 15:54:19 vm1 kernel: ? process_one_work+0x6d0/0x6d0 May 22 15:54:19 vm1 kernel: ? kthread+0x1ae/0x1d0 May 22 15:54:19 vm1 kernel: ? kthread_park+0xb0/0xb0 May 22 15:54:19 vm1 kernel: ? ret_from_fork+0x35/0x40 May 22 15:54:19 vm1 kernel: May 22 15:54:19 vm1 kernel: Allocated by task 3169: May 22 15:54:19 vm1 kernel: save_stack+0x19/0x80 May 22 15:54:19 vm1 kernel: __kasan_kmalloc.constprop.7+0xa0/0xd0 May 22 15:54:19 vm1 kernel: sesInfoAlloc+0x29/0xe0 [cifs] May 22 15:54:19 vm1 kernel: cifs_get_smb_ses+0x284/0x1090 [cifs] May 22 15:54:19 vm1 kernel: cifs_construct_tcon+0x3b0/0xbf0 [cifs] May 22 15:54:19 vm1 kernel: cifs_sb_tlink+0x169/0x350 [cifs] May 22 15:54:19 vm1 kernel: cifs_get_inode_info+0xdd/0xf00 [cifs] May 22 15:54:19 vm1 kernel: cifs_revalidate_dentry_attr+0x173/0x4e0 [cifs] May 22 15:54:19 vm1 kernel: cifs_getattr+0xb7/0x2e0 [cifs] May 22 15:54:19 vm1 kernel: vfs_statx+0xcf/0x140 May 22 15:54:19 vm1 kernel: __do_sys_newstat+0x77/0xd0 May 22 15:54:19 vm1 kernel: do_syscall_64+0x73/0x210 May 22 15:54:19 vm1 kernel: entry_SYSCALL_64_after_hwframe+0x44/0xa9 May 22 15:54:19 vm1 kernel: May 22 15:54:19 vm1 kernel: Freed by task 3292: May 22 15:54:19 vm1 kernel: save_stack+0x19/0x80 May 22 15:54:19 vm1 kernel: __kasan_slab_free+0x130/0x180 May 22 15:54:19 vm1 kernel: kfree+0x90/0x1a0 May 22 15:54:19 vm1 kernel: cifs_put_smb_ses+0x259/0x720 [cifs] May 22 15:54:19 vm1 kernel: cifs_put_tlink+0xac/0xb0 [cifs] May 22 15:54:19 vm1 kernel: cifs_prune_tlinks+0xe9/0x140 [cifs] May 22 15:54:19 vm1 kernel: kretprobe_trampoline+0x0/0x50 May 22 15:54:19 vm1 kernel: May 22 15:54:19 vm1 kernel: The buggy address belongs to the object at ffff88803b297c00#012 which belongs to the cache kmalloc-512 of size 512 May 22 15:54:19 vm1 kernel: The buggy address is located 120 bytes inside of#012 512-byte region [ffff88803b297c00, ffff88803b297e00) May 22 15:54:19 vm1 kernel: The buggy address belongs to the page: May 22 15:54:19 vm1 kernel: page:ffffea0000eca500 count:1 mapcount:0 mapping:ffff88800fc02e00 index:0x0 compound_mapcount: 0 May 22 15:54:19 vm1 kernel: flags: 0xfffffc0010200(slab|head) May 22 15:54:19 vm1 kernel: raw: 000fffffc0010200 ffffea0002a0db00 0000000400000004 ffff88800fc02e00 May 22 15:54:19 vm1 kernel: raw: 0000000000000000 0000000000190019 00000001ffffffff 0000000000000000 May 22 15:54:19 vm1 kernel: page dumped because: kasan: bad access detected May 22 15:54:19 vm1 kernel: May 22 15:54:19 vm1 kernel: Memory state around the buggy address: May 22 15:54:19 vm1 kernel: ffff88803b297b00: 00 00 00 00 00 00 fc fc fc fc fc fc fc fc fc fc May 22 15:54:19 vm1 kernel: ffff88803b297b80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc May 22 15:54:19 vm1 kernel: >ffff88803b297c00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb May 22 15:54:19 vm1 kernel: ^ May 22 15:54:19 vm1 kernel: ffff88803b297c80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb May 22 15:54:19 vm1 kernel: ffff88803b297d00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb May 22 15:54:19 vm1 kernel: ================================================================== May 22 15:54:19 vm1 kernel: BUG: kernel NULL pointer dereference, address: 000000000000052c May 22 15:54:19 vm1 kernel: #PF: supervisor read access in kernel mode May 22 15:54:19 vm1 kernel: #PF: error_code(0x0000) - not-present page Created attachment 1572649 [details]
script to modify TLINK_IDLE_EXPIRE in the cifs.ko module file - v3
Gonzalo, nice. Your patch in #24 looks like it might be a valid workaround for this issue. It is small and should have few side-effects. Can you verify that you can not reproduce the crashes anymore with this patch? If so then we should turn this into a RHEL-only patch for 7.7. Created attachment 1572780 [details]
updated stap to reproduce the panic
updated to run on RHEL 7 kernels and upstream, as needed
only the two changes in smb2_reconnect_server are required for upstream: From: Gonzalo <gsierohu> Date: Thu, 9 May 2019 15:40:16 +0200 Subject: [PATCH] CIFS Rhel 7.6 bug 1702264 --- diff --git a/fs/cifs/smb2pdu.c b/fs/cifs/smb2pdu.c index 651e697..969f495 100644 --- a/fs/cifs/smb2pdu.c +++ b/fs/cifs/smb2pdu.c @@ -3097,6 +3097,7 @@ void smb2_reconnect_server(struct work_struct *work) if (ses->tcon_ipc && ses->tcon_ipc->need_reconnect) { list_add_tail(&ses->tcon_ipc->rlist, &tmp_list); tcon_exist = true; + ses->ses_count++; } } /* @@ -3115,6 +3116,8 @@ void smb2_reconnect_server(struct work_struct *work) else resched = true; list_del_init(&tcon->rlist); + if (tcon->ipc) + cifs_put_smb_ses(tcon->ses); cifs_put_tcon(tcon); } I applied the patch to upstream kernel, kicked off the test, and will let it run overnight. the test of 5.1 kernel ran overnight through approximately 2800 iterations (6 second TLINK_IDLE_EXPIRE, 13-second sleeps between user2 doing the 'cd /path ; ls ; cd /', as described in the reproducer) without a crash. Before the patch, the crash was occurring consistently within the first several iterations. KASAN did barf once, the first time that it entered smb2_reconnect_server() after mount, but the bad read actually came from within the systemtap used in reproducing the issue, and is probably from some of the debugging output added. Nice job, Gonzalo - I think the patch does the trick. Posted RHEL-only patch as http://post-office.corp.redhat.com/archives/rhkernel-list/2019-May/msg11070.html Frank, can you post #30 for upstream? Gonzalo can you post your patch to upstream (AFAIK you need to send to sfrench and cc linux-cifs.org)? Patch(es) committed on kernel-3.10.0-1063.el7 Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2020:1016 |
Description of problem: We have seen several panics in latest RHEL kernel (3.10.0-957.5.1.el7.x86_64) in smb2_reconnect_server() processing due to to access to a freed ses struct. To study the problem we have enabled slub_debug flags, to have some additional data about free. Retrace tasks 393123977 & 670498550 are the vmcore files with slub_debug FZPU flags enabled. Looking at 393123977: PID: 125271 TASK: ffff9deb9c6a2330 CPU: 21 COMMAND: "kworker/21:0" #0 [ffff9df282367838] machine_kexec at ffffffff83663674 #1 [ffff9df282367898] __crash_kexec at ffffffff8371cf32 #2 [ffff9df282367968] crash_kexec at ffffffff8371d020 #3 [ffff9df282367980] oops_end at ffffffff83d6c758 #4 [ffff9df2823679a8] die at ffffffff8362f95b #5 [ffff9df2823679d8] do_general_protection at ffffffff83d6c052 #6 [ffff9df282367a10] general_protection at ffffffff83d6b6f8 [exception RIP: cifs_send_recv+85] RIP: ffffffffc1a8d335 RSP: ffff9df282367ac8 RFLAGS: 00010202 RAX: 0000000000000200 RBX: 0000000000000210 RCX: ffff9def4e26cc08 RDX: ffff9df282367b48 RSI: ffff9dd911f01a88 RDI: 6b6b6b6b6b6b6b6b RBP: ffff9df282367b08 R8: 0000000000000210 R9: ffff9df282367c48 R10: 0000000000000000 R11: ffff9def4e26cc10 R12: ffff9dd911f01a88 R13: 0000000000000002 R14: 0000000000000000 R15: ffff9def4e26cc08 ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018 #7 [ffff9df282367b10] smb2_send_recv at ffffffffc1a8d9b1 [cifs] #8 [ffff9df282367c38] SMB2_sess_sendreceive at ffffffffc1aa0e62 [cifs] #9 [ffff9df282367c80] SMB2_auth_kerberos at ffffffffc1aa2f43 [cifs] #10 [ffff9df282367cb8] SMB2_sess_setup at ffffffffc1aa15f0 [cifs] #11 [ffff9df282367cf0] cifs_setup_session at ffffffffc1a75026 [cifs] #12 [ffff9df282367d20] smb2_reconnect at ffffffffc1aa1f8c [cifs] #13 [ffff9df282367db0] smb2_reconnect_server at ffffffffc1aa5093 [cifs] #14 [ffff9df282367e20] process_one_work at ffffffff836b9d8f #15 [ffff9df282367e68] worker_thread at ffffffff836bae26 #16 [ffff9df282367ec8] kthread at ffffffff836c1c71 #17 [ffff9df282367f50] ret_from_fork_nospec_begin at ffffffff83d74c24 int cifs_send_recv(const unsigned int xid, struct cifs_ses *ses, struct smb_rqst *rqst, int *resp_buf_type, const int flags, struct kvec *resp_iov) { ... if (ses->server->tcpStatus == CifsExiting) <--- panic return -ENOENT; ses -> ffff9dd911f01a80 Looking into the buffer, we can see the slub_debug pattern of a freed buffer: crash> rd -S ffff9dd911f01a80 0x69 ffff9dd911f01a80: bbbbbbbbbbbbbbbb 6b6b6b6b6b6b6b6b ffff9dd911f01a90: 6b6b6b6b6b6b6b6b 6b6b6b6b6b6b6b6b ffff9dd911f01aa0: 6b6b6b6b6b6b6b6b 6b6b6b6b6b6b6b6b ffff9dd911f01ab0: 6b6b6b6b6b6b6b6b 6b6b6b6b6b6b6b6b [...] ffff9dd911f01c70: 6b6b6b6b6b6b6b6b 6b6b6b6b6b6b6b6b ffff9dd911f01c80: a56b6b6b6b6b6b6b bbbbbbbbbbbbbbbb ^^^^^^^^^^^^^^^^ Free ffff9dd911f01c90: [kmalloc-512] sesInfoAlloc+32 \ ffff9dd911f01ca0: __slab_alloc+64 kmem_cache_alloc_trace+423 | ffff9dd911f01cb0: sesInfoAlloc+32 cifs_get_smb_ses+186 | ffff9dd911f01cc0: cifs_construct_tcon+806 cifs_sb_tlink+483 | ffff9dd911f01cd0: cifs_get_inode_info+107 cifs_revalidate_dentry_attr+487 |> Last Alloc ffff9dd911f01ce0: cifs_getattr+87 vfs_getattr+73 | ffff9dd911f01cf0: vfs_fstatat+117 SYSC_newlstat+49 | ffff9dd911f01d00: sys_newlstat+14 system_call_fastpath+34 / ffff9dd911f01d10: 0000000000000000 0000000000000000 ffff9dd911f01d20: 00007ba70000000f 00000001112b9dd4 ---> cpu,pid,when (Alloc) ffff9dd911f01d30: sesInfoFree+147 kfree+262 \ ffff9dd911f01d40: sesInfoFree+147 cifs_put_smb_ses+362 | ffff9dd911f01d50: cifs_put_tcon+211 cifs_put_tlink+117 |> Last Free ffff9dd911f01d60: cifs_prune_tlinks+167 process_one_work+383 | ffff9dd911f01d70: worker_thread+294 kthread+209 | ffff9dd911f01d80: ret_from_fork_nospec_begin+14 0000000000000000 / ffff9dd911f01d90: 0000000000000000 0000000000000000 ffff9dd911f01da0: 0000000000000000 0000000000000000 ffff9dd911f01db0: 0000000000000000 000061210000003d ---> cpu, pid (Free) ffff9dd911f01dc0: 000000011144300d ---> when (Free) crash> ps -m 0x6121 [0 00:00:00.030] [IN] PID: 24865 TASK: ffff9e9a34f06970 CPU: 61 COMMAND: "kworker/61:0" So the worker thread that freed cifs_ses ran just 30 jiffies before... They are using multiuser mount option... The tcon we're freeing is not the master (see cifs_prune_tlinks) Alloc path: [kmalloc-512] <-- sesInfoAlloc+32 --> __slab_alloc+64 --> kmem_cache_alloc_trace+423 --> sesInfoAlloc+32 --> cifs_get_smb_ses+186 --> cifs_construct_tcon+806 <--- This happens because this is multiuser/we have no tlink in rb for that fsuid --> cifs_sb_tlink+483 --> cifs_get_inode_info+107 --> cifs_revalidate_dentry_attr+487 --> cifs_getattr+87 --> vfs_getattr+73 .... Free path: sesInfoFree+147 --> kfree+262 --> sesInfoFree+147 --> cifs_put_smb_ses+362 <<-- here we del ses from the list (TCP_Server_Info *server->smb_ses_list) --> cifs_put_tcon+211 --> cifs_put_tlink+117 --> cifs_prune_tlinks+167 <--- periodic workqueue job that scans tcon_tree for a superblock and closes out tcons: we have multiuser mount option!!! --> process_one_work+383 --> worker_thread+294 ... Use path: (*) cifs_send_recv+85 <-- here ses was already freed -->smb2_send_recv -->SMB2_sess_sendreceive (**) -->SMB2_auth_kerberos <-- use ses->server in SMB2_sess_alloc_buffer() called from here. It should be still allocated/valid since we didn't crash -->SMB2_sess_setup -->cifs_setup_session -->smb2_reconnect (***) -->smb2_reconnect_server -->process_one_work at ... Q: when we reconnect, is the tcon ref 0? possibility based on what i've seen: Thread A Thread B . process_one_work() . . . cifs_prune_tlinks() while (node != NULL) { tmp = node; node = rb_next(tmp); tlink = rb_entry(tmp, struct tcon_link, tl_rbnode); if (test_bit(TCON_LINK_MASTER, &tlink->tl_flags) || atomic_read(&tlink->tl_count) != 0 || time_after(tlink->tl_time + TLINK_IDLE_EXPIRE, jiffies)) continue; Our tlink has tl_count == 0 --> cifs_get_tlink(tlink); clear_bit(TCON_LINK_IN_TREE, &tlink->tl_flags); rb_erase(tmp, root); spin_unlock(&cifs_sb->tlink_tree_lock); Back to tl_count == 0, then free --> cifs_put_tlink(tlink); . . . cifs_put_tlink() . . . . . . process_one_work() . . . . . smb2_reconnect_server() spin_lock(&cifs_tcp_ses_lock); list_for_each_entry(ses, &server->smb_ses_list, smb_ses_list) { list_for_each_entry(tcon, &ses->tcon_list, tcon_list) { if (tcon->need_reconnect || tcon->need_reopen_files) { tcon->tc_count++; <<------------------------- Increase tc_count, now 1? list_add_tail(&tcon->rlist, &tmp_list); tcon_exist = true; [...] spin_unlock(&cifs_tcp_ses_lock); . . cifs_put_tcon() (*) Note: not exactly here, but *after* smb2_reconnect_server() spin_lock(&cifs_tcp_ses_lock); if (--tcon->tc_count > 0) { <<---- now 0 again? [...] list_del_init(&tcon->tcon_list); <<-- this happens when tc_count == 0 . . smb2_reconnect() . . cifs_put_smb_ses() (*) Note: not exactly here, but *after* smb2_reconnect_server() . . cifs_setup_session() . . . . . SMB2_sess_setup() . . . . . SMB2_auth_kerberos() . . sesInfoFree() Version-Release number of selected component (if applicable): RHEL kernel (3.10.0-957.5.1.el7.x86_64) How reproducible: Not yet reproduced. Steps to Reproduce: 1. 2. 3. Actual results: Expected results: Additional info: