Bug 1824270

Summary: CVE-2020-10742 kernel: NFS client crash due to index buffer overflow during Direct IO write causing kernel panic [rhel-7]
Product: Red Hat Enterprise Linux 7
Component: kernel
Kernel sub component: NFS
Version: 7.7
Status: CLOSED ERRATA
Severity: medium
Priority: medium
Keywords: Patch, Reproducer, Security, SecurityTracking
Reporter: Frank Sorenson <fsorenso>
Assignee: Benjamin Coddington <bcodding>
QA Contact: Zhi Li <yieli>
CC: allarkin, bcodding, bhu, eshatokhin, jaeshin, jforbes, jshivers, nfs-maint, pvlasin, snishika, swhiteho, xzhou, yieli, yoyang
Target Milestone: rc
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Fixed In Version: kernel-3.10.0-1140.el7
Clone Of: ---
Clones: 1826332 (view as bug list)
Type: Bug
Last Closed: 2020-09-29 21:12:31 UTC
Bug Blocks: 1826332, 1835127, 1839680    
Deadline: 2021-05-13   
Attachments:
	dmesg from 3.10.0-1062.1.1.el7.x86_64 kernel with slub_debug=FZPU

Description Frank Sorenson 2020-04-15 16:53:20 UTC
Description of problem:

The NFS client system will crash if the NFSv4.1 session fore channel max_rqst_size returned by the server allows for less per-request overhead than the client requires. In the reproducer below, this leaves the client with a negotiated wsize (524268) that is not a multiple of the page size.
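
A minimal illustration of that mismatch, using only the numbers from the reproducer below (the per-request overhead figures are inferred from those numbers, not taken from the kernel source): the SystemTap script forces the server to advertise maxreq_sz = 525312, i.e. 1024 bytes of headroom above the 524288-byte wsize requested at mount time, while the client budgets 1044 bytes of overhead and therefore negotiates wsize = 524268, which is not page-aligned.

/*
 * Illustrative arithmetic only -- the values come from the reproducer in
 * this report; the overhead figures are inferred, not kernel source.
 */
#include <stdio.h>

int main(void)
{
	unsigned int page_size = 4096;
	unsigned int maxreq_sz = 525312;   /* forced by the stap script     */
	unsigned int wanted    = 524288;   /* wsize requested at mount time */
	unsigned int wsize     = 524268;   /* wsize actually negotiated     */

	printf("server headroom: %u bytes\n", maxreq_sz - wanted);   /* 1024 */
	printf("client overhead: %u bytes\n", maxreq_sz - wsize);    /* 1044 */
	printf("wsize %% page_size = %u (not page-aligned)\n",
	       wsize % page_size);                                   /* 4076 */
	return 0;
}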


Version-Release number of selected component (if applicable):

	kernel 3.10.0-1062.1.1.el7.x86_64
	kernel-3.10.0-957.46.1.el7.x86_64
	kernel 3.10.0-862.9.1.el7.x86_64


How reproducible:

	easy, see steps below


Steps to Reproduce:

	nfs client and server can be the same system

	# systemctl stop nfs-server.service
	# echo 524288 > /proc/fs/nfsd/max_block_size
	# systemctl start nfs-server.service

	cat <<EOFEOFEOF >/var/tmp/limit_rwsize.stp
	probe module("nfsd").function("check_forechannel_attrs") {
		printf("%s() - maxreq_sz: %d\n", ppfunc(), $ca->maxreq_sz)
		$ca->maxreq_sz = 525312
		printf("    adjusted maxreq_sz to %d\n", $ca->maxreq_sz)
	}
	EOFEOFEOF

	# stap -tvg /var/tmp/limit_rwsize.stp --skip-badvars --suppress-handler-errors 2>&1

	# mount localhost:/exports /mnt/tmp -overs=4.1,sec=sys,wsize=524288

	verify that wsize has been reduced from the expected 524288:
	# egrep -o 'rsize=[0-9]+,wsize=[0-9]+' /proc/self/mountinfo
	rsize=524288,wsize=524268

	# dd if=/boot/initramfs-$(uname -r).img of=/mnt/tmp/foo bs=10M iflag=direct oflag=direct


Actual results:

	nfs client system panics


Expected results:

	no kernel panic


Additional info:

nfs client crashes with:
[ 1158.976210] general protection fault: 0000 [#1] SMP 
[ 1158.976258] Modules linked in: rpcsec_gss_krb5 nfsv4 dns_resolver nfs fscache stap_4e813b501609e0a3e00c1bef70a691f1_2064(OE) bonding snd_hda_codec_generic snd_hda_intel snd_hda_codec snd_hda_core snd_hwdep snd_seq iosf_mbi ppdev crc32_pclmul ghash_clmulni_intel aesni_intel snd_seq_device snd_pcm lrw gf128mul snd_timer sg snd glue_helper ablk_helper pcspkr soundcore cryptd virtio_balloon parport_pc joydev parport i2c_piix4 nfsd nfs_acl lockd grace auth_rpcgss sunrpc ip_tables xfs libcrc32c sr_mod cdrom ata_generic pata_acpi virtio_console virtio_blk qxl drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ata_piix ttm crct10dif_pclmul crct10dif_common libata drm e1000 crc32c_intel serio_raw floppy virtio_pci i2c_core virtio_ring virtio
[ 1158.976881] CPU: 1 PID: 291 Comm: kworker/1:2 Kdump: loaded Tainted: G           OE  ------------   3.10.0-862.9.1.el7.x86_64 #1
[ 1158.976952] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.9.1-1.fc24 04/01/2014
[ 1158.977042] Workqueue: rpciod rpc_async_schedule [sunrpc]
[ 1158.977080] task: ffff9c21b6319fa0 ti: ffff9c21b6a90000 task.ti: ffff9c21b6a90000
[ 1158.977126] RIP: 0010:[<ffffffff9aff8123>]  [<ffffffff9aff8123>] kmem_cache_alloc_node+0xd3/0x200
[ 1158.977187] RSP: 0018:ffff9c21b6a93a50  EFLAGS: 00010246
[ 1158.977221] RAX: 0000000000000000 RBX: 0000000000000020 RCX: 000000000000fd84
[ 1158.977265] RDX: 000000000000fd83 RSI: 0000000000000020 RDI: 000000000001bb20
[ 1158.977309] RBP: ffff9c21b6a93a90 R08: ffff9c21fdd1bb20 R09: ffff9c21fd801600
[ 1158.977354] R10: ffffffff9b3d7ded R11: 0000000000000000 R12: 001fffff0008007c
[ 1158.977419] R13: 0000000000000020 R14: 00000000ffffffff R15: ffff9c21fd801600
[ 1158.977465] FS:  0000000000000000(0000) GS:ffff9c21fdd00000(0000) knlGS:0000000000000000
[ 1158.977525] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1158.977573] CR2: 00007f6543b6c000 CR3: 0000000028c6a000 CR4: 00000000000406e0
[ 1158.977622] Call Trace:
[ 1158.977648]  [<ffffffff9b3d7ded>] __alloc_skb+0x5d/0x2d0
[ 1158.977686]  [<ffffffff9b443382>] sk_stream_alloc_skb+0x52/0x1b0
[ 1158.977726]  [<ffffffff9b44f953>] tcp_fragment+0x53/0x2c0
[ 1158.977763]  [<ffffffff9b450fcf>] tcp_write_xmit+0x28f/0xd00
[ 1158.977800]  [<ffffffff9b451d80>] tcp_push_one+0x30/0x40
[ 1158.977835]  [<ffffffff9b4437de>] tcp_sendpage+0x2fe/0x5c0
[ 1158.977872]  [<ffffffff9b46f690>] ? inet_sendmsg+0xb0/0xb0
[ 1158.977908]  [<ffffffff9b46f700>] inet_sendpage+0x70/0xe0
[ 1158.977956]  [<ffffffffc069f235>] xs_sendpages+0x135/0x200 [sunrpc]
[ 1158.978042]  [<ffffffffc06a0b31>] xs_tcp_send_request+0x91/0x220 [sunrpc]
[ 1158.978094]  [<ffffffffc069d17b>] xprt_transmit+0x6b/0x330 [sunrpc]
[ 1158.978140]  [<ffffffffc0698f50>] call_transmit+0x1d0/0x2c0 [sunrpc]
[ 1158.978185]  [<ffffffffc0698d80>] ? call_decode+0x880/0x880 [sunrpc]
[ 1158.978230]  [<ffffffffc0698d80>] ? call_decode+0x880/0x880 [sunrpc]
[ 1158.978277]  [<ffffffffc06a6369>] __rpc_execute+0x99/0x420 [sunrpc]
[ 1158.978330]  [<ffffffff9b5139fc>] ? __schedule+0x41c/0xa20
[ 1158.978386]  [<ffffffffc06a6702>] rpc_async_schedule+0x12/0x20 [sunrpc]
[ 1158.978455]  [<ffffffff9aeb35ef>] process_one_work+0x17f/0x440
[ 1158.978492]  [<ffffffff9aeb4686>] worker_thread+0x126/0x3c0
[ 1158.978541]  [<ffffffff9aeb4560>] ? manage_workers.isra.24+0x2a0/0x2a0
[ 1158.978594]  [<ffffffff9aebb621>] kthread+0xd1/0xe0
[ 1158.980125]  [<ffffffff9aebb550>] ? insert_kthread_work+0x40/0x40
[ 1158.981690]  [<ffffffff9b5205f7>] ret_from_fork_nospec_begin+0x21/0x21
[ 1158.983308]  [<ffffffff9aebb550>] ? insert_kthread_work+0x40/0x40
[ 1158.984753] Code: 8b 5d 08 66 66 66 66 90 48 83 c4 18 4c 89 e0 5b 41 5c 41 5d 41 5e 41 5f 5d c3 66 0f 1f 44 00 00 49 63 41 20 48 8d 4a 01 49 8b 39 <49> 8b 1c 04 4c 89 e0 65 48 0f c7 0f 0f 94 c0 84 c0 0f 84 5a ff 
[ 1158.987804] RIP  [<ffffffff9aff8123>] kmem_cache_alloc_node+0xd3/0x200
[ 1158.989238]  RSP <ffff9c21b6a93a50>


slub_debug indicates a bug in kmalloc-1024; I'll upload a log and vmcore.

Comment 2 Frank Sorenson 2020-04-15 18:35:34 UTC
Created attachment 1679154 [details]
dmesg from 3.10.0-1062.1.1.el7.x86_64 kernel with slub_debug=FZPU

Comment 19 Benjamin Coddington 2020-04-27 18:30:12 UTC
Hi Jay, I was responding to the patches on rhkernel-list, but I have a little more to discuss, so I am bringing the discussion here.

It seems infeasible to convert to iov_iter, so let's abandon that.

One issue with using krealloc() though is that you'll be incurring an unnecessary memcpy() penalty.  It's nice to reuse the pages, but there's no need to copy over the pointer values from the last pass, right?

It seems we ought to be able to correctly calculate the maximum number of pages needed before we enter the loop. Perhaps it makes sense to just assume that we're going to potentially cross a page boundary on the next pass through the loop, so let's just allocate the extra page pointer in the first place. We're not allocating a page; we're just allocating a pointer.

What do you think?
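
For context, the krealloc() approach being discussed would look roughly like the fragment below. This is a hypothetical sketch of the loop body, not the actual patch under review; new_vec is a name introduced only for this sketch. When krealloc() has to move the buffer, it copies the old contents into the new one, and those old page-pointer values from the previous pass are exactly what does not need to be preserved.

	/* Hypothetical sketch only, not the proposed patch: grow the pagevec
	 * on each pass with krealloc().  If the buffer has to move, krealloc()
	 * copies the stale page pointers from the previous pass, which is the
	 * unnecessary memcpy() penalty noted above. */
	npages = nfs_page_array_len(pgbase, bytes);
	new_vec = krealloc(pagevec, npages * sizeof(struct page *), GFP_KERNEL);
	if (!new_vec)
		break;
	pagevec = new_vec;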

Comment 21 Benjamin Coddington 2020-04-28 11:17:54 UTC
Well, the other difference is that in the RHEL 7 loop, there are a number of checks that can cause the loop to abort.  If you abort the loop after allocating, you have to handle code paths to free those pointers each time.  However, we can simply handle this case by always allocating an extra pointer:

diff --git a/fs/nfs/direct.c b/fs/nfs/direct.c
index f639618b7a2d..d3cc27cd28df 100644
--- a/fs/nfs/direct.c
+++ b/fs/nfs/direct.c
@@ -873,7 +873,7 @@ static ssize_t nfs_direct_write_schedule_segment(struct nfs_pageio_descriptor *d
                result = -ENOMEM;
                npages = nfs_page_array_len(pgbase, bytes);
                if (!pagevec)
-                       pagevec = kmalloc(npages * sizeof(struct page *), GFP_KERNEL);
+                       pagevec = kmalloc((npages + 1) * sizeof(struct page *), GFP_KERNEL);
                if (!pagevec)
                        break;
 


What do you think?
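
To see why a single extra pointer is sufficient, here is a minimal user-space sketch (an illustration, not kernel code): with the non-page-aligned wsize from the reproducer, the first pass through the loop starts page-aligned and needs 128 page pointers, but the next pass starts mid-page and needs 129, one more than the pagevec that was sized on the first pass and then reused. The page_array_len() helper below mirrors the round-up arithmetic of nfs_page_array_len(); a 4096-byte page size and a page-aligned user buffer are assumptions of this example.

/* Minimal sketch, user space only. */
#include <stdio.h>
#include <stddef.h>

#define PAGE_SIZE 4096ul

/* page pointers needed to map "len" bytes starting at in-page offset "pgbase" */
static size_t page_array_len(size_t pgbase, size_t len)
{
	return (pgbase + len + PAGE_SIZE - 1) / PAGE_SIZE;
}

int main(void)
{
	size_t wsize = 524268;                     /* from the reproducer        */
	size_t pass1 = page_array_len(0, wsize);   /* buffer starts page-aligned */
	size_t pass2 = page_array_len(wsize % PAGE_SIZE, wsize);

	printf("pass 1 needs %zu page pointers\n", pass1);  /* 128 */
	printf("pass 2 needs %zu page pointers\n", pass2);  /* 129 */
	/*
	 * The pagevec allocated on pass 1 (128 pointers) is reused on pass 2,
	 * which needs a 129th pointer: the one-pointer overflow that the
	 * (npages + 1) allocation avoids.
	 */
	return 0;
}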

Comment 33 Steve Whitehouse 2020-04-30 11:19:56 UTC
Ben, what is the current status here? Do we need to defer? Please update the bug flags.

Comment 35 Jan Stancek 2020-05-08 12:12:49 UTC
Patch(es) committed on kernel-3.10.0-1140.el7

Comment 47 Zhi Li 2020-05-15 08:48:40 UTC
Moving to VERIFIED according to comment#44 and comment#46.

Comment 48 Petr Matousek 2020-05-28 10:29:25 UTC
*** Bug 1839679 has been marked as a duplicate of this bug. ***

Comment 53 errata-xmlrpc 2020-09-29 21:12:31 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Important: kernel security, bug fix, and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2020:4060