Bug 1036972 - use after free in new nfsd DRC code
Summary: use after free in new nfsd DRC code
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: kernel
Version: 6.5
Hardware: x86_64
OS: All
Priority: urgent
Severity: high
Target Milestone: rc
Target Release: ---
Assignee: J. Bruce Fields
QA Contact: Chao Ye
URL:
Whiteboard:
Duplicates: 1055911 1096491
Depends On: 1025907 1036971
Blocks: 994246 1055911 1088779
 
Reported: 2013-12-03 02:14 UTC by Jeff Layton
Modified: 2018-12-06 15:31 UTC
CC: 15 users

Fixed In Version: kernel-2.6.32-443.el6
Doc Type: Bug Fix
Doc Text:
Systems that use NFS file systems could become unresponsive or trigger a kernel oops due to a use-after-free bug in the duplicate reply cache (DRC) code in the nfsd daemon. This problem has been resolved by modifying nfsd to unhash DRC entries before attempting to use them and to prefer to allocate a new DRC entry from the slab instead of reusing an expired entry from the list.
Clone Of: 1036971
Environment:
Last Closed: 2014-10-14 05:40:14 UTC
Target Upstream Version:
Embargoed:




Links:
System ID: Red Hat Product Errata RHSA-2014:1392
Private: no
Priority: normal
Status: SHIPPED_LIVE
Summary: Important: kernel security, bug fix, and enhancement update
Last Updated: 2014-10-14 01:28:44 UTC

Description Jeff Layton 2013-12-03 02:14:37 UTC
+++ This bug was initially created as a clone of Bug #1036971 +++

+++ This bug was initially created as a clone of Bug #1025907 +++

Description of problem:
nfs system froze


Version-Release number of selected component (if applicable):

kernel-tools-libs-3.11.6-200.fc19.x86_64
kernel-headers-3.11.6-200.fc19.x86_64
kernel-3.11.3-201.fc19.x86_64
kernel-tools-3.11.6-200.fc19.x86_64
kernel-devel-3.11.6-200.fc19.x86_64
kernel-devel-3.11.1-200.fc19.x86_64
kernel-3.11.1-200.fc19.x86_64
kernel-devel-3.11.3-201.fc19.x86_64
kernel-3.11.6-200.fc19.x86_64

btrfs-progs-0.20.rc1.20130917git194aa4a-1.fc19.x86_64


How reproducible:
not sure what caused it.

Steps to Reproduce:
1.
2.
3.

Actual results:
freeze of system and dump:
Nov  1 09:00:22 r1epi kernel: [210852.907240] ------------[ cut here ]------------
Nov  1 09:00:22 r1epi kernel: [210852.907246] WARNING: CPU: 1 PID: 627 at lib/list_debug.c:53 __list_del_entry+0x63/0xd0()
Nov  1 09:00:22 r1epi kernel: [210852.907247] list_del corruption, ffff8800af398010->next is LIST_POISON1 (dead000000100100)
Nov  1 09:00:22 r1epi kernel: [210852.907248] Modules linked in: binfmt_misc xt_iprange bonding ip6t_REJECT nf_conntrack_ipv6 nf_conntrack_ipv4 nf_defrag_ipv6 nf_defrag_ipv4 ip6table_filter xt_conntrack nf_conntrack ip6_tables snd_hda_codec_hdmi snd_hda_codec_realtek iTCO_wdt iTCO_vendor_support btrfs zlib_deflate x86_pkg_temp_thermal coretemp kvm crc32_pclmul snd_hda_intel crc32c_intel raid6_pq btusb snd_hda_codec libcrc32c bluetooth ghash_clmulni_intel eeepc_wmi asus_wmi snd_hwdep sparse_keymap mxm_wmi snd_pcm microcode serio_raw xor rfkill i2c_i801 lpc_ich e1000 snd_page_alloc mfd_core e1000e mei_me snd_timer mei snd ptp soundcore shpchp pps_core wmi mperf nfsd auth_rpcgss nfs_acl lockd sunrpc i915 firewire_ohci i2c_algo_bit firewire_core drm_kms_helper crc_itu_t drm aacraid i2c_core video
Nov  1 09:00:22 r1epi kernel: [210852.907279] CPU: 1 PID: 627 Comm: nfsd Not tainted 3.11.6-200.fc19.x86_64 #1
Nov  1 09:00:22 r1epi kernel: [210852.907280] Hardware name: System manufacturer System Product Name/P8Z68-V PRO, BIOS 0902 09/19/2011
Nov  1 09:00:22 r1epi kernel: [210852.907281]  0000000000000009 ffff8801ff7bfcd8 ffffffff81647c7f ffff8801ff7bfd20
Nov  1 09:00:22 r1epi kernel: [210852.907283]  ffff8801ff7bfd10 ffffffff8106715d ffff8800af398010 0000000000000001
Nov  1 09:00:22 r1epi kernel: [210852.907285]  0000000000000000 0000000039b19996 ffff8801ff6a0000 ffff8801ff7bfd70
Nov  1 09:00:22 r1epi kernel: [210852.907287] Call Trace:
Nov  1 09:00:22 r1epi kernel: [210852.907291]  [<ffffffff81647c7f>] dump_stack+0x45/0x56
Nov  1 09:00:22 r1epi kernel: [210852.907294]  [<ffffffff8106715d>] warn_slowpath_common+0x7d/0xa0
Nov  1 09:00:22 r1epi kernel: [210852.907295]  [<ffffffff810671cc>] warn_slowpath_fmt+0x4c/0x50
Nov  1 09:00:22 r1epi kernel: [210852.907297]  [<ffffffff8118d80c>] ? kmem_cache_free+0x20c/0x220
Nov  1 09:00:22 r1epi kernel: [210852.907299]  [<ffffffff81310963>] __list_del_entry+0x63/0xd0
Nov  1 09:00:22 r1epi kernel: [210852.907306]  [<ffffffffa01c3cc1>] lru_put_end+0x21/0x60 [nfsd]
Nov  1 09:00:22 r1epi kernel: [210852.907310]  [<ffffffffa01c4208>] nfsd_cache_lookup+0x388/0x720 [nfsd]
Nov  1 09:00:22 r1epi kernel: [210852.907315]  [<ffffffffa01b8cf8>] nfsd_dispatch+0x88/0x200 [nfsd]
Nov  1 09:00:22 r1epi kernel: [210852.907323]  [<ffffffffa0187ffd>] svc_process_common+0x46d/0x6e0 [sunrpc]
Nov  1 09:00:22 r1epi kernel: [210852.907329]  [<ffffffffa0188377>] svc_process+0x107/0x170 [sunrpc]
Nov  1 09:00:22 r1epi kernel: [210852.907332]  [<ffffffffa01b871f>] nfsd+0xbf/0x130 [nfsd]
Nov  1 09:00:22 r1epi kernel: [210852.907336]  [<ffffffffa01b8660>] ? nfsd_destroy+0x80/0x80 [nfsd]
Nov  1 09:00:22 r1epi kernel: [210852.907338]  [<ffffffff81088650>] kthread+0xc0/0xd0
Nov  1 09:00:22 r1epi kernel: [210852.907340]  [<ffffffff81088590>] ? insert_kthread_work+0x40/0x40
Nov  1 09:00:22 r1epi kernel: [210852.907342]  [<ffffffff81656dec>] ret_from_fork+0x7c/0xb0
Nov  1 09:00:22 r1epi kernel: [210852.907344]  [<ffffffff81088590>] ? insert_kthread_work+0x40/0x40
Nov  1 09:00:22 r1epi kernel: [210852.907345] ---[ end trace 250b87d973d6ebb7 ]---


Expected results:

system stays up


Additional info:

file system is mounted with the lzo option.

--- Additional comment from g. artim on 2013-11-20 12:12:27 EST ---

added: 
+++++
noacl,nocto,nodiratime,noac,noatime
  to my nfs4 fstab and

ran:
+++

terminal 1:

(xtest.txt is a 3GB file with random data)

#!/bin/bash
echo cp xtest1.txt
time cp xtest.txt xtest1.txt
echo cp xtest2.txt
time cp xtest.txt xtest2.txt
echo cp xtest3.txt
time cp xtest.txt xtest3.txt
echo cp xtest4.txt
time cp xtest.txt xtest4.txt
echo cp xtest5.txt
time cp xtest.txt xtest5.txt
sync
sync
echo rm xtest?.txt
time \rm xtest?.txt
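The xtest.txt input for the script above is described as 3GB of random data; one way to generate such a file (the name just matches the script, any will do) is:

```shell
# Write 3 GiB of pseudo-random data to xtest.txt
# (3072 blocks of 1 MiB each).
dd if=/dev/urandom of=xtest.txt bs=1M count=3072
```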

terminal 2:

my response time for 'watch -n 1 ls -la' on the
nfs4 directory I ran in went from
5-15 seconds to 1 second.

this is an nfs4 client, but the same
happens with an nfs3 client.

the setup is:
++++++++++++

server:
fc19, btrfs w/lzo option on h/w raid
5, 20TB

client:
fc19, nfs4

so something is amiss between btrfs and nfs;
my guess is it's the nodiratime, but I haven't
isolated it down to that nfs option.


gary

--- Additional comment from J. Bruce Fields on 2013-11-20 16:04:08 EST ---

I guess this is the list_move in lru_put_end.  Could we in theory hit this case like this?:

        if (!list_empty(&lru_head)) {
                rp = list_first_entry(&lru_head, struct svc_cacherep, c_lru);
                if (nfsd_cache_entry_expired(rp) ||
                    num_drc_entries >= max_drc_entries) {
                        lru_put_end(rp);
                        prune_cache_entries(); /* prunes rp */
                        goto search_cache;
....
search_cache:
        found = nfsd_cache_search(rqstp, csum); /* returns NULL */
        if (found) {
                ....
        }
        if (!rp) {
                ....
        }
        ....
        lru_put_end(rp);

I don't see what guarantees rp is still good at this point, but I haven't looked closely.

--- Additional comment from Jeff Layton on 2013-11-20 16:10:27 EST ---

(In reply to J. Bruce Fields from comment #2)

> 
> I don't see what guarantees rp is still good at this point, but I haven't
> looked closely.

Been a while since I've been in this code too, but I think the fact that we hold the cache_lock over all of the above should ensure that rp doesn't go away out from under us.

--- Additional comment from Jeff Layton on 2013-11-20 16:24:59 EST ---

Ahh ok, I see where you're saying that prune_cache_entries might prune rp...

I think that's pretty unlikely. lru_put_end does this:

        rp->c_timestamp = jiffies;
        list_move_tail(&rp->c_lru, &lru_head);

...and then prune_cache_entries does this:

        list_for_each_entry_safe(rp, tmp, &lru_head, c_lru) {
                if (!nfsd_cache_entry_expired(rp) &&
                    num_drc_entries <= max_drc_entries)
                        break;
                nfsd_reply_cache_free_locked(rp);
                freed++;
        }

...so in order for rp to be pruned, you'd have to basically have max_drc_entries be 0, which I don't think is really possible.

--- Additional comment from Jeff Layton on 2013-11-20 16:32:21 EST ---

I'm stumped...is this reproducible at all? If so, would it be possible to get a vmcore?

--- Additional comment from Jeff Layton on 2013-11-20 16:39:33 EST ---

Hmm...so it looks like this is the found_entry case (aka, a cache hit):

Reading symbols from /usr/lib/debug/lib/modules/3.11.8-200.fc19.x86_64/kernel/fs/nfsd/nfsd.ko.debug...done.
(gdb) list *(nfsd_cache_lookup+0x388)
0xc208 is in nfsd_cache_lookup (fs/nfsd/nfscache.c:478).
473		age = jiffies - rp->c_timestamp;
474		lru_put_end(rp);
475	
476		rtn = RC_DROPIT;
477		/* Request being processed or excessive rexmits */
478		if (rp->c_state == RC_INPROG || age < RC_DELAY)
479			goto out;
480	
481		/* From the hall of fame of impractical attacks:
482		 * Is this a user who tries to snoop on the cache? */

The kernel version on my box is slightly different here of course, but I'll see if I can verify that against the version that the oops got reported against.

Still unclear on how we could hit such a bug, but I'll take a look tomorrow...

--- Additional comment from g. artim on 2013-11-20 17:03:57 EST ---

(In reply to Jeff Layton from comment #5)
> I'm stumped...is this reproducible at all? If so, would it be possible to
> get a vmcore?

since I updated to:

Linux r1epi 3.11.7-200.fc19.x86_64 #1 SMP Mon Nov 4 14:09:03 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux

I haven't gotten a kernel dump.

I only get the 10-15 second hangs on directory lists (ls -la). The options --
noacl,nocto,nodiratime,noac,noatime --
eliminated that hang time. 

Let me know if you want me to somehow force a vmcore, but I'd need some instructions. I do have a backup raid of the same size that I could test with. It's a different raid level (0), but I get the same results (long hang times on listing dirs during cp or rm of big files (3GB)).

--- Additional comment from Jeff Layton on 2013-11-21 08:51:17 EST ---

The list corruption is the more serious problem and the one we really need to address here. That may or may not be associated with the stalls you see. Have you seen the warnings about list corruption more than once?

There's no need to force a vmcore at this point.

--- Additional comment from Jeff Layton on 2013-11-21 09:13:57 EST ---

Ok, pulling down the kernel module that the list corruption was noticed on:

(gdb) list *(nfsd_cache_lookup+0x388)
0xc208 is in nfsd_cache_lookup (fs/nfsd/nfscache.c:478).

...and that's pretty much the same spot as in v3.11.8, so the lru_put_end call here is in the found_entry case. One possibility is that nfsd_cache_search returned an entry that was deleted.

Hmm...there is one place where we do free entries without holding the lock. nfsd_reply_cache_shutdown() will do so, but that should only be called if we couldn't successfully plug in the module in the first place or if we're unplugging it. Neither should be possible with nfsd threads actually running.

--- Additional comment from g. artim on 2013-11-21 12:54:26 EST ---

(In reply to Jeff Layton from comment #8)
> The list corruption is the more serious problem and the one we really need
> to address here. That may or may not be associated with the stalls you see.
> Have you seen the warnings about list corruption more than once?
> 
> There's no need to force a vmcore at this point.

I only got it one time, during a heavy load on the nfs server... that workload will return, but I'm not sure when; the group that creates it is MIA now, holidays!

--- Additional comment from Jeff Layton on 2013-12-02 15:32:05 EST ---

Ok, I think this patch will probably fix it. Christoph Hellwig came up with a reproducer and I was able to track it down. I can't reproduce the bug with this patch, but I'm also hitting a deadlock of some sort in nfsd so I can't completely verify that just yet.

Still, I'm pretty confident that this is the bug that was causing the list corruption warnings. Hopefully this patch will make 3.13 and stable kernels.

Comment 2 Dave Wysochanski 2014-02-03 20:48:50 UTC
*** Bug 1055911 has been marked as a duplicate of this bug. ***

Comment 6 Rafael Aquini 2014-02-09 14:28:07 UTC
Patch(es) available on kernel-2.6.32-443.el6

Comment 18 Dr. Amihai Silverman 2014-05-19 11:42:26 UTC
(In reply to Rafael Aquini from comment #6)
> Patch(es) available on kernel-2.6.32-443.el6

Dear Rafael,
Could you please point us to where we can download kernel-2.6.32-443.el6? We suffer from the same NFS bug, but we only found 2.6.32-431 in the Red Hat archive.
Thanks, Amihai

Comment 19 Dr. Amihai Silverman 2014-05-19 11:43:10 UTC
Could you please point us to where we can download kernel-2.6.32-443.el6? We only found 2.6.32-431 in the Red Hat archive.
Thanks, Amihai

Comment 23 errata-xmlrpc 2014-10-14 05:40:14 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHSA-2014-1392.html

Comment 24 Steve Dickson 2015-01-01 16:09:59 UTC
*** Bug 1096491 has been marked as a duplicate of this bug. ***

