Bug 746861

Summary:	umount of RHEL 6.2 2.6.32-209.el6.x86_64 beta pNFS share can hang or cause Oops
Product:	Red Hat Enterprise Linux 6	Reporter:	Andy Adamson <andros>
Component:	kernel	Assignee:	Steve Dickson <steved>
Status:	CLOSED ERRATA	QA Contact:	Filesystem QE <fs-qe>
Severity:	high	Docs Contact:
Priority:	unspecified
Version:	6.2	CC:	cward, eguan, kzhang, pbenas, rwheeler, steved
Target Milestone:	rc	Keywords:	OtherQA
Target Release:	---
Hardware:	Unspecified
OS:	Linux
Whiteboard:
Fixed In Version:	kernel-2.6.32-214.el6	Doc Type:	Bug Fix
Doc Text:		Story Points:	---
Clone Of:		Environment:
Last Closed:	2011-12-06 14:18:03 UTC	Type:	---
Regression:	---	Mount Type:	---
Documentation:	---	CRM:
Verified Versions:		Category:	---
oVirt Team:	---	RHEL 7.3 requirements from Atomic Host:
Cloudforms Team:	---	Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks:	750914

Description Andy Adamson 2011-10-18 02:19:45 UTC

Description of problem:

nfs4_deviceid_purge_client() has a bug that is fixed upstream by commit 9e3bd4e24 (3.0-rc5). Console output from the failing umount:

Pid: 2731, comm: umount.nfs Not tainted 2.6.32-209.el6.x86_64 #1 VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform
RIP: 0010:[<ffffffffa053bab8>]  [<ffffffffa053bab8>] nfs4_deviceid_purge_client+0xe8/0x170 [nfs]
RSP: 0018:ffff88006a243dc8  EFLAGS: 00000246
RAX: 0000000000000000 RBX: ffff88006a243e08 RCX: 0000000000000050
RDX: ffff880066584a50 RSI: ffffffffa00f0c70 RDI: 0000000000000282
RBP: ffffffff8100bc0e R08: ffff88006a243d10 R09: 0000000000000001
R10: 0000000000000000 R11: 0000000000000000 R12: ffff88006a243d78
R13: ffff88006a243d58 R14: 0000000000000282 R15: dead000000200200
FS:  00007fbc5093d700(0000) GS:ffff880002200000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 00007f1d300949d0 CR3: 0000000053bca000 CR4: 00000000000006f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process umount.nfs (pid: 2731, threadinfo ffff88006a242000, task ffff88006a37f500)
Stack:
ffff880066584a50 ffffffff81c00140 ffff88006a152400 ffff880069e7e000
<0> ffff880069e7e000 ffffffff81c00140 ffff88006a152400 ffff8800378ab9c0
<0> ffff88006a243e28 ffffffffa04fda3a ffffffff81c00140 ffff880069e7e000
Call Trace:
[<ffffffffa04fda3a>] ? nfs_free_client+0x9a/0x120 [nfs]
[<ffffffffa04fe04b>] ? nfs_put_client+0x7b/0xb0 [nfs]
[<ffffffffa04fe143>] ? nfs_free_server+0xc3/0x130 [nfs]
[<ffffffffa050b3a9>] ? nfs4_kill_super+0x49/0x90 [nfs]
[<ffffffff81179650>] ? deactivate_super+0x70/0x90
[<ffffffff811955cf>] ? mntput_no_expire+0xbf/0x110
[<ffffffff8119606b>] ? sys_umount+0x7b/0x3a0
[<ffffffff8100b0f2>] ? system_call_fastpath+0x16/0x1b
Code: 00 00 00 4c 89 f8 c7 00 00 00 00 00 48 83 7d c0 00 74 70 e8 9b 18 b5 e0 49 8d 5c 24 48 eb 0e 0f 1f 40 00 49 8b 44 24 18 48 85 c0 <75> 26 48 83 7d c0 00 74 4f f0 ff 0b 0f 94 c0 84 c0 74 e5 49 8b
Call Trace:
[<ffffffffa053bad6>] ? nfs4_deviceid_purge_client+0x106/0x170 [nfs]
[<ffffffffa04fda3a>] ? nfs_free_client+0x9a/0x120 [nfs]
[<ffffffffa04fe04b>] ? nfs_put_client+0x7b/0xb0 [nfs]
[<ffffffffa04fe143>] ? nfs_free_server+0xc3/0x130 [nfs]
[<ffffffffa050b3a9>] ? nfs4_kill_super+0x49/0x90 [nfs]
[<ffffffff81179650>] ? deactivate_super+0x70/0x90
[<ffffffff811955cf>] ? mntput_no_expire+0xbf/0x110
[<ffffffff8119606b>] ? sys_umount+0x7b/0x3a0
[<ffffffff8100b0f2>] ? system_call_fastpath+0x16/0x1b
BUG: unable to handle kernel NULL pointer dereference at 0000000000000068
IP: [<ffffffffa053bad3>] nfs4_deviceid_purge_client+0x103/0x170 [nfs]
PGD 0
Oops: 0000 [#1] SMP
last sysfs file: /sys/kernel/mm/ksm/run
CPU 1
Modules linked in: nfs_layout_nfsv41_files nfs lockd fscache nfs_acl auth_rpcgss nls_utf8 fuse ebtable_nat ebtables ipt_MASQUERADE iptable_nat nf_nat xt_CHECKSUM iptable_mangle bridge stp llc autofs4 sunrpc ipt_REJECT nf_conntrack_ipv4 nf_defrag_ipv4 iptable_filter ip_tables ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables ipv6 vhost_net macvtap macvlan tun uinput ppdev parport_pc parport snd_ens1371 snd_rawmidi snd_ac97_codec ac97_bus snd_seq snd_seq_device snd_pcm snd_timer snd soundcore snd_page_alloc e1000 microcode vmware_balloon sg i2c_piix4 i2c_core shpchp ext4 mbcache jbd2 sd_mod crc_t10dif sr_mod cdrom mptspi mptscsih mptbase scsi_transport_spi pata_acpi ata_generic ata_piix dm_mirror dm_region_hash dm_log dm_mod [last unloaded: speedstep_lib]
Got error -10052 from the server on DESTROY_SESSION. Session has been destroyed regardless...
Pid: 2731, comm: umount.nfs Not tainted 2.6.32-209.el6.x86_64 #1 VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform
RIP: 0010:[<ffffffffa053bad3>]  [<ffffffffa053bad3>] nfs4_deviceid_purge_client+0x103/0x170 [nfs]
RSP: 0018:ffff88006a243dc8  EFLAGS: 00010202
RAX: 0000000000000000 RBX: ffff880066584e08 RCX: 0000000000000050
RDX: ffff880066584a50 RSI: ffffffffa00f0c70 RDI: ffff880066584dc0
RBP: ffff88006a243e08 R08: ffff88006a243d10 R09: 0000000000000001
R10: 0000000000000000 R11: 0000000000000000 R12: ffff880066584dc0
R13: ffff880069e7e000 R14: ffffffffa054e5e0 R15: ffffffffa054e5c0
FS:  00007fbc5093d700(0000) GS:ffff880002300000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 00007fc6d59e7000 CR3: 0000000053bca000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process umount.nfs (pid: 2731, threadinfo ffff88006a242000, task ffff88006a37f500)
Stack:
ffff880066584a50 ffffffff81c00140 ffff88006a152400 ffff880069e7e000
<0> ffff880069e7e000 ffffffff81c00140 ffff88006a152400 ffff8800378ab9c0
<0> ffff88006a243e28 ffffffffa04fda3a ffffffff81c00140 ffff880069e7e000
Call Trace:
[<ffffffffa04fda3a>] nfs_free_client+0x9a/0x120 [nfs]
[<ffffffffa04fe04b>] nfs_put_client+0x7b/0xb0 [nfs]
[<ffffffffa04fe143>] nfs_free_server+0xc3/0x130 [nfs]
[<ffffffffa050b3a9>] nfs4_kill_super+0x49/0x90 [nfs]
[<ffffffff81179650>] deactivate_super+0x70/0x90
[<ffffffff811955cf>] mntput_no_expire+0xbf/0x110
[<ffffffff8119606b>] sys_umount+0x7b/0x3a0
[<ffffffff8100b0f2>] system_call_fastpath+0x16/0x1b
Code: 24 48 eb 0e 0f 1f 40 00 49 8b 44 24 18 48 85 c0 75 26 48 83 7d c0 00 74 4f f0 ff 0b 0f 94 c0 84 c0 74 e5 49 8b 44 24 20 4c 89 e7 <ff> 50 68 49 8b 44 24 18 48 85 c0 74 da 49 8b 54 24 10 48 85 d2
RIP  [<ffffffffa053bad3>] nfs4_deviceid_purge_client+0x103/0x170 [nfs]
RSP <ffff88006a243dc8>
CR2: 0000000000000068
---[ end trace 7afe685c8e44198a ]---
Kernel panic - not syncing: Fatal exception
Pid: 2731, comm: umount.nfs Tainted: G      D    ----------------   2.6.32-209.el6.x86_64 #1
Call Trace:
[<ffffffff814ebd7b>] ? panic+0x78/0x143
[<ffffffff814eff14>] ? oops_end+0xe4/0x100
[<ffffffff810422eb>] ? no_context+0xfb/0x260
[<ffffffff81042575>] ? __bad_area_nosemaphore+0x125/0x1e0
[<ffffffff8104269e>] ? bad_area+0x4e/0x60
[<ffffffff81042da3>] ? __do_page_fault+0x3c3/0x480
[<ffffffff814ed305>] ? schedule_timeout+0x215/0x2e0
[<ffffffff814eef5b>] ? _spin_unlock_bh+0x1b/0x20
[<ffffffff814f1ece>] ? do_page_fault+0x3e/0xa0
[<ffffffff814ef285>] ? page_fault+0x25/0x30
[<ffffffffa053bad3>] ? nfs4_deviceid_purge_client+0x103/0x170 [nfs]
[<ffffffffa053bad6>] ? nfs4_deviceid_purge_client+0x106/0x170 [nfs]
[<ffffffffa04fda3a>] ? nfs_free_client+0x9a/0x120 [nfs]
[<ffffffffa04fe04b>] ? nfs_put_client+0x7b/0xb0 [nfs]
[<ffffffffa04fe143>] ? nfs_free_server+0xc3/0x130 [nfs]
[<ffffffffa050b3a9>] ? nfs4_kill_super+0x49/0x90 [nfs]
[<ffffffff81179650>] ? deactivate_super+0x70/0x90
[<ffffffff811955cf>] ? mntput_no_expire+0xbf/0x110
[<ffffffff8119606b>] ? sys_umount+0x7b/0x3a0
[<ffffffff8100b0f2>] ? system_call_fastpath+0x16/0x1b
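
A note on reading the second Oops above: its Code: line marks the faulting bytes as <ff> 50 68, and the report gives CR2 as 0000000000000068 with RAX as 0. The sketch below is one way to check that these agree. It splits the ModRM byte 0x50 into its fields and computes the effective address for the "base register plus 8-bit displacement" form. The struct and function names are hypothetical helpers written for this note, not part of any kernel or disassembler API, and only that one addressing form is handled.

```c
#include <assert.h>
#include <stdint.h>

/* Fields of an x86 ModRM byte (hypothetical helper type). */
struct modrm {
    unsigned mod;  /* addressing mode, bits 7-6 */
    unsigned reg;  /* opcode extension or register, bits 5-3 */
    unsigned rm;   /* base register, bits 2-0 */
};

/* Split a ModRM byte into its three fields. */
static struct modrm decode_modrm(uint8_t byte)
{
    struct modrm m = { (byte >> 6) & 3u, (byte >> 3) & 7u, byte & 7u };
    return m;
}

/*
 * Effective address for the mod == 1 form: base register plus a
 * sign-extended 8-bit displacement. Other forms (SIB byte, 32-bit
 * displacement, RIP-relative) are deliberately not handled here.
 */
static uint64_t effective_address_disp8(uint64_t base, uint8_t disp8)
{
    return base + (uint64_t)(int64_t)(int8_t)disp8;
}
```

For the bytes ff 50 68, decode_modrm(0x50) gives mod = 1, reg = 2, rm = 0. Opcode ff with reg = 2 is a near indirect call, and rm = 0 without a REX prefix selects RAX, so the instruction is a call through the pointer stored at RAX + 0x68. With RAX = 0, effective_address_disp8(0, 0x68) is 0x68, which matches CR2. That would be consistent with d->ld being NULL at d->ld->free_deviceid_node(d), but the structure layout is not shown in this report, so treat that mapping as an assumption.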



How reproducible:

Very.


Steps to Reproduce:
1. Run the Connectathon special tests on a pNFS mount.
2. Unmount the share with umount.

Actual results:

umount hangs, or the kernel Oopses.


Expected results:

umount succeeds


Additional info:

Here is the broken code:

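static void
_deviceid_purge_client(const struct nfs_client *clp, long hash)
{
  .......

       while (!hlist_empty(&tmp)) {
               /* BUG: d is never reloaded from tmp inside the loop */
               if (atomic_dec_and_test(&d->ref))
                       d->ld->free_deviceid_node(d);
               /* BUG: touches d after it may have been freed above */
               hlist_del_init(&d->tmpnode);
       }
}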


Here is the fixed code:

static void
_deviceid_purge_client(const struct nfs_client *clp, long hash)
{
  ........

       while (!hlist_empty(&tmp)) {
               /* take the current head of tmp on every pass */
               d = hlist_entry(tmp.first, struct nfs4_deviceid_node, tmpnode);
               /* unlink before the reference drop can free d */
               hlist_del(&d->tmpnode);
               if (atomic_dec_and_test(&d->ref))
                       d->ld->free_deviceid_node(d);
       }
}
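
For anyone who wants to try the ordering outside the kernel, here is a minimal userspace sketch of the same drain pattern: fetch the head, unlink it, and only then drop the reference. Everything in it (struct node, struct list, list_push, list_drain, freed_count) is a hypothetical stand-in invented for this illustration. It uses a plain singly linked list and a non-atomic counter, so it shows the order of operations only and is not the kernel code or the hlist API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical userspace stand-ins; not the kernel types. */
struct node {
    struct node *next;  /* plays the role of the tmpnode linkage */
    int ref;            /* plays the role of the ref count (not atomic here) */
};

struct list {
    struct node *first;
};

static int freed_count; /* lets a caller observe how many nodes were released */

static void free_node(struct node *d)
{
    freed_count++;
    free(d);
}

/* Push a freshly allocated node with the given reference count. */
static struct node *list_push(struct list *l, int ref)
{
    struct node *d = malloc(sizeof(*d));

    if (d == NULL)
        return NULL;
    d->ref = ref;
    d->next = l->first;
    l->first = d;
    return d;
}

/*
 * Drain the list in the same order as the fixed loop:
 *   1. fetch the current first entry,
 *   2. unlink it,
 *   3. only then drop the reference and possibly free it.
 * Returns the number of entries removed from the list.
 */
static int list_drain(struct list *l)
{
    int removed = 0;

    while (l->first != NULL) {
        struct node *d = l->first;  /* step 1 */

        l->first = d->next;         /* step 2 */
        d->next = NULL;
        removed++;
        if (--d->ref == 0)          /* step 3 */
            free_node(d);
    }
    return removed;
}
```

Because the entry is unlinked before its reference is dropped, the loop never reads memory that step 3 may have released, and the list shrinks on every pass, so the loop terminates. A node that still holds another reference simply leaves the list without being freed.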

Comment 2 RHEL Program Management 2011-10-18 18:10:55 UTC

This request was evaluated by Red Hat Product Management for inclusion
in a Red Hat Enterprise Linux maintenance release. Product Management has 
requested further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed 
products. This request is not yet committed for inclusion in an Update release.

Comment 4 Steve Dickson 2011-10-19 14:26:29 UTC

Posted patch:

From: Andy Adamson <andros>
Date: Wed, 19 Oct 2011 10:47:43 -0400
Subject: [RHEL6.2 PATCH 1/1] pNFS can hang or oops on umounts.

This fix is part of the upstream commit 9e3bd4e24 that
went into 3.0-rc5. The patch fixes an oops that can occur
after the connectathon special tests are run on a
pNFS mount and then a umount is done.

Signed-off-by: Steve Dickson <steved>
BZ: https://bugzilla.redhat.com/show_bug.cgi?id=746861
---
 fs/nfs/pnfs_dev.c |    3 ++-
 1 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/fs/nfs/pnfs_dev.c b/fs/nfs/pnfs_dev.c
index bee94a3..005e82d 100644
--- a/fs/nfs/pnfs_dev.c
+++ b/fs/nfs/pnfs_dev.c
@@ -239,9 +239,10 @@ _deviceid_purge_client(const struct nfs_client *clp, long hash)

 	synchronize_rcu();
 	while (!hlist_empty(&tmp)) {
+		d = hlist_entry(tmp.first, struct nfs4_deviceid_node, tmpnode);
+		hlist_del(&d->tmpnode);
 		if (atomic_dec_and_test(&d->ref))
 			d->ld->free_deviceid_node(d);
-		hlist_del_init(&d->tmpnode);
 	}
 }

Comment 6 Eryu Guan 2011-10-24 08:30:54 UTC

Hi Andy,

Will NetApp verify the fix once a test kernel is available?

Thanks!
Eryu Guan

Comment 8 Steve Dickson 2011-10-24 15:53:54 UTC

(In reply to comment #6)
> Hi Andy,
> 
> Will NetApp verify the fix once a test kernel is available?

I just talked to Andy and he said this patch was verified
at this year's Bakathon (which happened last week).

Comment 9 Aristeu Rozanski 2011-10-26 19:46:25 UTC

Patch(es) available on kernel-2.6.32-214.el6

Comment 13 errata-xmlrpc 2011-12-06 14:18:03 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHSA-2011-1530.html