| Summary: | umount of RHEL 6.2 2.6.32-209.el6.x86_64 beta pNFS share can hang or cause Oops | ||
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 6 | Reporter: | Andy Adamson <andros> |
| Component: | kernel | Assignee: | Steve Dickson <steved> |
| Status: | CLOSED ERRATA | QA Contact: | Filesystem QE <fs-qe> |
| Severity: | high | Docs Contact: | |
| Priority: | unspecified | ||
| Version: | 6.2 | CC: | cward, eguan, kzhang, pbenas, rwheeler, steved |
| Target Milestone: | rc | Keywords: | OtherQA |
| Target Release: | --- | ||
| Hardware: | Unspecified | ||
| OS: | Linux | ||
| Whiteboard: | |||
| Fixed In Version: | kernel-2.6.32-214.el6 | Doc Type: | Bug Fix |
| Doc Text: | Story Points: | --- | |
| Clone Of: | Environment: | ||
| Last Closed: | 2011-12-06 14:18:03 UTC | Type: | --- |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Bug Depends On: | |||
| Bug Blocks: | 750914 | ||
This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux maintenance release. Product Management has requested further review of this request by Red Hat Engineering, for potential inclusion in a Red Hat Enterprise Linux Update release for currently deployed products. This request is not yet committed for inclusion in an Update release.

Posted patch:

    From: Andy Adamson <andros>
    Date: Wed, 19 Oct 2011 10:47:43 -0400
    Subject: [RHEL6.2 PATCH 1/1] pNFS can hang or oops on umounts.

    This fix is part of the upstream commit 9e3bd4e24 that went into 3.0-rc5.
    The patch fixes an oops that can occur after the connectathon special
    tests are run on a pNFS mount and then an umount is done.

    Signed-off-by: Steve Dickson <steved>
    BZ: https://bugzilla.redhat.com/show_bug.cgi?id=746861
    ---
     fs/nfs/pnfs_dev.c | 3 ++-
     1 files changed, 2 insertions(+), 1 deletions(-)

    diff --git a/fs/nfs/pnfs_dev.c b/fs/nfs/pnfs_dev.c
    index bee94a3..005e82d 100644
    --- a/fs/nfs/pnfs_dev.c
    +++ b/fs/nfs/pnfs_dev.c
    @@ -239,9 +239,10 @@ _deviceid_purge_client(const struct nfs_client *clp, long hash)
             synchronize_rcu();
             while (!hlist_empty(&tmp)) {
    +                d = hlist_entry(tmp.first, struct nfs4_deviceid_node, tmpnode);
    +                hlist_del(&d->tmpnode);
                     if (atomic_dec_and_test(&d->ref))
                             d->ld->free_deviceid_node(d);
    -                hlist_del_init(&d->tmpnode);
             }
     }

Hi Andy,

Will NetApp verify the fix once a test kernel is available?

Thanks!
Eryu Guan

(In reply to comment #6)
> Hi Andy,
>
> Will NetApp verify the fix once a test kernel is available?

I just talked to Andy and he said this patch was verified at this year's Bakeathon (which happened last week).

Patch(es) available on kernel-2.6.32-214.el6

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.
http://rhn.redhat.com/errata/RHSA-2011-1530.html
Description of problem:

Bug in nfs4_deviceid_purge_client that is fixed in 3.0-rc5 commit 9e3bd4e24

    Pid: 2731, comm: umount.nfs Not tainted 2.6.32-209.el6.x86_64 #1 VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform
    RIP: 0010:[<ffffffffa053bab8>] [<ffffffffa053bab8>] nfs4_deviceid_purge_client+0xe8/0x170 [nfs]
    RSP: 0018:ffff88006a243dc8 EFLAGS: 00000246
    RAX: 0000000000000000 RBX: ffff88006a243e08 RCX: 0000000000000050
    RDX: ffff880066584a50 RSI: ffffffffa00f0c70 RDI: 0000000000000282
    RBP: ffffffff8100bc0e R08: ffff88006a243d10 R09: 0000000000000001
    R10: 0000000000000000 R11: 0000000000000000 R12: ffff88006a243d78
    R13: ffff88006a243d58 R14: 0000000000000282 R15: dead000000200200
    FS: 00007fbc5093d700(0000) GS:ffff880002200000(0000) knlGS:0000000000000000
    CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
    CR2: 00007f1d300949d0 CR3: 0000000053bca000 CR4: 00000000000006f0
    DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
    DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
    Process umount.nfs (pid: 2731, threadinfo ffff88006a242000, task ffff88006a37f500)
    Stack:
     ffff880066584a50 ffffffff81c00140 ffff88006a152400 ffff880069e7e000
    <0> ffff880069e7e000 ffffffff81c00140 ffff88006a152400 ffff8800378ab9c0
    <0> ffff88006a243e28 ffffffffa04fda3a ffffffff81c00140 ffff880069e7e000
    Call Trace:
     [<ffffffffa04fda3a>] ? nfs_free_client+0x9a/0x120 [nfs]
     [<ffffffffa04fe04b>] ? nfs_put_client+0x7b/0xb0 [nfs]
     [<ffffffffa04fe143>] ? nfs_free_server+0xc3/0x130 [nfs]
     [<ffffffffa050b3a9>] ? nfs4_kill_super+0x49/0x90 [nfs]
     [<ffffffff81179650>] ? deactivate_super+0x70/0x90
     [<ffffffff811955cf>] ? mntput_no_expire+0xbf/0x110
     [<ffffffff8119606b>] ? sys_umount+0x7b/0x3a0
     [<ffffffff8100b0f2>] ? system_call_fastpath+0x16/0x1b
    Code: 00 00 00 4c 89 f8 c7 00 00 00 00 00 48 83 7d c0 00 74 70 e8 9b 18 b5 e0 49 8d 5c 24 48 eb 0e 0f 1f 40 00 49 8b 44 24 18 48 85 c0 <75> 26 48 83 7d c0 00 74 4f f0 ff 0b 0f 94 c0 84 c0 74 e5 49 8b
    Call Trace:
     [<ffffffffa053bad6>] ? nfs4_deviceid_purge_client+0x106/0x170 [nfs]
     [<ffffffffa04fda3a>] ? nfs_free_client+0x9a/0x120 [nfs]
     [<ffffffffa04fe04b>] ? nfs_put_client+0x7b/0xb0 [nfs]
     [<ffffffffa04fe143>] ? nfs_free_server+0xc3/0x130 [nfs]
     [<ffffffffa050b3a9>] ? nfs4_kill_super+0x49/0x90 [nfs]
     [<ffffffff81179650>] ? deactivate_super+0x70/0x90
     [<ffffffff811955cf>] ? mntput_no_expire+0xbf/0x110
     [<ffffffff8119606b>] ? sys_umount+0x7b/0x3a0
     [<ffffffff8100b0f2>] ? system_call_fastpath+0x16/0x1b

    BUG: unable to handle kernel NULL pointer dereference at 0000000000000068
    IP: [<ffffffffa053bad3>] nfs4_deviceid_purge_client+0x103/0x170 [nfs]
    PGD 0
    Oops: 0000 [#1] SMP
    last sysfs file: /sys/kernel/mm/ksm/run
    CPU 1
    Modules linked in: nfs_layout_nfsv41_files nfs lockd fscache nfs_acl auth_rpcgss nls_utf8 fuse ebtable_nat ebtables ipt_MASQUERADE iptable_nat nf_nat xt_CHECKSUM iptable_mangle bridge stp llc autofs4 sunrpc ipt_REJECT nf_conntrack_ipv4 nf_defrag_ipv4 iptable_filter ip_tables ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter
    Got error -10052 from the server on DESTROY_SESSION. Session has been destroyed regardless...
     ip6_tables ipv6 vhost_net macvtap macvlan tun uinput ppdev parport_pc parport snd_ens1371 snd_rawmidi snd_ac97_codec ac97_bus snd_seq snd_seq_device snd_pcm snd_timer snd soundcore snd_page_alloc e1000 microcode vmware_balloon sg i2c_piix4 i2c_core shpchp ext4 mbcache jbd2 sd_mod crc_t10dif sr_mod cdrom mptspi mptscsih mptbase scsi_transport_spi pata_acpi ata_generic ata_piix dm_mirror dm_region_hash dm_log dm_mod [last unloaded: speedstep_lib]

    Pid: 2731, comm: umount.nfs Not tainted 2.6.32-209.el6.x86_64 #1 VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform
    RIP: 0010:[<ffffffffa053bad3>] [<ffffffffa053bad3>] nfs4_deviceid_purge_client+0x103/0x170 [nfs]
    RSP: 0018:ffff88006a243dc8 EFLAGS: 00010202
    RAX: 0000000000000000 RBX: ffff880066584e08 RCX: 0000000000000050
    RDX: ffff880066584a50 RSI: ffffffffa00f0c70 RDI: ffff880066584dc0
    RBP: ffff88006a243e08 R08: ffff88006a243d10 R09: 0000000000000001
    R10: 0000000000000000 R11: 0000000000000000 R12: ffff880066584dc0
    R13: ffff880069e7e000 R14: ffffffffa054e5e0 R15: ffffffffa054e5c0
    FS: 00007fbc5093d700(0000) GS:ffff880002300000(0000) knlGS:0000000000000000
    CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
    CR2: 00007fc6d59e7000 CR3: 0000000053bca000 CR4: 00000000000006e0
    DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
    DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
    Process umount.nfs (pid: 2731, threadinfo ffff88006a242000, task ffff88006a37f500)
    Stack:
     ffff880066584a50 ffffffff81c00140 ffff88006a152400 ffff880069e7e000
    <0> ffff880069e7e000 ffffffff81c00140 ffff88006a152400 ffff8800378ab9c0
    <0> ffff88006a243e28 ffffffffa04fda3a ffffffff81c00140 ffff880069e7e000
    Call Trace:
     [<ffffffffa04fda3a>] nfs_free_client+0x9a/0x120 [nfs]
     [<ffffffffa04fe04b>] nfs_put_client+0x7b/0xb0 [nfs]
     [<ffffffffa04fe143>] nfs_free_server+0xc3/0x130 [nfs]
     [<ffffffffa050b3a9>] nfs4_kill_super+0x49/0x90 [nfs]
     [<ffffffff81179650>] deactivate_super+0x70/0x90
     [<ffffffff811955cf>] mntput_no_expire+0xbf/0x110
     [<ffffffff8119606b>] sys_umount+0x7b/0x3a0
     [<ffffffff8100b0f2>] system_call_fastpath+0x16/0x1b
    Code: 24 48 eb 0e 0f 1f 40 00 49 8b 44 24 18 48 85 c0 75 26 48 83 7d c0 00 74 4f f0 ff 0b 0f 94 c0 84 c0 74 e5 49 8b 44 24 20 4c 89 e7 <ff> 50 68 49 8b 44 24 18 48 85 c0 74 da 49 8b 54 24 10 48 85 d2
    RIP [<ffffffffa053bad3>] nfs4_deviceid_purge_client+0x103/0x170 [nfs]
     RSP <ffff88006a243dc8>
    CR2: 0000000000000068
    ---[ end trace 7afe685c8e44198a ]---
    Kernel panic - not syncing: Fatal exception
    Pid: 2731, comm: umount.nfs Tainted: G D ---------------- 2.6.32-209.el6.x86_64 #1
    Call Trace:
     [<ffffffff814ebd7b>] ? panic+0x78/0x143
     [<ffffffff814eff14>] ? oops_end+0xe4/0x100
     [<ffffffff810422eb>] ? no_context+0xfb/0x260
     [<ffffffff81042575>] ? __bad_area_nosemaphore+0x125/0x1e0
     [<ffffffff8104269e>] ? bad_area+0x4e/0x60
     [<ffffffff81042da3>] ? __do_page_fault+0x3c3/0x480
     [<ffffffff814ed305>] ? schedule_timeout+0x215/0x2e0
     [<ffffffff814eef5b>] ? _spin_unlock_bh+0x1b/0x20
     [<ffffffff814f1ece>] ? do_page_fault+0x3e/0xa0
     [<ffffffff814ef285>] ? page_fault+0x25/0x30
     [<ffffffffa053bad3>] ? nfs4_deviceid_purge_client+0x103/0x170 [nfs]
     [<ffffffffa053bad6>] ? nfs4_deviceid_purge_client+0x106/0x170 [nfs]
     [<ffffffffa04fda3a>] ? nfs_free_client+0x9a/0x120 [nfs]
     [<ffffffffa04fe04b>] ? nfs_put_client+0x7b/0xb0 [nfs]
     [<ffffffffa04fe143>] ? nfs_free_server+0xc3/0x130 [nfs]
     [<ffffffffa050b3a9>] ? nfs4_kill_super+0x49/0x90 [nfs]
     [<ffffffff81179650>] ? deactivate_super+0x70/0x90
     [<ffffffff811955cf>] ? mntput_no_expire+0xbf/0x110
     [<ffffffff8119606b>] ? sys_umount+0x7b/0x3a0
     [<ffffffff8100b0f2>] ? system_call_fastpath+0x16/0x1b

How reproducible:
Very.

Steps to Reproduce:
1. Run connectathon Special tests on a pNFS mount
2. umount

Actual results:
umount hangs or Oops

Expected results:
umount succeeds

Additional info:

Here is the broken code:

    static void
    _deviceid_purge_client(const struct nfs_client *clp, long hash)
    {
            .......
            while (!hlist_empty(&tmp)) {
                    if (atomic_dec_and_test(&d->ref))
                            d->ld->free_deviceid_node(d);
                    hlist_del_init(&d->tmpnode);
            }
    }

Here is the fixed code:

    static void
    _deviceid_purge_client(const struct nfs_client *clp, long hash)
    {
            ........
            while (!hlist_empty(&tmp)) {
                    d = hlist_entry(tmp.first, struct nfs4_deviceid_node, tmpnode);
                    hlist_del(&d->tmpnode);
                    if (atomic_dec_and_test(&d->ref))
                            d->ld->free_deviceid_node(d);
            }
    }