Red Hat Bugzilla – Bug 696376
server BUG() on receipt of bad NFSv4 lock request
Last modified: 2011-12-10 01:01:56 EST
Certain lock failures (e.g. due to receipt of lock request during grace period) will cause a BUG() like:

kernel BUG at fs/nfsd/nfs4state.c:390!
invalid opcode: 0000 [#1] SMP
last sysfs file: /sys/devices/pci0000:00/0000:00:03.0/0000:02:00.1/irq
CPU 12
Modules linked in: nfs fscache(T) nfsd lockd nfs_acl auth_rpcgss exportfs ext4 ]
Modules linked in: nfs fscache(T) nfsd lockd nfs_acl auth_rpcgss exportfs ext4 ]
Pid: 3309, comm: nfsd Tainted: G ---------------- T 2.6.32-130.el6.x85
RIP: 0010:[<ffffffffa038b2b5>] [<ffffffffa038b2b5>] free_generic_stateid+0x35/]
RSP: 0018:ffff8802305a3b00 EFLAGS: 00010297
RAX: 0000000000000000 RBX: ffff88043fc91740 RCX: ffff8802305a3ae8
RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff8802305a3b0c
RBP: ffff8802305a3b20 R08: ffff88043fc91760 R09: 0000000000000000
R10: 000000000000003c R11: 0000000000000000 R12: ffff8804365f6280
R13: ffff8804365f62b8 R14: ffff8804365f6280 R15: ffff88043fc917b8
FS: 00007f6e57d44700(0000) GS:ffff880247440000(0000) knlGS:0000000000000000
CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 00007f6e573f4550 CR3: 0000000001a25000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process nfsd (pid: 3309, threadinfo ffff8802305a2000, task ffff88023061a080)
Stack:
 ffff8802305a3b20 00000000a0386e88 ffff88043fc91740 ffff8804365f6280
<0> ffff8802305a3b50 ffffffffa038b389 0000000000000000 ffff88082f3af1a0
<0> 000000001d270000 ffff88082f3b0040 ffff8802305a3d80 ffffffffa038ba5d
Call Trace:
 [<ffffffffa038b389>] release_lockowner+0x59/0xb0 [nfsd]
 [<ffffffffa038ba5d>] nfsd4_lock+0x4cd/0x7e0 [nfsd]
 [<ffffffffa0375a06>] ? nfsd_setuser+0x126/0x2c0 [nfsd]
 [<ffffffffa036d852>] ? nfsd_setuser_and_check_port+0x62/0xb0 [nfsd]
 [<ffffffffa036da07>] ? fh_verify+0x167/0x650 [nfsd]
 [<ffffffffa037cf01>] nfsd4_proc_compound+0x3d1/0x490 [nfsd]
 [<ffffffffa036a43e>] nfsd_dispatch+0xfe/0x240 [nfsd]
 [<ffffffffa02634d4>] svc_process_common+0x344/0x640 [sunrpc]
 [<ffffffff8105d710>] ? default_wake_function+0x0/0x20
 [<ffffffffa0263b10>] svc_process+0x110/0x160 [sunrpc]
 [<ffffffffa036ab62>] nfsd+0xc2/0x160 [nfsd]
 [<ffffffffa036aaa0>] ? nfsd+0x0/0x160 [nfsd]
 [<ffffffff8108de16>] kthread+0x96/0xa0
 [<ffffffff8100c1ca>] child_rip+0xa/0x20
 [<ffffffff8108dd80>] ? kthread+0x0/0xa0
 [<ffffffff8100c1c0>] ? child_rip+0x0/0x20
Code: 10 0f 1f 44 00 00 48 8b 77 60 48 89 fb 48 8d 7d ec e8 b0 c1 ff ff 8b 45 e
RIP [<ffffffffa038b2b5>] free_generic_stateid+0x35/0xb0 [nfsd]
 RSP <ffff8802305a3b00>
Fix committed upstream as 23fcf2ec93fb8573a653408316af599939ff9a8e
Simplest reproducer I've found is:
a) open a file
b) restart the server
c) get a lock on the open file descriptor while the server is still in its grace period (so within 90 seconds of step b).
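For anyone who wants to script the client side of those steps, here is a rough sketch, not the script actually used for this report. It assumes the file lives on an NFSv4 mount, and the server restart in step b is still done by hand (or by whatever callable you pass in). The function name `reproduce`, the `wait_for_restart` parameter, and the mount path in the comment are made up for illustration. What the client sees from the lock call (success, error, or a delay) will depend on the client and server versions; on an unpatched server the interesting result is the BUG() in the server log.

```python
import fcntl
import os


def reproduce(path, wait_for_restart):
    """Open path, wait for the server restart, then request a lock.

    Returns True if the lock was granted, False if the lock call
    raised OSError. Both names are hypothetical helpers for this sketch.
    """
    # a) open a file (path is assumed to be on the NFSv4 mount)
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
    try:
        # b) restart the NFS server while this call is waiting
        wait_for_restart()
        # c) request a lock on the already-open descriptor; this should
        #    happen while the server is still in its grace period
        try:
            fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except OSError as exc:
            print("lock request failed:", exc)
            return False
        fcntl.lockf(fd, fcntl.LOCK_UN)
        return True
    finally:
        os.close(fd)


# Interactive use, with a hypothetical mount point:
# reproduce("/mnt/nfs4/lockfile",
#           lambda: input("Restart the NFS server, then press Enter: "))
```

The restart is passed in as a callable so the same function can be driven interactively or from a test harness that restarts the server itself.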
This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux maintenance release. Product Management has requested further review of this request by Red Hat Engineering, for potential inclusion in a Red Hat Enterprise Linux Update release for currently deployed products. This request is not yet committed for inclusion in an Update release.
*** Bug 697032 has been marked as a duplicate of this bug. ***
Patch(es) available on kernel-2.6.32-131.0.5.el6
I ran into this during cluster relocation tests while running a -132 based kernel. I re-ran it with kernel-2.6.32-131.0.5.el6.x86_64 and made it through the relocation tests. Is there a clone to make sure this patch makes it into 6.2?
It's in as of kernel-2.6.32-134.el6. As I understand it, that means it should be headed for both 6.1 and 6.2 already.
Sounds good, I'll mark this verified for 6.1.
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHSA-2011-0542.html