Bug 696376

Summary: server BUG() on receipt of bad NFSv4 lock request
Product: Red Hat Enterprise Linux 6
Component: kernel
kernel sub component: NFS
Version: 6.1
Hardware: All
OS: Linux
Status: CLOSED ERRATA
Severity: urgent
Priority: unspecified
Keywords: Regression
Target Milestone: rc
Reporter: J. Bruce Fields <bfields>
Assignee: J. Bruce Fields <bfields>
QA Contact: Filesystem QE <fs-qe>
CC: cmarthal, klaus.steinberger, kzhang, liko, mzywusko, rwheeler, syeghiay
Fixed In Version: kernel-2.6.32-131.0.5.el6
Doc Type: Bug Fix
Last Closed: 2011-05-19 12:42:50 UTC

Description J. Bruce Fields 2011-04-13 22:49:47 UTC
Certain lock failures (e.g. due to receipt of a lock request during the grace period) cause the server to hit a BUG() like:

kernel BUG at fs/nfsd/nfs4state.c:390!
invalid opcode: 0000 [#1] SMP 
last sysfs file: /sys/devices/pci0000:00/0000:00:03.0/0000:02:00.1/irq
CPU 12 
Modules linked in: nfs fscache(T) nfsd lockd nfs_acl auth_rpcgss exportfs ext4 ]
Pid: 3309, comm: nfsd Tainted: G           ---------------- T 2.6.32-130.el6.x85
RIP: 0010:[<ffffffffa038b2b5>]  [<ffffffffa038b2b5>] free_generic_stateid+0x35/]
RSP: 0018:ffff8802305a3b00  EFLAGS: 00010297
RAX: 0000000000000000 RBX: ffff88043fc91740 RCX: ffff8802305a3ae8
RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff8802305a3b0c
RBP: ffff8802305a3b20 R08: ffff88043fc91760 R09: 0000000000000000
R10: 000000000000003c R11: 0000000000000000 R12: ffff8804365f6280
R13: ffff8804365f62b8 R14: ffff8804365f6280 R15: ffff88043fc917b8
FS:  00007f6e57d44700(0000) GS:ffff880247440000(0000) knlGS:0000000000000000
CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 00007f6e573f4550 CR3: 0000000001a25000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process nfsd (pid: 3309, threadinfo ffff8802305a2000, task ffff88023061a080)
Stack:
 ffff8802305a3b20 00000000a0386e88 ffff88043fc91740 ffff8804365f6280
<0> ffff8802305a3b50 ffffffffa038b389 0000000000000000 ffff88082f3af1a0
<0> 000000001d270000 ffff88082f3b0040 ffff8802305a3d80 ffffffffa038ba5d
Call Trace:
 [<ffffffffa038b389>] release_lockowner+0x59/0xb0 [nfsd]
 [<ffffffffa038ba5d>] nfsd4_lock+0x4cd/0x7e0 [nfsd]
 [<ffffffffa0375a06>] ? nfsd_setuser+0x126/0x2c0 [nfsd]
 [<ffffffffa036d852>] ? nfsd_setuser_and_check_port+0x62/0xb0 [nfsd]
 [<ffffffffa036da07>] ? fh_verify+0x167/0x650 [nfsd]
 [<ffffffffa037cf01>] nfsd4_proc_compound+0x3d1/0x490 [nfsd]
 [<ffffffffa036a43e>] nfsd_dispatch+0xfe/0x240 [nfsd]
 [<ffffffffa02634d4>] svc_process_common+0x344/0x640 [sunrpc]
 [<ffffffff8105d710>] ? default_wake_function+0x0/0x20
 [<ffffffffa0263b10>] svc_process+0x110/0x160 [sunrpc]
 [<ffffffffa036ab62>] nfsd+0xc2/0x160 [nfsd]
 [<ffffffffa036aaa0>] ? nfsd+0x0/0x160 [nfsd]
 [<ffffffff8108de16>] kthread+0x96/0xa0
 [<ffffffff8100c1ca>] child_rip+0xa/0x20
 [<ffffffff8108dd80>] ? kthread+0x0/0xa0
 [<ffffffff8100c1c0>] ? child_rip+0x0/0x20
Code: 10 0f 1f 44 00 00 48 8b 77 60 48 89 fb 48 8d 7d ec e8 b0 c1 ff ff 8b 45 e 
RIP  [<ffffffffa038b2b5>] free_generic_stateid+0x35/0xb0 [nfsd]
 RSP <ffff8802305a3b00>

Comment 1 J. Bruce Fields 2011-04-13 22:51:50 UTC
Fix committed upstream as 23fcf2ec93fb8573a653408316af599939ff9a8e

Comment 3 J. Bruce Fields 2011-04-13 23:02:00 UTC
Simplest reproducer I've found is:

a) open a file

b) restart the server

c) get a lock on the open file descriptor while the server is still in its grace period (so within 90 seconds of step b).
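
The client side of steps a) and c) can be sketched roughly as below. This is only an illustration, not the test that was actually run: the function name open_then_lock and the wait_for_restart callback are made up for this example, the file is assumed to live on an NFSv4 mount, and step b) (restarting the server) has to be done separately on the server while the callback waits. What the client call returns during the grace period is not stated in this report, so the sketch just reports whatever happens.

```python
import fcntl
import os


def open_then_lock(path, wait_for_restart):
    """Open `path`, wait for the server restart, then request a lock.

    `wait_for_restart` is a hypothetical callback; it should return once
    the NFS server has been restarted (step b) and is still in grace.
    """
    # Step a: open a file (on an NFSv4 mount for the real reproducer).
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
    try:
        # Step b happens on the server while we wait here.
        wait_for_restart()
        # Step c: ask for an exclusive lock on the already-open descriptor.
        try:
            fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except OSError as exc:
            return "lock failed: errno %d" % exc.errno
        fcntl.lockf(fd, fcntl.LOCK_UN)
        return "lock granted"
    finally:
        os.close(fd)
```

A manual run might pass something like `lambda: input("restart the server, then press Enter")` as the callback and a path under the NFSv4 mount point, pressing Enter within the 90 second grace window.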

Comment 4 RHEL Program Management 2011-04-14 16:19:29 UTC
This request was evaluated by Red Hat Product Management for inclusion
in a Red Hat Enterprise Linux maintenance release. Product Management has 
requested further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed 
products. This request is not yet committed for inclusion in an Update release.

Comment 6 J. Bruce Fields 2011-04-15 16:55:20 UTC
*** Bug 697032 has been marked as a duplicate of this bug. ***

Comment 7 Aristeu Rozanski 2011-04-20 22:06:00 UTC
Patch(es) available on kernel-2.6.32-131.0.5.el6

Comment 10 Nate Straz 2011-04-26 13:29:39 UTC
I ran into this during cluster relocation tests while running a -132 based kernel.  I re-ran it with kernel-2.6.32-131.0.5.el6.x86_64 and made it through the relocation tests.

Is there a clone to make sure this patch makes it into 6.2?

Comment 11 J. Bruce Fields 2011-04-26 15:28:52 UTC
It's in as of kernel-2.6.32-134.el6.  As I understand it, that means it should be headed for both 6.1 and 6.2 already.

Comment 12 Nate Straz 2011-04-26 18:49:48 UTC
Sounds good, I'll mark this verified for 6.1.

Comment 13 errata-xmlrpc 2011-05-19 12:42:50 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2011-0542.html

Comment 14 Yongcheng Yang 2019-11-27 06:19:44 UTC
(In reply to J. Bruce Fields from comment #3)
> Simplest reproducer I've found is:
> a) open a file
> b) restart the server
> c) get a lock on the open file descriptor while the server is still in its grace period (so within 90 seconds of step b).

This scenario has been covered by many tests under /kernel/filesystems/nfs/function/nfslock/ already.