Bug 1089092

Summary: flock release on nfs4 triggers a might_sleep BUG message
Product: [Fedora] Fedora
Component: kernel
Version: 22
Hardware: Unspecified
OS: Unspecified
Status: CLOSED ERRATA
Severity: unspecified
Priority: unspecified
Reporter: Josh Stone <jistone>
Assignee: Jeff Layton <jlayton>
QA Contact: Fedora Extras Quality Assurance <extras-qa>
CC: gansalmon, itamar, jonathan, kernel-maint, madhu.chinakonda, mchehab, mguzik, nfs-maint, steved
Type: Bug
Doc Type: Bug Fix
Last Closed: 2015-10-20 20:03:22 UTC

Description Josh Stone 2014-04-17 22:18:28 UTC
Description of problem:
When flock is taken and released on an nfs4 path, the kernel fails a might_sleep() check.

Version-Release number of selected component (if applicable):
kernel-3.15.0-0.rc1.git1.1.fc21.x86_64

I can also reproduce this with kernel-debug-3.13.9-100.fc19.x86_64

How reproducible:
100%

Steps to Reproduce:
# in bash with PWD on an nfs4 mount
1. exec 3>>foo   # prepare the fd
2. flock -x 3    # take an exclusive lock
3. exec 3>&-     # close the fd to drop the lock
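
For a reproducer that does not depend on the shell, the three steps above can be sketched in C. This is only a minimal sketch: it assumes that flock(2) followed by close(2) reaches the same unlock-on-close path as the bash steps, and the helper name flock_and_close is made up for illustration (it is not part of the original report).

```c
#include <fcntl.h>
#include <stdio.h>
#include <sys/file.h>
#include <unistd.h>

/*
 * Hypothetical helper mirroring the shell steps:
 *   1. open the file for appending (exec 3>>foo)
 *   2. take an exclusive flock     (flock -x 3)
 *   3. close the fd, which drops the lock (exec 3>&-)
 * Returns 0 on success, -1 on failure.
 */
static int flock_and_close(const char *path)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_APPEND, 0666);
    if (fd < 0) {
        perror("open");
        return -1;
    }

    if (flock(fd, LOCK_EX) < 0) {
        perror("flock");
        close(fd);
        return -1;
    }

    /* No explicit LOCK_UN: the lock is released when the last fd is closed. */
    if (close(fd) < 0) {
        perror("close");
        return -1;
    }
    return 0;
}
```

Calling flock_and_close("foo") from a small main() with the working directory on an nfs4 mount should exercise the same sequence; on an affected kernel the BUG message would be expected in dmesg rather than as an error return.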

Actual results:
BUG: sleeping function called from invalid context at mm/slub.c:969

Expected results:
No BUG.

Additional info:
Here's the full BUG info:

BUG: sleeping function called from invalid context at mm/slub.c:969
in_atomic(): 1, irqs_disabled(): 0, pid: 533, name: bash
3 locks held by bash/533:
 #0:  (&sp->so_delegreturn_mutex){+.+...}, at: [<ffffffffa033da62>] nfs4_proc_lock+0x262/0x910 [nfsv4]
 #1:  (&nfsi->rwsem){.+.+.+}, at: [<ffffffffa033da6a>] nfs4_proc_lock+0x26a/0x910 [nfsv4]
 #2:  (&sb->s_type->i_lock_key#23){+.+...}, at: [<ffffffff812998dc>] flock_lock_file_wait+0x8c/0x3a0
CPU: 0 PID: 533 Comm: bash Not tainted 3.15.0-0.rc1.git1.1.fc21.x86_64 #1
Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
 0000000000000000 00000000d664ff3c ffff880078b69a70 ffffffff817e82e0
 0000000000000000 ffff880078b69a98 ffffffff810cf1a4 0000000000000050
 0000000000000050 ffff88007cc01a00 ffff880078b69ad8 ffffffff8121449e
Call Trace:
 [<ffffffff817e82e0>] dump_stack+0x4d/0x66
 [<ffffffff810cf1a4>] __might_sleep+0x184/0x240
 [<ffffffff8121449e>] kmem_cache_alloc_trace+0x4e/0x330
 [<ffffffffa0331124>] ? nfs4_release_lockowner+0x74/0x110 [nfsv4]
 [<ffffffffa0331124>] nfs4_release_lockowner+0x74/0x110 [nfsv4]
 [<ffffffffa0352340>] nfs4_put_lock_state+0x90/0xb0 [nfsv4]
 [<ffffffffa0352375>] nfs4_fl_release_lock+0x15/0x20 [nfsv4]
 [<ffffffff81297515>] locks_free_lock+0x45/0x90
 [<ffffffff8129996c>] flock_lock_file_wait+0x11c/0x3a0
 [<ffffffffa033da6a>] ? nfs4_proc_lock+0x26a/0x910 [nfsv4]
 [<ffffffffa033301e>] do_vfs_lock+0x1e/0x30 [nfsv4]
 [<ffffffffa033da79>] nfs4_proc_lock+0x279/0x910 [nfsv4]
 [<ffffffff810dbb26>] ? local_clock+0x16/0x30
 [<ffffffff810f5a3f>] ? lock_release_holdtime.part.28+0xf/0x200
 [<ffffffffa02f820c>] do_unlk+0x8c/0xc0 [nfs]
 [<ffffffffa02f85c5>] nfs_flock+0xa5/0xf0 [nfs]
 [<ffffffff8129a6f6>] locks_remove_file+0xb6/0x1e0
 [<ffffffff812159d8>] ? kfree+0xd8/0x2d0
 [<ffffffff8123bc63>] __fput+0xd3/0x210
 [<ffffffff8123bdee>] ____fput+0xe/0x10
 [<ffffffff810bfb6d>] task_work_run+0xcd/0xf0
 [<ffffffff81019cd1>] do_notify_resume+0x61/0x90
 [<ffffffff817fbea2>] int_signal+0x12/0x17

Comment 1 Jeff Layton 2014-04-20 20:44:21 UTC
Definitely a bug. We're trying to do an allocation under a spinlock there. I'll have a look at how best to fix it.

Comment 2 Josh Stone 2014-06-13 17:34:08 UTC
This still occurs on 3.16.0-0.rc0.git5.1.fc21.x86_64

Comment 3 Jaroslav Reznik 2015-03-03 17:00:36 UTC
This bug appears to have been reported against 'rawhide' during the Fedora 22 development cycle.
Changing version to '22'.

More information and the reason for this action are here:
https://fedoraproject.org/wiki/Fedora_Program_Management/HouseKeeping/Fedora22

Comment 4 Mateusz Guzik 2015-03-04 11:48:47 UTC
I believe this was fixed with

commit ed9814d85810c27670987b40c77e8a07105838fe
Author: Jeff Layton <jlayton>
Date:   Mon Aug 11 14:20:31 2014 -0400

    locks: defer freeing locks in locks_delete_lock until after i_lock has been dropped


and following commits.
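
The idea named in that commit subject (unlink under the lock, free only after the lock has been dropped) can be illustrated with a userspace sketch. This is not the kernel patch: the pthread mutex merely stands in for i_lock, and every name below (lock_table, lock_entry, entry_release, remove_owner, add_entry) is hypothetical.

```c
#include <pthread.h>
#include <stdlib.h>

struct lock_entry {
    struct lock_entry *next;
    int owner;
};

struct lock_table {
    pthread_mutex_t mu;       /* stands in for the inode's i_lock */
    struct lock_entry *head;  /* list of active locks */
};

static int released;          /* counts release callbacks, for illustration */

/* Release callback. In the bug above the equivalent step could sleep,
 * so it must never run while the table lock is held. */
static void entry_release(struct lock_entry *e)
{
    released++;
    free(e);
}

/* Allocate before taking the lock, then link the entry in under it. */
static int add_entry(struct lock_table *t, int owner)
{
    struct lock_entry *e = malloc(sizeof(*e));
    if (e == NULL)
        return -1;
    e->owner = owner;

    pthread_mutex_lock(&t->mu);
    e->next = t->head;
    t->head = e;
    pthread_mutex_unlock(&t->mu);
    return 0;
}

/* Remove every entry held by owner; returns how many were released. */
static int remove_owner(struct lock_table *t, int owner)
{
    struct lock_entry *dispose = NULL, **pp, *e;
    int n = 0;

    /* Phase 1: under the lock, only move entries to a private list. */
    pthread_mutex_lock(&t->mu);
    pp = &t->head;
    while ((e = *pp) != NULL) {
        if (e->owner == owner) {
            *pp = e->next;
            e->next = dispose;
            dispose = e;
        } else {
            pp = &e->next;
        }
    }
    pthread_mutex_unlock(&t->mu);

    /* Phase 2: lock dropped, now it is safe to run the release callback. */
    while (dispose != NULL) {
        e = dispose;
        dispose = e->next;
        entry_release(e);
        n++;
    }
    return n;
}
```

The design choice is that nothing which might block runs between lock and unlock; the critical section only relinks pointers, and the private dispose list carries the work out of it.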

Comment 5 Josh Stone 2015-03-04 17:17:56 UTC
Ok, that commit is v3.17~229^2~2, and here on 3.18.7-200.fc21.x86_64 I can run the original reproducer with no BUG.

Comment 6 Justin M. Forbes 2015-10-20 19:43:23 UTC
*********** MASS BUG UPDATE **************

We apologize for the inconvenience.  There is a large number of bugs to go through and several of them have gone stale.  Due to this, we are doing a mass bug update across all of the Fedora 22 kernel bugs.

Fedora 22 has now been rebased to 4.2.3-200.fc22.  Please test this kernel update (or newer) and let us know if your issue has been resolved or if it is still present with the newer kernel.

If you have moved on to Fedora 23, and are still experiencing this issue, please change the version to Fedora 23.

If you experience different issues, please open a new bug report for those.

Comment 7 Jeff Layton 2015-10-20 20:03:22 UTC
Closing as fixed!