Bug 132726

Summary: NFS/lockd: Badness in interruptible_sleep_on_timeout
Product: [Fedora] Fedora
Component: kernel
Version: rawhide
Hardware: All
OS: Linux
Status: CLOSED CURRENTRELEASE
Severity: high
Priority: high
Reporter: Joe Orton <jorton>
Assignee: Steve Dickson <steved>
CC: andrew.grover, davej, jvdias, rdieter, redwolfe, tkmame, wtogami
Fixed In Version: 2.6.9-1.639
Doc Type: Bug Fix
Last Closed: 2005-09-04 19:07:20 EDT
Bug Blocks: 130887
Attachments:
  Description: Upstream patch I proposed to fix this problem (which also applies nicely to RHEL4 kernels); Flags: none
  Description: Updated patch; Flags: none

Description Joe Orton 2004-09-16 07:59:34 EDT
2.6.8-1.541, i686, UP

Badness in interruptible_sleep_on_timeout at kernel/sched.c:3004
 [<022f7ba4>] interruptible_sleep_on_timeout+0x5d/0x23a
 [<0211b5bd>] default_wake_function+0x0/0xc
 [<22a4a9e6>] lockd_down+0xb4/0x258 [lockd]
 [<22a9c40a>] nfs_kill_super+0x43/0x63 [nfs]
 [<02167d56>] deactivate_super+0xcb/0xe0
 [<021848cc>] sys_umount+0x65/0x6c
 [<0217f60b>] destroy_inode+0x36/0x45
 [<0217b0a3>] dput+0x33/0x4f3
 [<021624c6>] __fput+0xc9/0xee
 [<021848de>] sys_oldumount+0xb/0xe
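For readers comparing the traces in this report: each frame has the form `[<address>] symbol+0xoffset/0xsize`, optionally followed by the owning module in square brackets. The following is a minimal Python sketch for splitting such a line into fields. The function name `parse_trace_line` and the regular expression are illustrative assumptions, not part of any kernel tooling.

```python
import re

# Hypothetical helper: matches frames such as
#   [<22a4a9e6>] lockd_down+0xb4/0x258 [lockd]
TRACE_RE = re.compile(
    r"\[<(?P<addr>[0-9a-f]+)>\]\s+"
    r"(?P<symbol>[A-Za-z_][\w.]*)\+0x(?P<offset>[0-9a-f]+)/0x(?P<size>[0-9a-f]+)"
    r"(?:\s+\[(?P<module>[^\]]+)\])?"
)


def parse_trace_line(line):
    """Return the fields of one call-trace frame, or None if the line is not a frame."""
    m = TRACE_RE.search(line)
    if m is None:
        return None
    return {
        "addr": int(m.group("addr"), 16),      # return address
        "symbol": m.group("symbol"),           # function name
        "offset": int(m.group("offset"), 16),  # offset into the function
        "size": int(m.group("size"), 16),      # total size of the function
        "module": m.group("module"),           # None for built-in kernel code
    }
```

Because the pattern is searched rather than anchored, it also works on syslog lines that carry a timestamp and host prefix before the frame.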
Comment 1 Jason Vas Dias 2004-09-17 13:12:44 EDT
 Here's some additional context for this bug.
 It always happens (100% reproducibility) during system
 shutdown (service nfs stop) on two systems with 
 kernel-2.6.8-1.541 installed - one, an SMP 2-processor
 P4 system, and the other a uniprocessor IBM Thinkpad P6
 laptop (I have no other systems with kernel-2.6.8-1.541).

 Once, the problem was followed by an 'Oops' (a kernel crash):
 I saw the Badness messages and the Oops occurred immediately
 afterwards. I think I may have moved the mouse when the Oops
 occurred; I haven't been able to reproduce it.
 
 I see these messages in /var/log/messages (I have enabled
 kernel.* /var/log/messages in /etc/syslog.conf):

ntpd: ntpd shutdown succeeded
 kernel: Badness in interruptible_sleep_on_timeout at kernel/sched.c:3004
 kernel:  [<022f7ba4>] interruptible_sleep_on_timeout+0x5d/0x23a
 kernel:  [<0211b5bd>] default_wake_function+0x0/0xc
 kernel:  [<22b159e6>] lockd_down+0xb4/0x258 [lockd]
 kernel:  [<22cc740a>] nfs_kill_super+0x43/0x63 [nfs]
 kernel:  [<02167d56>] deactivate_super+0xcb/0xe0
 kernel:  [<021848cc>] sys_umount+0x65/0x6c
 kernel:  [<0217b0a3>] dput+0x33/0x4f3
 kernel:  [<021624c6>] __fput+0xc9/0xee
 kernel:  [<02160d2c>] filp_close+0x59/0x5f
 netfs: Unmounting NFS filesystems:  succeeded
Comment 2 Jason Vas Dias 2004-09-21 10:51:24 EDT
This still happens in kernel-2.6.8-1.584.
Comment 3 Jason Vas Dias 2004-09-21 12:12:19 EDT
I just had another "Oops" on rebooting kernel-2.6.8-1.584.
I manually copied the call trace from the screen (without the
addresses):
Call Trace:
        disable_IO_APIC
        machine_restart
        sys_reboot
        handle_mm_fault
        do_page_fault
        destroy_inode
        dput
        __fput
        filp_close
Code: Bad EIP value.
Comment 4 Yao Zhang 2004-09-27 21:31:25 EDT
The same happens on my machine too.  I am running the latest rawhide
with the 2.6.8-1.584 kernel.  It can always be reproduced by unmounting
an NFS share.  The NFS server runs RedHat 6.2.

The error messages in the NFS client's /var/log/messages:

Sep 27 21:22:30 water kernel: Badness in interruptible_sleep_on_timeout at kernel/sched.c:3004
Sep 27 21:22:30 water kernel:  [<022ff2a8>] interruptible_sleep_on_timeout+0x5d/0x23a
Sep 27 21:22:30 water kernel:  [<0211b869>] default_wake_function+0x0/0xc
Sep 27 21:22:30 water kernel:  [<12b0aa66>] lockd_down+0xb4/0x258 [lockd]
Sep 27 21:22:30 water kernel:  [<12b61439>] nfs_kill_super+0x43/0x63 [nfs]
Sep 27 21:22:30 water kernel:  [<0216a2aa>] deactivate_super+0xcb/0xe0
Sep 27 21:22:30 water kernel:  [<02186f18>] sys_umount+0x65/0x6c
Sep 27 21:22:30 water kernel:  [<02181c57>] destroy_inode+0x36/0x45
Sep 27 21:22:30 water kernel:  [<02108b37>] do_IRQ+0x2fd/0x309
Sep 27 21:22:30 water kernel:  [<02186f2a>] sys_oldumount+0xb/0xe
Comment 5 Jason Vas Dias 2004-10-07 15:23:24 EDT
 Still happening in kernel-2.6.8-1.598.
Comment 6 Vladimir Ivanovic 2004-10-09 20:55:23 EDT
kernel-smp-2.6.8-1.590

Oct  5 21:50:31 bach kernel: Badness in interruptible_sleep_on_timeout at kernel/sched.c:3004
Oct  5 21:50:31 bach kernel:  [<022b4ebc>] interruptible_sleep_on_timeout+0x5d/0xd0
Oct  5 21:50:31 bach kernel:  [<0211c983>] default_wake_function+0x0/0xc
Oct  5 21:50:31 bach kernel:  [<82b34fdb>] lockd_down+0xb3/0x10c [lockd]
Oct  5 21:50:31 bach kernel:  [<82c05e5c>] nfs_kill_super+0x43/0x63 [nfs]
Oct  5 21:50:31 bach kernel:  [<02157615>] deactivate_super+0x5b/0x70
Oct  5 21:50:31 bach kernel:  [<0216a26b>] sys_umount+0x65/0x6c
Oct  5 21:50:31 bach kernel:  [<02147bfd>] unmap_vma_list+0xe/0x17
Oct  5 21:50:31 bach kernel:  [<02147f64>] do_munmap+0x156/0x164
Oct  5 21:50:31 bach kernel:  [<0216a27d>] sys_oldumount+0xb/0xe
Oct  5 21:50:31 bach amd[3697]: /etc/amd.net unmounted fstype toplvl from /net
Comment 7 Steve Dickson 2004-10-13 08:14:37 EDT
This is caused by a non-upstream patch added to RHEL4 that removed
the holding of the BKL (Big Kernel Lock). When I sent the patch that
removed the warnings upstream, I was strongly advised (and I have to
agree) not to remove the holding of the BKL.
So I suggest we drop the patch that removes the holding of the BKL.
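As background on why the lock matters here: sleep_on-style waits are generally considered race-prone, because checking a condition and going to sleep are separate steps, so a wakeup that arrives in between can be missed unless some outer lock serializes both sides. The sketch below is a userspace Python analogy only, not the kernel code and not the attached patch. The class `ExitWaiter` and its method names are hypothetical. It shows the usual alternative: keep the flag and the wait under one lock and re-check the flag as a predicate.

```python
import threading


class ExitWaiter:
    """Hypothetical handshake: one side waits until the other reports it has exited."""

    def __init__(self):
        self._cond = threading.Condition()
        self._exited = False

    def mark_exited(self):
        # Set the flag and wake waiters while holding the same lock the waiter uses.
        with self._cond:
            self._exited = True
            self._cond.notify_all()

    def wait_for_exit(self, timeout):
        # The predicate is checked under the lock before sleeping and after every
        # wakeup, so a notification sent before the wait starts is not lost.
        with self._cond:
            return self._cond.wait_for(lambda: self._exited, timeout=timeout)
```

If `mark_exited()` runs before `wait_for_exit()` is called, the wait returns immediately instead of sleeping for the whole timeout, which is the behaviour an unserialized sleep-then-check sequence cannot guarantee.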
Comment 8 Steve Dickson 2004-10-13 08:17:19 EDT
Created attachment 105137 [details]
Upstream patch I proposed to fix this problem (which also applies nicely to RHEL4 kernels)
Comment 9 Christopher Stone 2004-10-15 22:12:19 EDT
*** Bug 135622 has been marked as a duplicate of this bug. ***
Comment 10 G.Wolfe Woodbury 2004-10-16 01:34:37 EDT
This is still occurring in the 2.6.8-1.624 kernel from rawhide 2004-10-14.
Comment 11 Warren Togami 2004-10-18 22:43:32 EDT
According to davej, steved's patch fails to apply.  Please advise.
Comment 12 Steve Dickson 2004-10-19 08:03:18 EDT
Created attachment 105445 [details]
Updated patch
Comment 13 Bill Nottingham 2004-10-20 00:54:30 EDT
Added in 2.6.9-1.639.
Comment 14 Jason Vas Dias 2004-10-20 10:10:12 EDT
This problem appears to be fixed in 2.6.9-1.637.
Comment 15 Jason Baron 2004-10-21 13:31:10 EDT
*** Bug 136639 has been marked as a duplicate of this bug. ***