Bug 132726 - NFS/lockd: Badness in interruptible_sleep_on_timeout
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Fedora
Classification: Fedora
Component: kernel
Version: rawhide
Hardware: All
OS: Linux
Priority: high
Severity: high
Target Milestone: ---
Assignee: Steve Dickson
QA Contact:
URL:
Whiteboard:
Duplicates: 135622 136639
Depends On:
Blocks: FC3Blocker
 
Reported: 2004-09-16 11:59 UTC by Joe Orton
Modified: 2007-11-30 22:10 UTC (History)
CC List: 7 users

Fixed In Version: 2.6.9-1.639
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2005-09-04 23:07:20 UTC


Attachments
Upstream patch I proposed to fix this problem (which also applies nicely to RHEL4 kernels) (2.29 KB, patch)
2004-10-13 12:17 UTC, Steve Dickson
Updated patch (2.30 KB, patch)
2004-10-19 12:03 UTC, Steve Dickson

Description Joe Orton 2004-09-16 11:59:34 UTC
2.6.8-1.541, i686, UP

Badness in interruptible_sleep_on_timeout at kernel/sched.c:3004
 [<022f7ba4>] interruptible_sleep_on_timeout+0x5d/0x23a
 [<0211b5bd>] default_wake_function+0x0/0xc
 [<22a4a9e6>] lockd_down+0xb4/0x258 [lockd]
 [<22a9c40a>] nfs_kill_super+0x43/0x63 [nfs]
 [<02167d56>] deactivate_super+0xcb/0xe0
 [<021848cc>] sys_umount+0x65/0x6c
 [<0217f60b>] destroy_inode+0x36/0x45
 [<0217b0a3>] dput+0x33/0x4f3
 [<021624c6>] __fput+0xc9/0xee
 [<021848de>] sys_oldumount+0xb/0xe

Comment 1 Jason Vas Dias 2004-09-17 17:12:44 UTC
 Here's some additional context for this bug.
 It always happens (100% reproducibility) during system
 shutdown (service nfs stop) on two systems with 
 kernel-2.6.8-1.541 installed - one, an SMP 2-processor
 P4 system, and the other a uniprocessor IBM Thinkpad P6
 laptop (I have no other systems with kernel-2.6.8-1.541).

 Once, the problem caused an 'Oops' and the kernel crashed
 (i.e. I saw the Badness messages and the Oops occurred
 immediately afterwards). I think I may have moved the mouse
 when the Oops occurred; I haven't been able to reproduce it.
 
 I see these messages in /var/log/messages (I have enabled
 kernel.* /var/log/messages in /etc/syslog.conf):

ntpd: ntpd shutdown succeeded
 kernel: Badness in interruptible_sleep_on_timeout at kernel/sched.c:3004
 kernel:  [<022f7ba4>] interruptible_sleep_on_timeout+0x5d/0x23a
 kernel:  [<0211b5bd>] default_wake_function+0x0/0xc
 kernel:  [<22b159e6>] lockd_down+0xb4/0x258 [lockd]
 kernel:  [<22cc740a>] nfs_kill_super+0x43/0x63 [nfs]
 kernel:  [<02167d56>] deactivate_super+0xcb/0xe0
 kernel:  [<021848cc>] sys_umount+0x65/0x6c
 kernel:  [<0217b0a3>] dput+0x33/0x4f3
 kernel:  [<021624c6>] __fput+0xc9/0xee
 kernel:  [<02160d2c>] filp_close+0x59/0x5f
 netfs: Unmounting NFS filesystems:  succeeded

Comment 2 Jason Vas Dias 2004-09-21 14:51:24 UTC
This still happens in kernel-2.6.8-1.584.

Comment 3 Jason Vas Dias 2004-09-21 16:12:19 UTC
I just had another "Oops" on rebooting kernel-2.6.8-1.584.
I manually copied the call trace from the screen (without the
addresses):
Call Trace:
        disable_IO_APIC
        machine_restart
        sys_reboot
        handle_mm_fault
        do_page_fault
        destroy_inode
        dput
        __fput
        filp_close
Code: Bad EIP value.

Comment 4 Yao Zhang 2004-09-28 01:31:25 UTC
The same happens on my machine too.  I am running the latest rawhide
with the 2.6.8-1.584 kernel.  It can always be reproduced by unmounting
an NFS share.  The NFS server runs Red Hat 6.2.

The error message in the NFS client's /var/log/messages:

Sep 27 21:22:30 water kernel: Badness in interruptible_sleep_on_timeout at kernel/sched.c:3004
Sep 27 21:22:30 water kernel:  [<022ff2a8>] interruptible_sleep_on_timeout+0x5d/0x23a
Sep 27 21:22:30 water kernel:  [<0211b869>] default_wake_function+0x0/0xc
Sep 27 21:22:30 water kernel:  [<12b0aa66>] lockd_down+0xb4/0x258 [lockd]
Sep 27 21:22:30 water kernel:  [<12b61439>] nfs_kill_super+0x43/0x63 [nfs]
Sep 27 21:22:30 water kernel:  [<0216a2aa>] deactivate_super+0xcb/0xe0
Sep 27 21:22:30 water kernel:  [<02186f18>] sys_umount+0x65/0x6c
Sep 27 21:22:30 water kernel:  [<02181c57>] destroy_inode+0x36/0x45
Sep 27 21:22:30 water kernel:  [<02108b37>] do_IRQ+0x2fd/0x309
Sep 27 21:22:30 water kernel:  [<02186f2a>] sys_oldumount+0xb/0xe

Comment 5 Jason Vas Dias 2004-10-07 19:23:24 UTC
Still happening in kernel-2.6.8-1.598.

Comment 6 Vladimir Ivanovic 2004-10-10 00:55:23 UTC
kernel-smp-2.6.8-1.590

Oct  5 21:50:31 bach kernel: Badness in interruptible_sleep_on_timeout at kernel/sched.c:3004
Oct  5 21:50:31 bach kernel:  [<022b4ebc>] interruptible_sleep_on_timeout+0x5d/0xd0
Oct  5 21:50:31 bach kernel:  [<0211c983>] default_wake_function+0x0/0xc
Oct  5 21:50:31 bach kernel:  [<82b34fdb>] lockd_down+0xb3/0x10c [lockd]
Oct  5 21:50:31 bach kernel:  [<82c05e5c>] nfs_kill_super+0x43/0x63 [nfs]
Oct  5 21:50:31 bach kernel:  [<02157615>] deactivate_super+0x5b/0x70
Oct  5 21:50:31 bach kernel:  [<0216a26b>] sys_umount+0x65/0x6c
Oct  5 21:50:31 bach kernel:  [<02147bfd>] unmap_vma_list+0xe/0x17
Oct  5 21:50:31 bach kernel:  [<02147f64>] do_munmap+0x156/0x164
Oct  5 21:50:31 bach kernel:  [<0216a27d>] sys_oldumount+0xb/0xe
Oct  5 21:50:31 bach amd[3697]: /etc/amd.net unmounted fstype toplvl from /net

Comment 7 Steve Dickson 2004-10-13 12:14:37 UTC
This is caused by a non-upstream patch added to RHEL4 that stopped
lockd from holding the BKL (big kernel lock). When I sent a patch
upstream that removed the warnings, I was strongly advised
(and I have to agree) not to stop holding the BKL.
So I would suggest we drop the patch that removes the BKL.

Comment 8 Steve Dickson 2004-10-13 12:17:19 UTC
Created attachment 105137
Upstream patch I proposed to fix this problem (which also applies nicely to RHEL4 kernels)

Comment 9 Christopher Stone 2004-10-16 02:12:19 UTC
*** Bug 135622 has been marked as a duplicate of this bug. ***

Comment 10 G.Wolfe Woodbury 2004-10-16 05:34:37 UTC
This is still occurring in the 2.6.8-1.624 kernel from rawhide 2004-10-14.

Comment 11 Warren Togami 2004-10-19 02:43:32 UTC
According to davej, steved's patch fails to apply.  Please advise.

Comment 12 Steve Dickson 2004-10-19 12:03:18 UTC
Created attachment 105445
Updated patch

Comment 13 Bill Nottingham 2004-10-20 04:54:30 UTC
Added in 2.6.9-1.639.

Comment 14 Jason Vas Dias 2004-10-20 14:10:12 UTC
This problem appears to be fixed in 2.6.9-1.637.

Comment 15 Jason Baron 2004-10-21 17:31:10 UTC
*** Bug 136639 has been marked as a duplicate of this bug. ***

