Bug 132726

Summary:

NFS/lockd: Badness in interruptible_sleep_on_timeout

Product:

[Fedora] Fedora

Reporter:

Joe Orton <jorton>

Component:

kernel

Assignee:

Steve Dickson <steved>

Status:

CLOSED CURRENTRELEASE

Severity:

high

Priority:

high

Version:

rawhide

CC:

andrew.grover, davej, jvdias, rdieter, redwolfe, tkmame, wtogami

Hardware:

All

OS:

Linux

Fixed In Version:

2.6.9-1.639

Doc Type:

Bug Fix

Last Closed:

2005-09-04 23:07:20 UTC

Bug Blocks:

130887

Attachments:

Description	Flags
Upstream patch I proposed to fix this problem (which also applies nicely to RHEL4 kernels)	none
Updated patch	none

Description Joe Orton 2004-09-16 11:59:34 UTC

2.6.8-1.541, i686, UP

Badness in interruptible_sleep_on_timeout at kernel/sched.c:3004
 [<022f7ba4>] interruptible_sleep_on_timeout+0x5d/0x23a
 [<0211b5bd>] default_wake_function+0x0/0xc
 [<22a4a9e6>] lockd_down+0xb4/0x258 [lockd]
 [<22a9c40a>] nfs_kill_super+0x43/0x63 [nfs]
 [<02167d56>] deactivate_super+0xcb/0xe0
 [<021848cc>] sys_umount+0x65/0x6c
 [<0217f60b>] destroy_inode+0x36/0x45
 [<0217b0a3>] dput+0x33/0x4f3
 [<021624c6>] __fput+0xc9/0xee
 [<021848de>] sys_oldumount+0xb/0xe

Comment 1 Jason Vas Dias 2004-09-17 17:12:44 UTC

 Here's some additional context for this bug.
 It always happens (100% reproducibility) during system
 shutdown (service nfs stop) on two systems with 
 kernel-2.6.8-1.541 installed - one, an SMP 2-processor
 P4 system, and the other a uniprocessor IBM Thinkpad P6
 laptop (I have no other systems with kernel-2.6.8-1.541).

 Once, the problem was followed by an Oops (a kernel crash): I saw
 the Badness messages and, immediately afterwards, the Oops.
 I think I may have moved the mouse when the Oops occurred;
 I haven't been able to reproduce it.
 
 I see these messages in /var/log/messages (I have enabled
 kernel.* /var/log/messages in /etc/syslog.conf):

ntpd: ntpd shutdown succeeded
 kernel: Badness in interruptible_sleep_on_timeout at kernel/sched.c:3004
 kernel:  [<022f7ba4>] interruptible_sleep_on_timeout+0x5d/0x23a
 kernel:  [<0211b5bd>] default_wake_function+0x0/0xc
 kernel:  [<22b159e6>] lockd_down+0xb4/0x258 [lockd]
 kernel:  [<22cc740a>] nfs_kill_super+0x43/0x63 [nfs]
 kernel:  [<02167d56>] deactivate_super+0xcb/0xe0
 kernel:  [<021848cc>] sys_umount+0x65/0x6c
 kernel:  [<0217b0a3>] dput+0x33/0x4f3
 kernel:  [<021624c6>] __fput+0xc9/0xee
 kernel:  [<02160d2c>] filp_close+0x59/0x5f
 netfs: Unmounting NFS filesystems:  succeeded

Comment 2 Jason Vas Dias 2004-09-21 14:51:24 UTC

This still happens in kernel-2.6.8-1.584.

Comment 3 Jason Vas Dias 2004-09-21 16:12:19 UTC

I just had another "Oops" on rebooting kernel-2.6.8-1.584.
I manually copied the call trace from the screen (without the
addresses):
Call Trace:
        disable_IO_APIC
        machine_restart
        sys_reboot
        handle_mm_fault
        do_page_fault
        destroy_inode
        dput
        __fput
        filp_close
Code: Bad EIP value.

Comment 4 Yao Zhang 2004-09-28 01:31:25 UTC

The same happens on my machine too.  I am running the latest rawhide
with the 2.6.8-1.584 kernel.  It can always be reproduced by unmounting
an NFS share.  The NFS server runs RedHat 6.2.

The error message in the NFS client's /var/log/messages:

Sep 27 21:22:30 water kernel: Badness in interruptible_sleep_on_timeout at kernel/sched.c:3004
Sep 27 21:22:30 water kernel:  [<022ff2a8>] interruptible_sleep_on_timeout+0x5d/0x23a
Sep 27 21:22:30 water kernel:  [<0211b869>] default_wake_function+0x0/0xc
Sep 27 21:22:30 water kernel:  [<12b0aa66>] lockd_down+0xb4/0x258 [lockd]
Sep 27 21:22:30 water kernel:  [<12b61439>] nfs_kill_super+0x43/0x63 [nfs]
Sep 27 21:22:30 water kernel:  [<0216a2aa>] deactivate_super+0xcb/0xe0
Sep 27 21:22:30 water kernel:  [<02186f18>] sys_umount+0x65/0x6c
Sep 27 21:22:30 water kernel:  [<02181c57>] destroy_inode+0x36/0x45
Sep 27 21:22:30 water kernel:  [<02108b37>] do_IRQ+0x2fd/0x309
Sep 27 21:22:30 water kernel:  [<02186f2a>] sys_oldumount+0xb/0xe

Comment 5 Jason Vas Dias 2004-10-07 19:23:24 UTC

 Still happening in kernel-2.6.8-1.598.

Comment 6 Vladimir Ivanovic 2004-10-10 00:55:23 UTC

kernel-smp-2.6.8-1.590

Oct  5 21:50:31 bach kernel: Badness in interruptible_sleep_on_timeout at kernel/sched.c:3004
Oct  5 21:50:31 bach kernel:  [<022b4ebc>] interruptible_sleep_on_timeout+0x5d/0xd0
Oct  5 21:50:31 bach kernel:  [<0211c983>] default_wake_function+0x0/0xc
Oct  5 21:50:31 bach kernel:  [<82b34fdb>] lockd_down+0xb3/0x10c [lockd]
Oct  5 21:50:31 bach kernel:  [<82c05e5c>] nfs_kill_super+0x43/0x63 [nfs]
Oct  5 21:50:31 bach kernel:  [<02157615>] deactivate_super+0x5b/0x70
Oct  5 21:50:31 bach kernel:  [<0216a26b>] sys_umount+0x65/0x6c
Oct  5 21:50:31 bach kernel:  [<02147bfd>] unmap_vma_list+0xe/0x17
Oct  5 21:50:31 bach kernel:  [<02147f64>] do_munmap+0x156/0x164
Oct  5 21:50:31 bach kernel:  [<0216a27d>] sys_oldumount+0xb/0xe
Oct  5 21:50:31 bach amd[3697]: /etc/amd.net unmounted fstype toplvl from /net

Comment 7 Steve Dickson 2004-10-13 12:14:37 UTC

This is caused by a non-upstream patch, added to RHEL4, that
removed the holding of the BKL (Big Kernel Lock). When I sent the
patch that removed the warnings upstream, I was strongly advised
(and I have to agree) not to remove the holding of the BKL.
So I suggest we drop the patch that removes the BKL.
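
(Editorial illustration, not part of the original comment.) The reasoning above can be sketched with a userspace analogy. It assumes the warning fires because a sleep_on-style wait is only safe when the check of the condition and the sleep happen under the same lock the waker takes, which in the kernel of that era would be the Big Kernel Lock. This is not kernel code and not the attached patch; `wait_for_shutdown`, `state`, and `worker` are hypothetical names chosen for the example.

```python
import threading


def wait_for_shutdown(timeout=1.0):
    """Userspace analogy only: wait for a worker to stop without losing the wakeup."""
    cond = threading.Condition()   # a lock and a wait queue in one object
    state = {"stopped": False}     # hypothetical stand-in for "the daemon has exited"

    def worker():
        # The waker changes the flag and signals while holding the lock,
        # so the waiter cannot miss the wakeup between its check and its sleep.
        with cond:
            state["stopped"] = True
            cond.notify_all()

    t = threading.Thread(target=worker)
    with cond:
        t.start()
        # wait_for() checks the predicate under the lock, releases the lock
        # only while sleeping, and re-checks the predicate on every wakeup.
        ok = cond.wait_for(lambda: state["stopped"], timeout=timeout)
    t.join()
    return ok
```

If the waiter checked the flag and then went to sleep without holding the lock, the worker could set the flag and signal in between, and the waiter would sleep for the full timeout. Holding one lock across both steps closes that window.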

Comment 8 Steve Dickson 2004-10-13 12:17:19 UTC

Created attachment 105137 [details]
Upstream patch I proposed to fix this problem (which also applies nicely to RHEL4 kernels)

Comment 9 Christopher Stone 2004-10-16 02:12:19 UTC

*** Bug 135622 has been marked as a duplicate of this bug. ***

Comment 10 G.Wolfe Woodbury 2004-10-16 05:34:37 UTC

This is still occurring in 2.6.8-1.624 kernel from rawhide 2004-10-14

Comment 11 Warren Togami 2004-10-19 02:43:32 UTC

According to davej, steved's patch fails to apply.  Please advise.

Comment 12 Steve Dickson 2004-10-19 12:03:18 UTC

Created attachment 105445 [details]
Updated patch

Comment 13 Bill Nottingham 2004-10-20 04:54:30 UTC

Added in 2.6.9-1.639.

Comment 14 Jason Vas Dias 2004-10-20 14:10:12 UTC

This problem appears to be fixed in 2.6.9-1.637.

Comment 15 Jason Baron 2004-10-21 17:31:10 UTC

*** Bug 136639 has been marked as a duplicate of this bug. ***