Bug 132726
| Summary: | NFS/lockd: Badness in interruptible_sleep_on_timeout | ||||||||
|---|---|---|---|---|---|---|---|---|---|
| Product: | [Fedora] Fedora | Reporter: | Joe Orton <jorton> | ||||||
| Component: | kernel | Assignee: | Steve Dickson <steved> | ||||||
| Status: | CLOSED CURRENTRELEASE | QA Contact: | |||||||
| Severity: | high | Docs Contact: | |||||||
| Priority: | high | ||||||||
| Version: | rawhide | CC: | andrew.grover, davej, jvdias, rdieter, redwolfe, tkmame, wtogami | ||||||
| Target Milestone: | --- | ||||||||
| Target Release: | --- | ||||||||
| Hardware: | All | ||||||||
| OS: | Linux | ||||||||
| Whiteboard: | |||||||||
| Fixed In Version: | 2.6.9-1.639 | Doc Type: | Bug Fix | ||||||
| Doc Text: | Story Points: | --- | |||||||
| Clone Of: | Environment: | ||||||||
| Last Closed: | 2005-09-04 23:07:20 UTC | Type: | --- | ||||||
| Regression: | --- | Mount Type: | --- | ||||||
| Documentation: | --- | CRM: | |||||||
| Verified Versions: | Category: | --- | |||||||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||||
| Cloudforms Team: | --- | Target Upstream Version: | |||||||
| Embargoed: | |||||||||
| Bug Depends On: | |||||||||
| Bug Blocks: | 130887 | ||||||||
| Attachments: |
Description
Joe Orton
2004-09-16 11:59:34 UTC
Here's some additional context for this bug.

It always happens (100% reproducible) during system shutdown (service nfs stop) on two systems with kernel-2.6.8-1.541 installed: one an SMP 2-processor P4 system, the other a uniprocessor IBM Thinkpad P6 laptop (I have no other systems with kernel-2.6.8-1.541).

Once, the problem caused an 'Oops' (the kernel crashed): I saw the Badness messages, and the Oops occurred immediately afterwards. I think I may have moved the mouse when the Oops occurred; I haven't been able to reproduce it.

I see these messages in /var/log/messages (I have enabled kernel.* /var/log/messages in /etc/syslog.conf):

ntpd: ntpd shutdown succeeded
kernel: Badness in interruptible_sleep_on_timeout at kernel/sched.c:3004
kernel: [<022f7ba4>] interruptible_sleep_on_timeout+0x5d/0x23a
kernel: [<0211b5bd>] default_wake_function+0x0/0xc
kernel: [<22b159e6>] lockd_down+0xb4/0x258 [lockd]
kernel: [<22cc740a>] nfs_kill_super+0x43/0x63 [nfs]
kernel: [<02167d56>] deactivate_super+0xcb/0xe0
kernel: [<021848cc>] sys_umount+0x65/0x6c
kernel: [<0217b0a3>] dput+0x33/0x4f3
kernel: [<021624c6>] __fput+0xc9/0xee
kernel: [<02160d2c>] filp_close+0x59/0x5f
netfs: Unmounting NFS filesystems: succeeded

This still happens in kernel-2.6.8-1.584.

I just had another "Oops" on rebooting kernel-2.6.8-1.584.
I manually copied the call trace from the screen (without the
addresses):
Call Trace:
disable_IO_APIC
machine_restart
sys_reboot
handle_mm_fault
do_page_fault
destroy_inode
dput
__fput
filp_close
Code: Bad EIP value.

The same happens on my machine too. I am running the latest rawhide with the 2.6.8-1.584 kernel. It can always be reproduced by unmounting an NFS share. The NFS server runs RedHat 6.2. The error messages in the NFS client's /var/log/messages:

Sep 27 21:22:30 water kernel: Badness in interruptible_sleep_on_timeout at kernel/sched.c:3004
Sep 27 21:22:30 water kernel: [<022ff2a8>] interruptible_sleep_on_timeout+0x5d/0x23a
Sep 27 21:22:30 water kernel: [<0211b869>] default_wake_function+0x0/0xc
Sep 27 21:22:30 water kernel: [<12b0aa66>] lockd_down+0xb4/0x258 [lockd]
Sep 27 21:22:30 water kernel: [<12b61439>] nfs_kill_super+0x43/0x63 [nfs]
Sep 27 21:22:30 water kernel: [<0216a2aa>] deactivate_super+0xcb/0xe0
Sep 27 21:22:30 water kernel: [<02186f18>] sys_umount+0x65/0x6c
Sep 27 21:22:30 water kernel: [<02181c57>] destroy_inode+0x36/0x45
Sep 27 21:22:30 water kernel: [<02108b37>] do_IRQ+0x2fd/0x309
Sep 27 21:22:30 water kernel: [<02186f2a>] sys_oldumount+0xb/0xe

Still happening in kernel-2.6.8-1.598 kernel-smp-2.6.8-1.590

Oct 5 21:50:31 bach kernel: Badness in interruptible_sleep_on_timeout at kernel/sched.c:3004
Oct 5 21:50:31 bach kernel: [<022b4ebc>] interruptible_sleep_on_timeout+0x5d/0xd0
Oct 5 21:50:31 bach kernel: [<0211c983>] default_wake_function+0x0/0xc
Oct 5 21:50:31 bach kernel: [<82b34fdb>] lockd_down+0xb3/0x10c [lockd]
Oct 5 21:50:31 bach kernel: [<82c05e5c>] nfs_kill_super+0x43/0x63 [nfs]
Oct 5 21:50:31 bach kernel: [<02157615>] deactivate_super+0x5b/0x70
Oct 5 21:50:31 bach kernel: [<0216a26b>] sys_umount+0x65/0x6c
Oct 5 21:50:31 bach kernel: [<02147bfd>] unmap_vma_list+0xe/0x17
Oct 5 21:50:31 bach kernel: [<02147f64>] do_munmap+0x156/0x164
Oct 5 21:50:31 bach kernel: [<0216a27d>] sys_oldumount+0xb/0xe
Oct 5 21:50:31 bach amd[3697]: /etc/amd.net unmounted fstype toplvl from /net

This is caused by a non-upstream patch, added to RHEL4, which removed the holding of the BKL (big kernel lock). When I sent the patch that removed the warnings upstream, I was strongly advised (and I have to agree) not to remove the holding of the BKL. So I would suggest we remove the "removing of the BKL" patch.

Created attachment 105137
Upstream patch I proposed to fix this problem (which also applies nicely to RHEL4 kernels)
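
For context on why holding the lock matters (an illustrative note, not the attached patch): a sleep_on-style wait only works reliably if one lock covers both the caller's "has it exited yet?" check and the sleep itself, since otherwise the wakeup can arrive in between and be missed. As far as I understand, these kernels relied on the big kernel lock for that. The sketch below is a user-space analogy in Python, not kernel code; `ExitWaiter`, `signal_exit`, `wait_for_exit` and `demo` are hypothetical names chosen for the illustration.

```python
import threading


class ExitWaiter:
    """User-space stand-in for 'wait until a service thread has exited'."""

    def __init__(self):
        # A Condition bundles a lock with a wait queue.
        self._cond = threading.Condition()
        self._exited = False

    def signal_exit(self):
        # Waker: update the state and wake sleepers while holding the lock.
        with self._cond:
            self._exited = True
            self._cond.notify_all()

    def wait_for_exit(self, timeout):
        # Waiter: the check and the sleep both happen under the same lock,
        # so a wakeup cannot slip in between them and be lost.
        with self._cond:
            return self._cond.wait_for(lambda: self._exited, timeout=timeout)


def demo():
    waiter = ExitWaiter()
    worker = threading.Thread(target=waiter.signal_exit)
    worker.start()
    # Returns True whether the worker signals before or after we start waiting.
    ok = waiter.wait_for_exit(timeout=1.0)
    worker.join()
    return ok
```

Calling `demo()` returns `True` regardless of which thread runs first, because the flag is checked under the lock before sleeping. A waiter that nobody signals simply times out and gets `False`.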

*** Bug 135622 has been marked as a duplicate of this bug. ***

This is still occurring in the 2.6.8-1.624 kernel from rawhide 2004-10-14.

According to davej, steved's patch fails to apply. Please advise.

Created attachment 105445
Updated patch

Added in 2.6.9-1.639.

This problem appears to be fixed in 2.6.9-1.637.

*** Bug 136639 has been marked as a duplicate of this bug. ***