Bug 166152 - unmounting nfs fs causes badness in interruptible_sleep_on_timeout
Status: CLOSED RAWHIDE
Product: Fedora
Classification: Fedora
Component: kernel
Version: rawhide
Hardware: ia64 Linux
Priority: medium
Severity: medium
Assigned To: Steve Dickson
QA Contact: Brian Brock
Duplicates: 173144 173730
Blocks: fedora-ia64
Reported: 2005-08-17 11:37 EDT by Erik Jacobson
Modified: 2007-11-30 17:11 EST

Doc Type: Bug Fix
Last Closed: 2005-11-23 01:07:10 EST
External Trackers: Red Hat Bugzilla 132726

Description Erik Jacobson 2005-08-17 11:37:16 EDT
I set this to ia64 for the moment; I haven't tried other architectures.

Using the ia64 development tree (mirror from yesterday) with these packages:
[root@minime1 ~]# rpm -q nfs-utils kernel
nfs-utils-1.0.7-12
kernel-2.6.12-1.1485_FC5

This seems 100% repeatable. Mount an NFS share, then unmount it:

[root@minime1 ~]# umount /mnt
Badness in interruptible_sleep_on_timeout at kernel/sched.c:3297 (Not tainted)

Call Trace:
 [<a000000100012a60>] show_stack+0x80/0xa0
                                sp=e0000030700d7bb0 bsp=e0000030700d0fa8
 [<a0000001000133f0>] dump_stack+0x30/0x60
                                sp=e0000030700d7d80 bsp=e0000030700d0f90
kernel unaligned access to 0xe000003016a0fce4, ip=0xa00000010056cbd1
kernel unaligned access to 0xe000003016a0fce4, ip=0xa00000010056cc90
kernel unaligned access to 0xe000003016a0fce4, ip=0xa00000010056cbd1
kernel unaligned access to 0xe000003016a0fce4, ip=0xa00000010056cc90
 [<a000000100619220>] interruptible_sleep_on_timeout+0x260/0x300
                                sp=e0000030700d7d80 bsp=e0000030700d0f50
 [<a000000200eae840>] lockd_down+0x220/0x460 [lockd]
                                sp=e0000030700d7db0 bsp=e0000030700d0f28
 [<a000000200fae640>] nfs_kill_super+0x1a0/0x1c0 [nfs]
                                sp=e0000030700d7db0 bsp=e0000030700d0f00
 [<a000000100151550>] deactivate_super+0x150/0x1a0
                                sp=e0000030700d7db0 bsp=e0000030700d0ed0
 [<a00000010018a4f0>] __mntput+0x50/0x80
                                sp=e0000030700d7db0 bsp=e0000030700d0ea8
 [<a000000100165580>] path_release_on_umount+0x60/0x80
                                sp=e0000030700d7db0 bsp=e0000030700d0e88
 [<a00000010018c870>] sys_umount+0x510/0xa40
                                sp=e0000030700d7db0 bsp=e0000030700d0e08
 [<a00000010000b890>] ia64_trace_syscall+0xd0/0x110
                                sp=e0000030700d7e30 bsp=e0000030700d0e08
 [<a000000000010620>] __start_ivt_text+0xffffffff00010620/0x400
                                sp=e0000030700d8000 bsp=e0000030700d0e08
Comment 2 Michael Young 2005-09-30 09:54:27 EDT
This is reproducible on the 2.6.13-1.1526_FC4smp kernel in FC4 on an i686:
Sep 30 14:52:55 itspc-1-28 kernel: Badness in interruptible_sleep_on_timeout at kernel/sched.c:3297 (Not tainted)
Sep 30 14:52:55 itspc-1-28 kernel:  [<c031732f>] interruptible_sleep_on_timeout+0xf7/0x113
Sep 30 14:52:55 itspc-1-28 kernel:  [<c012b9e1>] group_send_sig_info+0x59/0x63
Sep 30 14:52:55 itspc-1-28 kernel:  [<c011d046>] default_wake_function+0x0/0xc
Sep 30 14:52:55 itspc-1-28 kernel:  [<dfbcd48b>] lockd_down+0xbe/0x120 [lockd]
Sep 30 14:52:55 itspc-1-28 kernel:  [<dfc20cbe>] nfs_kill_super+0x5e/0x62 [nfs]
Sep 30 14:52:55 itspc-1-28 kernel:  [<c0169fd8>] deactivate_super+0x5d/0x6e
Sep 30 14:52:55 itspc-1-28 kernel:  [<c017ef44>] sys_umount+0x33/0x73
Sep 30 14:52:55 itspc-1-28 kernel:  [<c017bf6b>] destroy_inode+0x3f/0x4e
Sep 30 14:52:55 itspc-1-28 kernel:  [<c0108055>] do_syscall_trace+0xef/0x123
Sep 30 14:52:55 itspc-1-28 kernel:  [<c017ef9b>] sys_oldumount+0x17/0x1b
Sep 30 14:52:55 itspc-1-28 kernel:  [<c010395d>] syscall_call+0x7/0xb
Comment 3 Michael Young 2005-09-30 10:21:09 EDT
Perhaps I should have mentioned this was with nfs-utils-1.0.7-11.
Comment 4 Dominik Mierzejewski 2005-10-05 16:58:54 EDT
Same thing happens on FC4 with
# rpm -q kernel-smp nfs-utils
kernel-smp-2.6.13-1.1526_FC4
nfs-utils-1.0.7-11
It does not happen with kernel-smp-2.6.12-1.1456_FC4, same nfs-utils.

Should I file this under FC4 kernel?

Oct  5 21:09:11 lab-s1 kernel: Badness in interruptible_sleep_on_timeout at kernel/sched.c:3297 (Not tainted)
Oct  5 21:09:11 lab-s1 kernel:  [<c031732f>] interruptible_sleep_on_timeout+0xf7/0x113
Oct  5 21:09:11 lab-s1 kernel:  [<c012b9e1>] group_send_sig_info+0x59/0x63
Oct  5 21:09:11 lab-s1 kernel:  [<c011d046>] default_wake_function+0x0/0xc
Oct  5 21:09:11 lab-s1 kernel:  [<f89cc48b>] lockd_down+0xbe/0x120 [lockd]
Oct  5 21:09:11 lab-s1 kernel:  [<f8c7fcbe>] nfs_kill_super+0x5e/0x62 [nfs]
Oct  5 21:09:11 lab-s1 kernel:  [<c0169fd8>] deactivate_super+0x5d/0x6e
Oct  5 21:09:11 lab-s1 kernel:  [<c017ef44>] sys_umount+0x33/0x73
Oct  5 21:09:11 lab-s1 kernel:  [<c017bf6b>] destroy_inode+0x3f/0x4e
Oct  5 21:09:11 lab-s1 kernel:  [<c0108055>] do_syscall_trace+0xef/0x123
Oct  5 21:09:11 lab-s1 kernel:  [<c017ef9b>] sys_oldumount+0x17/0x1b
Oct  5 21:09:11 lab-s1 kernel:  [<c010395d>] syscall_call+0x7/0xb
Comment 5 Jesse Brandeburg 2005-10-05 20:32:40 EDT
Also happens on FC4 x86_64 with:
kernel-smp-2.6.13-1.1526_FC4
nfs-utils-1.0.7-11

Oct  5 17:05:07 lxr2 kernel: Badness in interruptible_sleep_on_timeout at kernel/sched.c:3297 (Not tainted)
Oct  5 17:05:07 lxr2 kernel:
Oct  5 17:05:07 lxr2 kernel: Call Trace:
<ffffffff8033c2c8>{interruptible_sleep_on_timeout+131}
<ffffffff80131654>{default_wake_function+0}
<ffffffff88245367>{:lockd:lockd_down+207}
<ffffffff8825b857>{:nfs:nfs_kill_super+78} <ffffffff80186531>{deactivate_super+95}
<ffffffff8019ca23>{sys_umount+739} 
<ffffffff801107ea>{syscall_trace_enter+217}
<ffffffff80110827>{syscall_trace_leave+55} 
<ffffffff8010daa2>{tracesys+113}
<ffffffff8010db02>{tracesys+209}
Comment 6 Dominik Mierzejewski 2005-10-06 08:02:59 EDT
I see that nfs-utils-1.0.7-12 has just been released. No change, though.

# rpm -q kernel-smp-2.6.13 nfs-utils
kernel-smp-2.6.13-1.1526_FC4
nfs-utils-1.0.7-12.FC4

Oct  6 14:02:59 lab-s1 kernel: Badness in interruptible_sleep_on_timeout at kernel/sched.c:3297 (Not tainted)
Oct  6 14:02:59 lab-s1 kernel:  [<c031732f>] interruptible_sleep_on_timeout+0xf7/0x113
Oct  6 14:02:59 lab-s1 kernel:  [<c012b9e1>] group_send_sig_info+0x59/0x63
Oct  6 14:02:59 lab-s1 kernel:  [<c011d046>] default_wake_function+0x0/0xc
Oct  6 14:02:59 lab-s1 kernel:  [<f89cc48b>] lockd_down+0xbe/0x120 [lockd]
Oct  6 14:02:59 lab-s1 kernel:  [<f8c7fcbe>] nfs_kill_super+0x5e/0x62 [nfs]
Oct  6 14:02:59 lab-s1 kernel:  [<c0169fd8>] deactivate_super+0x5d/0x6e
Oct  6 14:02:59 lab-s1 kernel:  [<c017ef44>] sys_umount+0x33/0x73
Oct  6 14:02:59 lab-s1 kernel:  [<c017bf6b>] destroy_inode+0x3f/0x4e
Oct  6 14:02:59 lab-s1 kernel:  [<c0108055>] do_syscall_trace+0xef/0x123
Oct  6 14:02:59 lab-s1 kernel:  [<c017ef9b>] sys_oldumount+0x17/0x1b
Oct  6 14:02:59 lab-s1 kernel:  [<c010395d>] syscall_call+0x7/0xb

Comment 7 Jon Burgess 2005-10-13 19:13:13 EDT
Me too, on 2.6.13-1.1601_FC5/x86_64.

Looks to me like a recurrence of the old bug 132726, which was addressed by the
patch:
  linux-2.6.8-lockd-racewarn2.patch

That patch was dropped in kernel-2_6_12-1_1396.
Comment 8 Tethys 2005-10-20 16:12:45 EDT
Me too on FC4, this time on IA32, so it's not just a 64-bit issue.

kernel-smp-2.6.13-1.1526_FC4
nfs-utils-1.0.7-12.FC4

Oct 20 21:03:41 leto kernel: Badness in interruptible_sleep_on_timeout at kernel/sched.c:3297 (Not tainted)
Oct 20 21:03:41 leto kernel:  [<c031732f>] interruptible_sleep_on_timeout+0xf7/0x113
Oct 20 21:03:41 leto kernel:  [<c012b9e1>] group_send_sig_info+0x59/0x63
Oct 20 21:03:41 leto kernel:  [<c011d046>] default_wake_function+0x0/0xc
Oct 20 21:03:41 leto kernel:  [<f8b8448b>] lockd_down+0xbe/0x120 [lockd]
Oct 20 21:03:41 leto kernel:  [<f8bffcbe>] nfs_kill_super+0x5e/0x62 [nfs]
Oct 20 21:03:41 leto kernel:  [<c0169fd8>] deactivate_super+0x5d/0x6e
Oct 20 21:03:41 leto kernel:  [<c017ef44>] sys_umount+0x33/0x73
Oct 20 21:03:41 leto kernel:  [<c017a1a0>] dput+0x126/0x258
Oct 20 21:03:41 leto kernel:  [<c0165066>] __fput+0x139/0x18d
Oct 20 21:03:41 leto kernel:  [<c01638a6>] filp_close+0x3e/0x62
Oct 20 21:03:41 leto kernel:  [<c010395d>] syscall_call+0x7/0xb
Comment 9 Dominik Mierzejewski 2005-11-08 16:48:10 EST
I can confirm it's still there in kernel-smp-2.6.13-1.1532_FC4, tested on dual
Athlon MP.
Comment 10 Steve Dickson 2005-11-17 16:46:31 EST
*** Bug 173144 has been marked as a duplicate of this bug. ***
Comment 11 Riku Meskanen 2005-11-19 11:18:46 EST
Howdy,

Completely repeatable here too with an up-to-date, patched FC4 on a Dell PE-2550 (ia32) dual-processor
configuration. It does not, however, occur with the previous 2.6.13-1.1532_FC4 SMP kernel, or with a similar
single-processor PE-2550 configuration running 2.6.14-1.1637_FC4.

HTH,

:-) riku
 
# uname -a
Linux rudy.cc.jyu.fi 2.6.14-1.1637_FC4smp #1 SMP Wed Nov 9 18:34:11 EST 2005 i686 i686 i386 GNU/Linux

[root@rudy src]# rpm -q nfs-utils
nfs-utils-1.0.7-12.FC4

Badness in interruptible_sleep_on_timeout at kernel/sched.c:3403 (Not tainted)
 [<c031d2b2>] interruptible_sleep_on_timeout+0xf7/0x114
 [<c012b213>] group_send_sig_info+0x59/0x63
 [<c011c4de>] default_wake_function+0x0/0xc
 [<f8c1b4fb>] lockd_down+0xbe/0x120 [lockd]
 [<f8ccdcf7>] nfs_kill_super+0x5c/0x5e [nfs]
 [<c01699ed>] deactivate_super+0x60/0x71
 [<c017eb4e>] sys_umount+0x33/0x73
 [<c0107dc6>] do_syscall_trace+0x1e4/0x1f6
 [<c017eba5>] sys_oldumount+0x17/0x1b
 [<c01039e1>] syscall_call+0x7/0xb
Comment 12 Dave Jones 2005-11-20 15:32:44 EST
*** Bug 173730 has been marked as a duplicate of this bug. ***
Comment 13 Steve Dickson 2005-11-22 10:29:55 EST
Here is the patch that will take care of this
badness warning:

--- fs/lockd/svc.c.orig 2005-10-27 20:02:08.000000000 -0400
+++ fs/lockd/svc.c      2005-11-17 16:31:48.111289000 -0500
@@ -305,7 +305,7 @@ lockd_down(void)
         * the lockd semaphore, we can't wait around forever ...
         */
        clear_thread_flag(TIF_SIGPENDING);
-       interruptible_sleep_on_timeout(&lockd_exit, HZ);
+       wait_event_timeout(lockd_exit, nlmsvc_pid == 0, HZ);
        if (nlmsvc_pid) {
                printk(KERN_WARNING
                        "lockd_down: lockd failed to exit, clearing pid\n");
Comment 14 Dave Jones 2005-11-23 01:07:10 EST
Merged in CVS and built for rawhide; should go out in the first post-test1 push.

(Also available from http://people.redhat.com/davej/kernels/Fedora/devel/)
Comment 15 Dominik Mierzejewski 2005-12-15 21:06:12 EST
Will there be an erratum for FC4 fixing this?
Comment 16 Dave Jones 2005-12-15 22:57:33 EST
This is already fixed in the current FC4 errata released earlier this week.
