Bug 166152 - unmounting nfs fs causes badness in interruptible_sleep_on_timeout
Summary: unmounting nfs fs causes badness in interruptible_sleep_on_timeout
Keywords:
Status: CLOSED RAWHIDE
Alias: None
Product: Fedora
Classification: Fedora
Component: kernel
Version: rawhide
Hardware: ia64
OS: Linux
Priority: medium
Severity: medium
Target Milestone: ---
Assignee: Steve Dickson
QA Contact: Brian Brock
URL:
Whiteboard:
Duplicates: 173144 173730
Depends On:
Blocks: fedora-ia64
 
Reported: 2005-08-17 15:37 UTC by erikj
Modified: 2019-01-02 11:50 UTC

Fixed In Version:
Clone Of:
Environment:
Last Closed: 2005-11-23 06:07:10 UTC
Type: ---
Embargoed:




Links
System: Red Hat Bugzilla
ID: 132726
Private: 0
Priority: high
Status: CLOSED
Summary: NFS/lockd: Badness in interruptible_sleep_on_timeout
Last Updated: 2021-02-22 00:41:40 UTC

Description erikj 2005-08-17 15:37:16 UTC
I set this to ia64 for the moment; I haven't tried other architectures.

Using the ia64 development tree (mirror from yesterday) with these packages:
[root@minime1 ~]# rpm -q nfs-utils kernel
nfs-utils-1.0.7-12
kernel-2.6.12-1.1485_FC5

This seems 100% repeatable... Mount an nfs share, then unmount it:

[root@minime1 ~]# umount /mnt
Badness in interruptible_sleep_on_timeout at kernel/sched.c:3297 (Not tainted)

Call Trace:
 [<a000000100012a60>] show_stack+0x80/0xa0
                                sp=e0000030700d7bb0 bsp=e0000030700d0fa8
 [<a0000001000133f0>] dump_stack+0x30/0x60
                                sp=e0000030700d7d80 bsp=e0000030700d0f90
kernel unaligned access to 0xe000003016a0fce4, ip=0xa00000010056cbd1
kernel unaligned access to 0xe000003016a0fce4, ip=0xa00000010056cc90
kernel unaligned access to 0xe000003016a0fce4, ip=0xa00000010056cbd1
kernel unaligned access to 0xe000003016a0fce4, ip=0xa00000010056cc90
 [<a000000100619220>] interruptible_sleep_on_timeout+0x260/0x300
                                sp=e0000030700d7d80 bsp=e0000030700d0f50
 [<a000000200eae840>] lockd_down+0x220/0x460 [lockd]
                                sp=e0000030700d7db0 bsp=e0000030700d0f28
 [<a000000200fae640>] nfs_kill_super+0x1a0/0x1c0 [nfs]
                                sp=e0000030700d7db0 bsp=e0000030700d0f00
 [<a000000100151550>] deactivate_super+0x150/0x1a0
                                sp=e0000030700d7db0 bsp=e0000030700d0ed0
 [<a00000010018a4f0>] __mntput+0x50/0x80
                                sp=e0000030700d7db0 bsp=e0000030700d0ea8
 [<a000000100165580>] path_release_on_umount+0x60/0x80
                                sp=e0000030700d7db0 bsp=e0000030700d0e88
 [<a00000010018c870>] sys_umount+0x510/0xa40
                                sp=e0000030700d7db0 bsp=e0000030700d0e08
 [<a00000010000b890>] ia64_trace_syscall+0xd0/0x110
                                sp=e0000030700d7e30 bsp=e0000030700d0e08
 [<a000000000010620>] __start_ivt_text+0xffffffff00010620/0x400
                                sp=e0000030700d8000 bsp=e0000030700d0e08

Comment 2 Michael Young 2005-09-30 13:54:27 UTC
This is reproducible on the 2.6.13-1.1526_FC4smp kernel in FC4 on an i686:
Sep 30 14:52:55 itspc-1-28 kernel: Badness in interruptible_sleep_on_timeout at kernel/sched.c:3297 (Not tainted)
Sep 30 14:52:55 itspc-1-28 kernel:  [<c031732f>] interruptible_sleep_on_timeout+0xf7/0x113
Sep 30 14:52:55 itspc-1-28 kernel:  [<c012b9e1>] group_send_sig_info+0x59/0x63
Sep 30 14:52:55 itspc-1-28 kernel:  [<c011d046>] default_wake_function+0x0/0xc
Sep 30 14:52:55 itspc-1-28 kernel:  [<dfbcd48b>] lockd_down+0xbe/0x120 [lockd]
Sep 30 14:52:55 itspc-1-28 kernel:  [<dfc20cbe>] nfs_kill_super+0x5e/0x62 [nfs]
Sep 30 14:52:55 itspc-1-28 kernel:  [<c0169fd8>] deactivate_super+0x5d/0x6e
Sep 30 14:52:55 itspc-1-28 kernel:  [<c017ef44>] sys_umount+0x33/0x73
Sep 30 14:52:55 itspc-1-28 kernel:  [<c017bf6b>] destroy_inode+0x3f/0x4e
Sep 30 14:52:55 itspc-1-28 kernel:  [<c0108055>] do_syscall_trace+0xef/0x123
Sep 30 14:52:55 itspc-1-28 kernel:  [<c017ef9b>] sys_oldumount+0x17/0x1b
Sep 30 14:52:55 itspc-1-28 kernel:  [<c010395d>] syscall_call+0x7/0xb

Comment 3 Michael Young 2005-09-30 14:21:09 UTC
Perhaps I should have mentioned this was with nfs-utils-1.0.7-11.

Comment 4 Dominik Mierzejewski 2005-10-05 20:58:54 UTC
Same thing happens on FC4 with
# rpm -q kernel-smp nfs-utils
kernel-smp-2.6.13-1.1526_FC4
nfs-utils-1.0.7-11
It does not happen with kernel-smp-2.6.12-1.1456_FC4, same nfs-utils.

Should I file this under FC4 kernel?

Oct  5 21:09:11 lab-s1 kernel: Badness in interruptible_sleep_on_timeout at kernel/sched.c:3297 (Not tainted)
Oct  5 21:09:11 lab-s1 kernel:  [<c031732f>] interruptible_sleep_on_timeout+0xf7/0x113
Oct  5 21:09:11 lab-s1 kernel:  [<c012b9e1>] group_send_sig_info+0x59/0x63
Oct  5 21:09:11 lab-s1 kernel:  [<c011d046>] default_wake_function+0x0/0xc
Oct  5 21:09:11 lab-s1 kernel:  [<f89cc48b>] lockd_down+0xbe/0x120 [lockd]
Oct  5 21:09:11 lab-s1 kernel:  [<f8c7fcbe>] nfs_kill_super+0x5e/0x62 [nfs]
Oct  5 21:09:11 lab-s1 kernel:  [<c0169fd8>] deactivate_super+0x5d/0x6e
Oct  5 21:09:11 lab-s1 kernel:  [<c017ef44>] sys_umount+0x33/0x73
Oct  5 21:09:11 lab-s1 kernel:  [<c017bf6b>] destroy_inode+0x3f/0x4e
Oct  5 21:09:11 lab-s1 kernel:  [<c0108055>] do_syscall_trace+0xef/0x123
Oct  5 21:09:11 lab-s1 kernel:  [<c017ef9b>] sys_oldumount+0x17/0x1b
Oct  5 21:09:11 lab-s1 kernel:  [<c010395d>] syscall_call+0x7/0xb


Comment 5 Jesse Brandeburg 2005-10-06 00:32:40 UTC
This also happens on FC4 x86_64 with:
kernel-smp-2.6.13-1.1526_FC4
nfs-utils-1.0.7-11

Oct  5 17:05:07 lxr2 kernel: Badness in interruptible_sleep_on_timeout at kernel/sched.c:3297 (Not tainted)
Oct  5 17:05:07 lxr2 kernel:
Oct  5 17:05:07 lxr2 kernel: Call Trace:
<ffffffff8033c2c8>{interruptible_sleep_on_timeout+131}
<ffffffff80131654>{default_wake_function+0}
<ffffffff88245367>{:lockd:lockd_down+207}
<ffffffff8825b857>{:nfs:nfs_kill_super+78} <ffffffff80186531>{deactivate_super+95}
<ffffffff8019ca23>{sys_umount+739} 
<ffffffff801107ea>{syscall_trace_enter+217}
<ffffffff80110827>{syscall_trace_leave+55} 
<ffffffff8010daa2>{tracesys+113}
<ffffffff8010db02>{tracesys+209}

Comment 6 Dominik Mierzejewski 2005-10-06 12:02:59 UTC
I see that nfs-utils-1.0.7-12 has just been released. No change, though.

# rpm -q kernel-smp-2.6.13 nfs-utils
kernel-smp-2.6.13-1.1526_FC4
nfs-utils-1.0.7-12.FC4

Oct  6 14:02:59 lab-s1 kernel: Badness in interruptible_sleep_on_timeout at kernel/sched.c:3297 (Not tainted)
Oct  6 14:02:59 lab-s1 kernel:  [<c031732f>] interruptible_sleep_on_timeout+0xf7/0x113
Oct  6 14:02:59 lab-s1 kernel:  [<c012b9e1>] group_send_sig_info+0x59/0x63
Oct  6 14:02:59 lab-s1 kernel:  [<c011d046>] default_wake_function+0x0/0xc
Oct  6 14:02:59 lab-s1 kernel:  [<f89cc48b>] lockd_down+0xbe/0x120 [lockd]
Oct  6 14:02:59 lab-s1 kernel:  [<f8c7fcbe>] nfs_kill_super+0x5e/0x62 [nfs]
Oct  6 14:02:59 lab-s1 kernel:  [<c0169fd8>] deactivate_super+0x5d/0x6e
Oct  6 14:02:59 lab-s1 kernel:  [<c017ef44>] sys_umount+0x33/0x73
Oct  6 14:02:59 lab-s1 kernel:  [<c017bf6b>] destroy_inode+0x3f/0x4e
Oct  6 14:02:59 lab-s1 kernel:  [<c0108055>] do_syscall_trace+0xef/0x123
Oct  6 14:02:59 lab-s1 kernel:  [<c017ef9b>] sys_oldumount+0x17/0x1b
Oct  6 14:02:59 lab-s1 kernel:  [<c010395d>] syscall_call+0x7/0xb



Comment 7 Jon Burgess 2005-10-13 23:13:13 UTC
Me too, on 2.6.13-1.1601_FC5/x86_64.

This looks to me like a recurrence of the old bug 132726, which was addressed by the
patch:
  linux-2.6.8-lockd-racewarn2.patch

That patch was dropped in kernel-2_6_12-1_1396.


Comment 8 Tethys 2005-10-20 20:12:45 UTC
Me too on FC4, this time on IA32, so it's not just a 64-bit issue.

kernel-smp-2.6.13-1.1526_FC4
nfs-utils-1.0.7-12.FC4

Oct 20 21:03:41 leto kernel: Badness in interruptible_sleep_on_timeout at kernel/sched.c:3297 (Not tainted)
Oct 20 21:03:41 leto kernel:  [<c031732f>] interruptible_sleep_on_timeout+0xf7/0x113
Oct 20 21:03:41 leto kernel:  [<c012b9e1>] group_send_sig_info+0x59/0x63
Oct 20 21:03:41 leto kernel:  [<c011d046>] default_wake_function+0x0/0xc
Oct 20 21:03:41 leto kernel:  [<f8b8448b>] lockd_down+0xbe/0x120 [lockd]
Oct 20 21:03:41 leto kernel:  [<f8bffcbe>] nfs_kill_super+0x5e/0x62 [nfs]
Oct 20 21:03:41 leto kernel:  [<c0169fd8>] deactivate_super+0x5d/0x6e
Oct 20 21:03:41 leto kernel:  [<c017ef44>] sys_umount+0x33/0x73
Oct 20 21:03:41 leto kernel:  [<c017a1a0>] dput+0x126/0x258
Oct 20 21:03:41 leto kernel:  [<c0165066>] __fput+0x139/0x18d
Oct 20 21:03:41 leto kernel:  [<c01638a6>] filp_close+0x3e/0x62
Oct 20 21:03:41 leto kernel:  [<c010395d>] syscall_call+0x7/0xb


Comment 9 Dominik Mierzejewski 2005-11-08 21:48:10 UTC
I can confirm it's still there in kernel-smp-2.6.13-1.1532_FC4, tested on a dual
Athlon MP.

Comment 10 Steve Dickson 2005-11-17 21:46:31 UTC
*** Bug 173144 has been marked as a duplicate of this bug. ***

Comment 11 Riku Meskanen 2005-11-19 16:18:46 UTC
Howdy,

Completely repeatable here too with an up-to-date, patched FC4 on a Dell PE-2550 (ia32) dual-processor
configuration. However, it does not occur with the previous 2.6.13-1.1532_FC4 SMP kernel, or with a similar
single-processor PE-2550 configuration running 2.6.14-1.1637_FC4.

HTH,

:-) riku
 
# uname -a
Linux rudy.cc.jyu.fi 2.6.14-1.1637_FC4smp #1 SMP Wed Nov 9 18:34:11 EST 2005 i686 i686 i386 GNU/Linux

[root@rudy src]# rpm -q nfs-utils
nfs-utils-1.0.7-12.FC4

Badness in interruptible_sleep_on_timeout at kernel/sched.c:3403 (Not tainted)
 [<c031d2b2>] interruptible_sleep_on_timeout+0xf7/0x114
 [<c012b213>] group_send_sig_info+0x59/0x63
 [<c011c4de>] default_wake_function+0x0/0xc
 [<f8c1b4fb>] lockd_down+0xbe/0x120 [lockd]
 [<f8ccdcf7>] nfs_kill_super+0x5c/0x5e [nfs]
 [<c01699ed>] deactivate_super+0x60/0x71
 [<c017eb4e>] sys_umount+0x33/0x73
 [<c0107dc6>] do_syscall_trace+0x1e4/0x1f6
 [<c017eba5>] sys_oldumount+0x17/0x1b
 [<c01039e1>] syscall_call+0x7/0xb


Comment 12 Dave Jones 2005-11-20 20:32:44 UTC
*** Bug 173730 has been marked as a duplicate of this bug. ***

Comment 13 Steve Dickson 2005-11-22 15:29:55 UTC
Here is the patch that will take care of this
badness warning:

--- fs/lockd/svc.c.orig 2005-10-27 20:02:08.000000000 -0400
+++ fs/lockd/svc.c      2005-11-17 16:31:48.111289000 -0500
@@ -305,7 +305,7 @@ lockd_down(void)
         * the lockd semaphore, we can't wait around forever ...
         */
        clear_thread_flag(TIF_SIGPENDING);
-       interruptible_sleep_on_timeout(&lockd_exit, HZ);
+       wait_event_timeout(lockd_exit, nlmsvc_pid == 0, HZ);
        if (nlmsvc_pid) {
                printk(KERN_WARNING
                        "lockd_down: lockd failed to exit, clearing pid\n");


Comment 14 Dave Jones 2005-11-23 06:07:10 UTC
Merged in CVS and built for rawhide; it should go out in the first post-test1 push.

(Also available from http://people.redhat.com/davej/kernels/Fedora/devel/)

Comment 15 Dominik Mierzejewski 2005-12-16 02:06:12 UTC
Will there be an erratum for FC4 that fixes this?

Comment 16 Dave Jones 2005-12-16 03:57:33 UTC
It is already fixed in the current FC4 errata released earlier this week.


