Bug 127018

Summary: Badness in interruptible_sleep_on_timeout at kernel/sched.c:2533
Product: [Fedora] Fedora
Reporter: Nick Barr <nicky>
Component: kernel
Assignee: Dave Jones <davej>
Status: CLOSED NOTABUG
QA Contact:
Severity: medium
Docs Contact:
Priority: medium
Version: 2
CC: alan, axel.thimm, be, davej, djr, ekanter, marcel, matt, michal, ndbecker2, nphilipp, pfrields, steved
Target Milestone: ---   
Target Release: ---   
Hardware: All   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Last Closed: 2005-01-11 01:22:57 UTC Type: ---
Attachments:
Description                              Flags
my proposed patch to fix this problem    none
my proposed patch to fix this problem    none

Description Nick Barr 2004-06-30 18:17:48 UTC
Description of problem:

Getting the following trace on the console of a freshly installed
Fedora Core 2 box.

Badness in interruptible_sleep_on_timeout at kernel/sched.c:2533
 [<02281cc0>] interruptible_sleep_on_timeout+0x5d/0xb1
 [<02115e4e>] default_wake_function+0x0/0xc
 [<02239cd5>] netdev_run_todo+0x22/0x1ac
 [<06881d8f>] rtl8139_thread+0x38/0x134 [8139too]
 [<06881d57>] rtl8139_thread+0x0/0x134 [8139too]
 [<021041d9>] kernel_thread_helper+0x5/0xb
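
Each frame in a trace like the one above has the form "[<address>] symbol+0xoffset/0xsize [module]": the offset is how far into the function the address lies, the size is the length of the function, and the bracketed module name appears only for code loaded from a module. As a reading aid, here is a minimal Python sketch that splits one frame into those fields. The names FRAME_RE and parse_frame are hypothetical helpers for illustration, not part of any kernel tool.

```python
import re

# Hypothetical pattern for one frame of the trace format quoted in this report:
#   [<06881d8f>] rtl8139_thread+0x38/0x134 [8139too]
FRAME_RE = re.compile(
    r"\[<(?P<addr>[0-9a-f]+)>\]\s+"                 # return address in hex
    r"(?P<symbol>[\w.]+)"                           # function name
    r"\+0x(?P<offset>[0-9a-f]+)"                    # offset into the function
    r"/0x(?P<size>[0-9a-f]+)"                       # total size of the function
    r"(?:\s+\[(?P<module>[^\]]+)\])?"               # optional module name
)


def parse_frame(line):
    """Return the fields of one trace frame as a dict, or None if no frame is found."""
    match = FRAME_RE.search(line)
    if match is None:
        return None
    return {
        "addr": int(match.group("addr"), 16),
        "symbol": match.group("symbol"),
        "offset": int(match.group("offset"), 16),
        "size": int(match.group("size"), 16),
        "module": match.group("module"),  # None for code built into the kernel image
    }
```

Because the pattern is searched for rather than anchored, the same helper also accepts frames that carry a syslog prefix such as "Sep 28 00:18:49 brian kernel:".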

I have already upgraded from the stock kernel that shipped on the CD distribution.

Version-Release number of selected component (if applicable):

kernel-utils-2.4-9.1.131
kernel-2.6.6-1.435


How reproducible:

Every minute. Every time I boot.

I have googled and searched Bugzilla and found a couple of similar
issues (#116864 and #125482), but these were for a different module and
different versions, so I did not think they were appropriate.

Do I need to upgrade the kernel further? If so, where would I find such
a kernel?

Thanks

Nick

Comment 1 Jerry 2004-09-04 04:23:00 UTC
Kernel updates are available from up2date, or try 'yum update'. I don't
know if this will solve it, but if everything is running OK it may not
be a serious problem.

Jerry

Comment 2 Greg Varga 2004-09-22 19:08:31 UTC
This is still happening in the 2.6.8-1.521 kernel, which is the latest
kernel from updates.

Badness in interruptible_sleep_on_timeout at kernel/sched.c:2545
 [<022f47ef>] interruptible_sleep_on_timeout+0x5d/0x23a
 [<0211a8ee>] default_wake_function+0x0/0xc
 [<022f4c1a>] __cond_resched+0x14/0x3b
 [<022a00ef>] netdev_run_todo+0x29/0x2c1
 [<0a8e3e41>] rtl8139_thread+0x38/0x134 [8139too]
 [<0a8e3e09>] rtl8139_thread+0x0/0x134 [8139too]
 [<021041d9>] kernel_thread_helper+0x5/0xb

Any ideas?

--Greg

Comment 3 Marcel Mol 2004-09-28 13:45:26 UTC
Similar problems here when unmounting NFS filesystems while shutting
down the system:

Sep 28 00:18:49 brian kernel: Badness in interruptible_sleep_on_timeout at kernel/sched.c:3004
Sep 28 00:18:49 brian kernel:  [<022b5558>] interruptible_sleep_on_timeout+0xc8/0xd0
Sep 28 00:18:49 brian kernel:  [<0211c7a0>] default_wake_function+0x0/0x20
Sep 28 00:18:49 brian kernel:  [<1ad7389d>] lockd_down+0xbd/0x120 [lockd]
Sep 28 00:18:49 brian kernel:  [<1adcd868>] nfs_kill_super+0x88/0x90 [nfs]
Sep 28 00:18:49 brian kernel:  [<02163462>] deactivate_super+0x72/0xa0
Sep 28 00:18:49 brian kernel:  [<0217a7bf>] sys_umount+0x3f/0xa0
Sep 28 00:18:49 brian kernel:  [<0215da7d>] __fput+0xdd/0x160
Sep 28 00:18:49 brian kernel:  [<0215c0a2>] filp_close+0x52/0xa0
Sep 28 00:18:49 brian kernel:  [<0215c13d>] sys_close+0x4d/0x60

I have seen these messages since kernel 2.6.8-1.533. I think
2.6.7-1.517 worked fine. I'm currently running 2.6.8-1.584 and the
messages are still there.


Comment 4 Elliot Lee 2004-10-13 20:09:23 UTC
*** Bug 131294 has been marked as a duplicate of this bug. ***

Comment 5 Elliot Lee 2004-10-13 20:11:54 UTC
*** Bug 132901 has been marked as a duplicate of this bug. ***

Comment 6 Elliot Lee 2004-10-13 20:18:26 UTC
*** Bug 133509 has been marked as a duplicate of this bug. ***

Comment 7 Matthew Kent 2004-10-13 20:26:47 UTC
Running kernel-smp-2.6.8-1.603 on x86_64, still getting the crashes
mentioned above.

Comment 8 Elliot Lee 2004-10-13 20:35:41 UTC
*** Bug 133710 has been marked as a duplicate of this bug. ***

Comment 9 Elliot Lee 2004-10-13 21:07:22 UTC
*** Bug 134006 has been marked as a duplicate of this bug. ***

Comment 10 Elliot Lee 2004-10-13 21:14:59 UTC
One bug suggested that this was an NFSd locking issue (#134006 IIRC)

Comment 11 Alan Cox 2004-10-13 22:46:34 UTC
No, lockd_down just performs a completely invalid and unsafe sleep that
the new debugging code catches. It's a fairly obvious "duh" bug that
needs switching to use the unrolled wait_for_event functionality.
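
The difference between an unconditional sleep and a condition-checked wait can be illustrated outside the kernel. The following is a minimal userspace sketch in Python, an analogy only: it is not kernel code and not the patch attached to this bug, and the class WaitPoint and its method names are hypothetical. A sleep_on-style call goes to sleep without looking at the condition, so a wake-up that has already been delivered is lost and only the timeout ends the sleep. A wait_event-style call tests the condition before sleeping and again after every wake-up.

```python
import threading


class WaitPoint:
    """Hypothetical stand-in for a wait queue together with its condition flag."""

    def __init__(self):
        self._cond = threading.Condition()
        self._done = False

    def signal(self):
        # Waker side: set the condition and wake sleepers while holding the lock.
        with self._cond:
            self._done = True
            self._cond.notify_all()

    def wait_checked(self, timeout):
        # Condition-checked wait: the predicate is tested under the lock before
        # sleeping and after every wake-up, so an earlier signal() is not lost.
        with self._cond:
            return self._cond.wait_for(lambda: self._done, timeout=timeout)

    def wait_unchecked(self, timeout):
        # sleep_on-style wait: sleeps without consulting the condition, so a
        # signal() that already happened is missed and the call runs to timeout.
        with self._cond:
            return self._cond.wait(timeout=timeout)
```

If signal() runs before the waiter arrives, wait_checked() returns True immediately, while wait_unchecked() sleeps for the full timeout and returns False even though the event has already occurred.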


Comment 12 Steve Dickson 2004-10-14 13:02:21 UTC
Created attachment 105197 [details]
my proposed patch to fix this problem

Comment 13 Steve Dickson 2004-10-14 13:03:17 UTC
Created attachment 105198 [details]
my proposed patch to fix this problem

It's my understanding that this is caused by a non-upstream patch that
was added to RHEL4 which removed the holding of the BKL. When I sent
upstream a version of my patch that removed these warnings, I was
strongly advised (and I have to agree) not to remove the holding of the
BKL. So I would suggest we drop the patch that removes the holding of
the BKL.

Comment 14 Alan Cox 2004-10-14 14:59:38 UTC
Looks sane to me


Comment 15 Steve Dickson 2004-10-15 11:43:44 UTC
*** Bug 133082 has been marked as a duplicate of this bug. ***

Comment 16 Nils Philippsen 2004-12-11 06:00:19 UTC
Still seeing this problem on 2.6.9-1.681_FC3.

Comment 17 Babak Pasdar 2004-12-11 17:39:53 UTC
I continue to get the following with FC2 2.6.9-1.6 when trying to load
a second lirc module.  First module works OK.

ledxmit_dev: IR Remote Control driver registered, at major 72
Badness in sleep_on_timeout at kernel/sched.c:3022
 [<02308ff6>] sleep_on_timeout+0x5d/0x23a
 [<0211bba1>] default_wake_function+0x0/0xc
 [<02125ae4>] __request_region+0x56/0x79
 [<42c968c8>] init_port+0x1cb/0x22f [ledxmit_serial]
 [<42c972e3>] init_module+0x33/0x89 [ledxmit_serial]
 [<0213c2af>] sys_init_module+0x207/0x2ef
ledxmit_serial: auto-detected active high receiver
ledxmit_dev: ledxmit_register_plugin:sample_rate: 0


Comment 18 Alan Cox 2004-12-11 17:42:36 UTC
That lirc module appears buggy. The kernel is trapping this because we
ship the FC kernels with some of the low-overhead debugging
functionality enabled.

Alan