Bug 124119

Summary: Badness in interruptible_sleep_on at kernel/sched.c:1927
Product: [Fedora] Fedora Reporter: Need Real Name <mb/redhat>
Component: kernel Assignee: Jeff Moyer <jmoyer>
Status: CLOSED DUPLICATE QA Contact: Brian Brock <bbrock>
Severity: high
Priority: medium    
Version: 2   
Target Milestone: ---   
Target Release: ---   
Hardware: athlon   
OS: Linux   
Doc Type: Bug Fix
Last Closed: 2006-02-21 19:03:39 UTC

Description Need Real Name 2004-05-24 09:26:10 UTC
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7b)
Gecko/20040321 Firefox/0.8.0+

Description of problem:
After upgrading FC1 -> FC2, everything except autofs seemed OK. Had
3c59x-related crashes but switched hardware to get around that. [Have
now switched off kudzu out of paranoia.]

Now dropped in the autofs-4.1.2-2 RPM and the kernel complains. It
livelocked after about 12 hours' uptime.


Version-Release number of selected component (if applicable):
kernel-2.6.5-1.358

How reproducible:
Didn't try

Steps to Reproduce:
1. Boot
2. Wait
3. Boom

[ Strictly speaking, I'm trying my hardest to keep the thing running.
It's supposed to be the most stable machine on our network :-( ]

Actual Results:  The machine dropped off the network. Magic SysRq was
registered on the console (normal keys wouldn't unblank it), but Sync
or Unmount wouldn't complete.


Expected Results:  I just want a stable fileserver! It was stable
under FC1 even with my hand-built 2.6 kernel.

Additional info:

This happens a few times:

 Badness in interruptible_sleep_on at kernel/sched.c:1927
 Call Trace:
  [<0229fe72>] interruptible_sleep_on+0x5a/0xc6
  [<0211b419>] default_wake_function+0x0/0xc
  [<62aea4f9>] autofs4_wait+0x1fe/0x25e [autofs4]
  [<62ae95e5>] try_to_fill_dentry+0xa8/0x100 [autofs4]
  [<02159950>] do_lookup+0x54/0x72
  [<02159f70>] link_path_walk+0x602/0x7d0
  [<0215a42b>] path_lookup+0x13f/0x16f
  [<0215a567>] __user_walk+0x21/0x51
  [<02156177>] vfs_stat+0x14/0x3a
  [<021566e9>] sys_stat64+0xf/0x23
  [<02196d51>] selinux_task_post_setuid+0x11/0x14
  [<0212952b>] sys_setresuid+0x17a/0x187
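
When comparing traces like the one above across reboots, it can help to split each frame into address, symbol, offset, and owning module. What follows is only a minimal sketch: the names `FRAME_RE` and `parse_trace` are made up for illustration, and the pattern assumes the exact frame layout shown in this report (it would not cope with symbol names containing characters outside `\w`).

```python
import re

# One frame per line, e.g. "[<62aea4f9>] autofs4_wait+0x1fe/0x25e [autofs4]".
# The trailing "[module]" part is optional (absent for built-in kernel code).
FRAME_RE = re.compile(
    r"\[<(?P<addr>[0-9a-f]+)>\]\s+"
    r"(?P<symbol>\w+)\+0x(?P<offset>[0-9a-f]+)/0x(?P<size>[0-9a-f]+)"
    r"(?:\s+\[(?P<module>\w+)\])?"
)


def parse_trace(text):
    """Return one dict per stack frame found in text (hypothetical helper)."""
    frames = []
    for line in text.splitlines():
        m = FRAME_RE.search(line)
        if m is None:
            continue  # skip "Badness in ..." and "Call Trace:" lines
        frames.append({
            "addr": int(m.group("addr"), 16),
            "symbol": m.group("symbol"),
            "offset": int(m.group("offset"), 16),
            "size": int(m.group("size"), 16),
            "module": m.group("module"),  # None for non-module frames
        })
    return frames


# First frames of the trace from this report.
TRACE = """\
 Badness in interruptible_sleep_on at kernel/sched.c:1927
 Call Trace:
  [<0229fe72>] interruptible_sleep_on+0x5a/0xc6
  [<0211b419>] default_wake_function+0x0/0xc
  [<62aea4f9>] autofs4_wait+0x1fe/0x25e [autofs4]
  [<62ae95e5>] try_to_fill_dentry+0xa8/0x100 [autofs4]
  [<02159950>] do_lookup+0x54/0x72
"""

frames = parse_trace(TRACE)
module_frames = [f["symbol"] for f in frames if f["module"]]
print(module_frames)  # ['autofs4_wait', 'try_to_fill_dentry']
```

Filtering on the module field makes it easy to see that the frames immediately above `interruptible_sleep_on` come from autofs4.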

...and this once (after a few hours, and a few hours before it crashed):

 irq 9: nobody cared! (screaming interrupt?)
 Call Trace:
  [<021074ed>] __report_bad_irq+0x2b/0x67
  [<02107585>] note_interrupt+0x43/0x66
  [<021077d7>] do_IRQ+0x134/0x19a
  [<02121cf7>] __do_softirq+0x3f/0x9d
  [<02107f91>] do_softirq+0x4f/0x56
  =======================
  [<02107831>] do_IRQ+0x18e/0x19a
  [<02104018>] default_idle+0x0/0x2c
  [<02104041>] default_idle+0x29/0x2c
  [<0210409d>] cpu_idle+0x26/0x3b
  [<02356780>] start_kernel+0x193/0x195

IRQ9 is "acpi":

           CPU0       CPU1
  0:    3931446    3930376    IO-APIC-edge  timer
  1:        833        711    IO-APIC-edge  i8042
  2:          0          0          XT-PIC  cascade
  8:          0          1    IO-APIC-edge  rtc
  9:       2516       2727   IO-APIC-level  acpi
 12:         52          5    IO-APIC-edge  i8042
 14:        333        138    IO-APIC-edge  ide0
169:     469679     478948   IO-APIC-level  gdth
177:     243133       2263   IO-APIC-level  eth0
225:          9       4743   IO-APIC-level  eth2
NMI:          0          0
LOC:    7861925    7861924
ERR:          0
MIS:          0
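
To watch whether IRQ 9 keeps climbing between snapshots, a table like the one above can be turned into per-IRQ totals. This is a rough sketch only: `parse_interrupts` is a hypothetical helper, and it assumes the column layout shown here (a header row of CPU names, then `label: counts... description`, with rows such as ERR and MIS carrying a single counter).

```python
def parse_interrupts(text):
    """Parse /proc/interrupts-style text into {label: (counts, description)}."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    ncpu = len(lines[0].split())  # header row: CPU0 CPU1 ...
    table = {}
    for ln in lines[1:]:
        label, _, rest = ln.partition(":")
        fields = rest.split()
        counts = []
        # Take at most one counter per CPU; stop at the first non-numeric
        # field, since rows like ERR/MIS have fewer counters than CPUs.
        for field in fields[:ncpu]:
            if not field.isdigit():
                break
            counts.append(int(field))
        description = " ".join(fields[len(counts):])
        table[label.strip()] = (counts, description)
    return table


# A few rows copied from the table in this report.
SAMPLE = """\
           CPU0       CPU1
  0:    3931446    3930376    IO-APIC-edge  timer
  2:          0          0          XT-PIC  cascade
  9:       2516       2727   IO-APIC-level  acpi
169:     469679     478948   IO-APIC-level  gdth
ERR:          0
"""

table = parse_interrupts(SAMPLE)
counts, description = table["9"]
print(description, sum(counts))  # IO-APIC-level acpi 5243
```

Running this against two snapshots taken some minutes apart and subtracting the totals would show whether the acpi line is the one generating the spurious interrupts.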

Comment 1 Need Real Name 2004-05-26 08:20:53 UTC
I appear to be the victim of more than one bug.

Having booted with selinux=0 and acpi=off, and downgraded to autofs3
again, I managed an uptime of... 36 hours. (No complaints from the
kernel until it livelocked in the middle of the night.)

This machine was super-stable under FC1. I may have to downgrade, but
am prepared to endure about another week of early mornings if anyone
can offer me things to try.

Comment 2 Jeff Moyer 2004-06-01 13:06:07 UTC

*** This bug has been marked as a duplicate of 118413 ***

Comment 3 Red Hat Bugzilla 2006-02-21 19:03:39 UTC
Changed to 'CLOSED' state since 'RESOLVED' has been deprecated.