Bug 132930

Summary: Disabling IRQ #11 - r128 driver bug
Product: [Fedora] Fedora
Reporter: Bill Shannon <bill.shannon>
Component: kernel
Assignee: Dave Jones <davej>
Status: CLOSED DUPLICATE
QA Contact: Brian Brock <bbrock>
Severity: high
Priority: medium
Version: 2
CC: djuran, pfrields, wtogami
Target Milestone: ---   
Target Release: ---   
Hardware: i386   
OS: Linux   
Doc Type: Bug Fix
Last Closed: 2006-02-21 19:05:43 UTC

Description Bill Shannon 2004-09-19 23:15:18 UTC
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7.3) Gecko/20040913

Description of problem:
I upgraded from FC1 to FC2.  The same hardware runs FC1 with no
problems.  I've run into several problems after the upgrade.
This is just one of them.

The hardware is a Dell Dimension 4550, 1GB ram, 40GB IDE disk,
ATI Rage 128 Pro Ultra video.

While trying to login, the system hangs.

I can reproduce this easily as follows...

I try to login through the GUI login screen to a test account that has
no special dot files.  The login hangs after putting up the "metacity"
icon.  I login from another machine using ssh and kill gnome-session.
That kills the hung login but also immediately displays the "Disabling
IRQ #11" message in my ssh window.  The GUI login screen continues to
function and I can use it to reboot the system.  The network is dead
(it uses IRQ #11).

If I login as root instead of a normal user, the login succeeds,
but then I run into other problems (not described in this bug
report).

Many people have suggested turning off ACPI.  I've tried many things,
including booting with this line in grub.conf:

kernel /vmlinuz-2.6.8-1.521 ro root=LABEL=/ hdd=ide-scsi acpi=off pci=noacpi noapic

It made no difference.

/proc/interrupts says:

           CPU0
  0:     399303          XT-PIC  timer
  1:         40          XT-PIC  i8042
  2:          0          XT-PIC  cascade
  3:          0          XT-PIC  ehci_hcd
  8:          1          XT-PIC  rtc
  9:          0          XT-PIC  acpi, uhci_hcd
 10:          0          XT-PIC  uhci_hcd
 11:      26207          XT-PIC  uhci_hcd, eth0, r128@PCI:1:0:0, Intel 82801DB-ICH4
 12:         84          XT-PIC  i8042
 14:      12068          XT-PIC  ide0
 15:       1182          XT-PIC  ide1
NMI:          0
ERR:          0 
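
The table above shows IRQ 11 carrying four handlers at once, including both eth0 and the r128 driver. As a rough illustration (not part of the original report), the following Python sketch picks out the shared lines from /proc/interrupts text. The helper name `shared_irqs` is hypothetical, and the regular expression assumes the single-CPU layout shown here, with one count column.

```python
import re

# Sample copied from the report above, with the wrapped IRQ 11 line rejoined.
SAMPLE = """\
           CPU0
  0:     399303          XT-PIC  timer
  1:         40          XT-PIC  i8042
  2:          0          XT-PIC  cascade
  3:          0          XT-PIC  ehci_hcd
  8:          1          XT-PIC  rtc
  9:          0          XT-PIC  acpi, uhci_hcd
 10:          0          XT-PIC  uhci_hcd
 11:      26207          XT-PIC  uhci_hcd, eth0, r128@PCI:1:0:0, Intel 82801DB-ICH4
 12:         84          XT-PIC  i8042
 14:      12068          XT-PIC  ide0
 15:       1182          XT-PIC  ide1
NMI:          0
ERR:          0
"""

# Assumed layout: "<irq>: <count> <controller> <handler>[, <handler>...]"
LINE = re.compile(r"\s*(\d+):\s+(\d+)\s+(\S+)\s+(.*)$")


def shared_irqs(text):
    """Return {irq: [handler names]} for IRQ lines with more than one handler."""
    shared = {}
    for line in text.splitlines():
        match = LINE.match(line)
        if not match:
            continue  # header line, NMI and ERR counters
        handlers = [h.strip() for h in match.group(4).split(",") if h.strip()]
        if len(handlers) > 1:
            shared[int(match.group(1))] = handlers
    return shared


if __name__ == "__main__":
    for irq, names in shared_irqs(SAMPLE).items():
        print(irq, names)
```

Because the line is shared, the kernel disabling IRQ 11 takes down every device on it, which matches the dead network described above.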

I've tried disabling both USB and sound in the BIOS.  It didn't help.


Version-Release number of selected component (if applicable):
kernel-2.6.8-1.521

How reproducible:
Always

Steps to Reproduce:
1. Log in to a non-root account using the GUI login screen.
2. When the login hangs, use ssh from another machine to log in.
3. Kill gnome-session.
    

Actual Results:  When it dies it says:

Sep 16 21:49:06 dell kernel: irq 11: nobody cared! (screaming interrupt?)
Sep 16 21:49:06 dell kernel: irq 11: Please try booting with acpi=off and report a bug
Sep 16 21:49:06 dell kernel: Stack pointer is garbage, not printing trace
Sep 16 21:49:06 dell kernel: handlers:
Sep 16 21:49:06 dell kernel: [<429b3010>] (e100_intr+0x0/0xe6 [e100])
Sep 16 21:49:06 dell kernel: [<42a3c9a2>] (snd_intel8x0_interrupt+0x0/0x44f [snd_intel8x0])
Sep 16 21:49:06 dell kernel: Disabling IRQ #11

In the one case where it printed a stack trace I got:

Sep 11 15:55:42 dell kernel: irq 11: nobody cared! (screaming interrupt?)
Sep 11 15:55:42 dell kernel: Call Trace:
Sep 11 15:55:42 dell kernel:  [<021070c9>] __report_bad_irq+0x2b/0x67
Sep 11 15:55:42 dell kernel:  [<02107161>] note_interrupt+0x43/0x66
Sep 11 15:55:42 dell kernel:  [<02107327>] do_IRQ+0x109/0x169
Sep 11 15:55:42 dell kernel:  [<0211af64>] __do_softirq+0x2c/0x73
Sep 11 15:55:42 dell kernel:  [<021078f5>] do_softirq+0x46/0x4d
Sep 11 15:55:42 dell kernel:  =======================
Sep 11 15:55:42 dell kernel:  [<0210737b>] do_IRQ+0x15d/0x169
Sep 11 15:55:42 dell kernel:
Sep 11 15:55:42 dell kernel: handlers:
Sep 11 15:55:42 dell kernel: [<0221522d>] (usb_hcd_irq+0x0/0x4b)
Sep 11 15:55:42 dell kernel: [<429cbc6e>] (e100_intr+0x0/0xe0 [e100])
Sep 11 15:55:42 dell kernel: [<44d88501>] (snd_intel8x0_interrupt+0x0/0x17e [snd_intel8x0])
Sep 11 15:55:42 dell kernel: Disabling IRQ #11 
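
The "handlers:" section of these messages lists which interrupt handlers were still registered on IRQ 11 when the kernel gave up. The sketch below (Python, added for illustration and not part of the original report) extracts that list from the first log excerpt. The helper name `registered_handlers` and the regular expression are assumptions based only on the line format visible here.

```python
import re

# First excerpt from the report, with the wrapped lines rejoined.
LOG = """\
Sep 16 21:49:06 dell kernel: irq 11: nobody cared! (screaming interrupt?)
Sep 16 21:49:06 dell kernel: irq 11: Please try booting with acpi=off and report a bug
Sep 16 21:49:06 dell kernel: Stack pointer is garbage, not printing trace
Sep 16 21:49:06 dell kernel: handlers:
Sep 16 21:49:06 dell kernel: [<429b3010>] (e100_intr+0x0/0xe6 [e100])
Sep 16 21:49:06 dell kernel: [<42a3c9a2>] (snd_intel8x0_interrupt+0x0/0x44f [snd_intel8x0])
Sep 16 21:49:06 dell kernel: Disabling IRQ #11
"""

# Matches "[<addr>] (function+0xOFF/0xLEN [module])"; the module part is optional.
HANDLER = re.compile(
    r"\[<[0-9a-f]+>\] \((\w+)\+0x[0-9a-f]+/0x[0-9a-f]+(?: \[(\w+)\])?\)"
)


def registered_handlers(log_text):
    """Return (function, module) pairs that follow the 'handlers:' line."""
    handlers = []
    in_handlers = False
    for line in log_text.splitlines():
        if line.rstrip().endswith("handlers:"):
            in_handlers = True
            continue
        if in_handlers:
            match = HANDLER.search(line)
            if match:
                handlers.append((match.group(1), match.group(2)))
    return handlers


if __name__ == "__main__":
    for function, module in registered_handlers(LOG):
        print(function, module)
```

In this excerpt the list contains only the e100 and snd_intel8x0 handlers; no r128 handler appears, even though /proc/interrupts shows r128 on IRQ 11 while a session is running.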


Expected Results:  System doesn't die.

Additional info:

Comment 1 Bill Shannon 2004-10-26 07:18:01 UTC
I've done some more debugging and this is what I've determined.
First, I fixed the problem that was preventing me from logging
in successfully.  That problem was unrelated to this.  Now I can
login and I can reproduce this problem every time I logout.

I built a kernel with some additional debugging information and
it appears that the problem occurs just after the r128 driver is
told to cleanup and remove its IRQ handler.  My theory, which I
have not yet proven, is that the device continues to generate
interrupts even after the IRQ handler for the device has been
removed, which of course means there's no one to handle the
interrupt.  The driver is clearly trying to disable interrupts
for the device, but perhaps it's not working.

Note also that the r128 driver in FC1 did not use interrupts.
That may explain why FC1 did not have this problem.


Comment 2 Bill Shannon 2004-10-28 21:12:44 UTC
Ok, I think I've now proven that my ATI Rage 128 device is generating
interrupts even after the r128 driver disables interrupts.

In r128_driver_irq_uninstall, after it writes to the device to
disable interrupts, I set a global variable that says interrupts
are disabled.  I then use vblank_wait to wait for the next
vblank interrupt.  If interrupts were really disabled, I would
expect vblank_wait to return indicating that the timeout expired.
It doesn't; it returns success.

In r128_dma_service, if I get an interrupt while the global
"interrupts are disabled" flag is set, I increment a counter.
After vblank_wait returns in r128_driver_irq_uninstall, I
check the count.  It indicates that an interrupt was received
after interrupts were disabled.

To me, that looks like proof that there's something wrong here.
Maybe the hardware is broken, or maybe it's just not working
the way the driver expects.  Or maybe the driver is not doing
the right thing to really disable interrupts.
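
The flag-and-counter check described above can be modelled in user space. The following is a toy Python sketch, not driver code: `FakeDevice`, `Instrumentation`, and the `honours_disable` switch are hypothetical, and only the roles of r128_driver_irq_uninstall, r128_dma_service, and vblank_wait are taken from this comment.

```python
class FakeDevice:
    """Toy stand-in for the video card; it does not model real r128 registers."""

    def __init__(self, honours_disable):
        self.honours_disable = honours_disable
        self.irq_enabled = True

    def disable_interrupts(self):
        # A well-behaved device stops interrupting; a misbehaving one ignores the write.
        if self.honours_disable:
            self.irq_enabled = False

    def vblank_fires(self):
        return self.irq_enabled


class Instrumentation:
    def __init__(self, device):
        self.device = device
        self.interrupts_disabled = False  # the global "interrupts are disabled" flag
        self.late_interrupts = 0          # interrupts seen while the flag is set

    def interrupt_handler(self):
        # Plays the role of r128_dma_service.
        if self.interrupts_disabled:
            self.late_interrupts += 1

    def vblank_wait(self, ticks=3):
        """Return True if a vblank interrupt arrives within `ticks` polls, else False."""
        for _ in range(ticks):
            if self.device.vblank_fires():
                self.interrupt_handler()
                return True
        return False  # timeout expired

    def irq_uninstall(self):
        # Plays the role of r128_driver_irq_uninstall.
        self.device.disable_interrupts()
        self.interrupts_disabled = True
        got_vblank = self.vblank_wait()
        return got_vblank, self.late_interrupts


if __name__ == "__main__":
    print(Instrumentation(FakeDevice(honours_disable=True)).irq_uninstall())
    print(Instrumentation(FakeDevice(honours_disable=False)).irq_uninstall())
```

With a device that honours the disable, the wait times out and the counter stays at zero. The behaviour reported in this comment corresponds to the second case, where the wait succeeds and the counter is non-zero.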

Now I need help from someone who actually understands this driver
to figure out how to fix this.


Comment 3 Bill Shannon 2004-12-16 19:03:29 UTC
This bug is the same as the Xorg bug reported at
https://bugs.freedesktop.org/show_bug.cgi?id=1886
and
https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=138822

The workaround described there (comment out ``Load "dri"'')
solved the problem for me.

I'm closing this bug as a duplicate of 138822.


*** This bug has been marked as a duplicate of 138822 ***

Comment 4 Red Hat Bugzilla 2006-02-21 19:05:43 UTC
Changed to 'CLOSED' state since 'RESOLVED' has been deprecated.