Description of problem:

Mar 26 13:00:19 log3 kernel: irq 217: nobody cared! (screaming interrupt?)
Mar 26 13:00:19 log3 kernel: Call Trace:
Mar 26 13:00:19 log3 kernel: [<0210ec74>] __report_bad_irq+0x2b/0x67
Mar 26 13:00:19 log3 kernel: [<0210ed0c>] note_interrupt+0x43/0x66
Mar 26 13:00:19 log3 kernel: [<0210f05f>] do_IRQ+0x19c/0x224
Mar 26 13:00:19 log3 kernel: [<0211d73f>] smp_apic_timer_interrupt+0x124/0x129
Mar 26 13:00:19 log3 kernel: [<02107000>] _stext+0x0/0x65
Mar 26 13:00:19 log3 kernel: [<0210b018>] default_idle+0x0/0x2c
Mar 26 13:00:19 log3 kernel: [<02107000>] _stext+0x0/0x65
Mar 26 13:00:19 log3 kernel: [<0210b041>] default_idle+0x29/0x2c
Mar 26 13:00:19 log3 kernel: [<0210b09d>] cpu_idle+0x26/0x3b
Mar 26 13:00:19 log3 kernel: [<02355799>] start_kernel+0x1cc/0x1d1
Mar 26 13:00:19 log3 kernel:
Mar 26 13:00:19 log3 kernel: handlers:
Mar 26 13:00:19 log3 kernel: [<0222b30b>] (ide_intr+0x0/0x243)
Mar 26 13:00:19 log3 kernel: [<0222b30b>] (ide_intr+0x0/0x243)
Mar 26 13:00:19 log3 kernel: Disabling IRQ #217
Mar 26 13:00:19 log3 kernel: hde: lost interrupt
Mar 26 13:00:19 log3 kernel: APIC error on CPU3: 60(60)

Version-Release number of selected component (if applicable):
2.6.3-2.1.253smp

How reproducible:
Almost always

Additional info:
I'll try to rescue dmesg, etc. when the server is reachable again.
I'm seeing a similar error, and it occurs shortly after I remove my USB key drive and exit X (at the end of the day).

Mar 24 17:20:32 hagrid kernel: usb 1-1: USB disconnect, address 4
Mar 24 17:20:47 hagrid gconfd (icon-31893): Exiting
Mar 24 17:20:47 hagrid gdm(pam_unix)[31812]: session closed for user icon
Mar 24 17:20:49 hagrid kernel: irq 10: nobody cared! (screaming interrupt?)
Mar 24 17:20:49 hagrid kernel: Call Trace:
Mar 24 17:20:49 hagrid kernel: [<0210f1e9>] __report_bad_irq+0x2b/0x67
Mar 24 17:20:49 hagrid kernel: [<0210f281>] note_interrupt+0x43/0x66
Mar 24 17:20:49 hagrid kernel: [<0210f714>] do_IRQ+0x248/0x303
Mar 24 17:20:49 hagrid kernel: [<0212b3bc>] __do_softirq+0x2c/0x73
Mar 24 17:20:49 hagrid kernel: [<021103d0>] do_softirq+0x46/0x4d
Mar 24 17:20:49 hagrid kernel: =======================
Mar 24 17:20:49 hagrid kernel: [<0210f7c3>] do_IRQ+0x2f7/0x303
Mar 24 17:20:49 hagrid kernel:
Mar 24 17:20:49 hagrid kernel: handlers:
Mar 24 17:20:49 hagrid kernel: [<32850017>] (usb_hcd_irq+0x0/0x4b [usbcore])
Mar 24 17:20:49 hagrid kernel: [<32850017>] (usb_hcd_irq+0x0/0x4b [usbcore])
Mar 24 17:20:49 hagrid kernel: Disabling IRQ #10

After this happens, nothing USB works any more, and X hangs (at gdm startup). Only reboot helps.
Konstantin, get your own bug. Kaj's box does it for a whole host of other interrupt sources, so it's likely a generic problem. Just look at the other bugs he filed. Yours, though, is likely to be USB related, because it's linked with removal of a USB device.
ok, moved to https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=119231
Created attachment 98889 [details] dmesg output
Finally got the dmesg and other output off the box. It is sort of reproducible just by rebooting: the IRQ problems show up on an irregular schedule, roughly once every 2-3 warm reboots. Let me know if you're interested in any other output. :)
Created attachment 98890 [details] cat /proc/interrupts
Created attachment 98891 [details] lspci output
Becomes unreproducible (tried around 20 boots) after upgrading to arjan's 2.6.4-1.290smp (i686). I'll keep this open for a while if it is okay with you guys.
Happens again with 2.6.4-1.298smp (i686).
Looks the same.

Mar 31 09:49:50 log3 kernel: APIC error on CPU3: 60(60)
Mar 31 09:49:51 log3 last message repeated 16 times
Mar 31 09:49:51 log3 kernel: irq 217: nobody cared! (screaming interrupt?)
Mar 31 09:49:51 log3 kernel: Call Trace:
Mar 31 09:49:51 log3 kernel: [<0210ac60>] __report_bad_irq+0x2b/0x67
Mar 31 09:49:51 log3 kernel: [<0210acf8>] note_interrupt+0x43/0x66
Mar 31 09:49:51 log3 kernel: [<0210b04b>] do_IRQ+0x19c/0x224
Mar 31 09:49:51 log3 kernel: [<02119743>] smp_apic_timer_interrupt+0x124/0x129
Mar 31 09:49:51 log3 kernel: [<02107018>] default_idle+0x0/0x2c
Mar 31 09:49:51 log3 kernel: [<02107041>] default_idle+0x29/0x2c
Mar 31 09:49:51 log3 kernel: [<0210709d>] cpu_idle+0x26/0x3b
Mar 31 09:49:51 log3 kernel: [<023637b9>] start_kernel+0x1cc/0x1d1
Mar 31 09:49:51 log3 kernel:
Mar 31 09:49:51 log3 kernel: handlers:
Mar 31 09:49:51 log3 kernel: [<0222a4b3>] (ide_intr+0x0/0x243)
Mar 31 09:49:51 log3 kernel: [<0222a4b3>] (ide_intr+0x0/0x243)
Mar 31 09:49:51 log3 kernel: Disabling IRQ #217
Mar 31 09:49:51 log3 kernel: APIC error on CPU3: 60(60)
Mar 31 09:49:51 log3 last message repeated 10 times
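In case it helps with comparing these reports across kernels and boxes, here is a rough Python sketch that pulls the IRQ number and the registered handler names out of syslog lines like the ones above. The function name `summarize_reports` and both regular expressions are my own and only assume the message layout seen in this bug; other kernels may format these lines differently.

```python
import re

# Matches the first line of a report, e.g. "irq 217: nobody cared!"
IRQ_RE = re.compile(r"irq (\d+): nobody cared")
# Matches a handler entry, e.g. "(ide_intr+0x0/0x243)" or
# "(usb_hcd_irq+0x0/0x4b [usbcore])"; the module suffix is optional.
HANDLER_RE = re.compile(r"\((\w+)\+0x[0-9a-f]+/0x[0-9a-f]+(?: \[\w+\])?\)")


def summarize_reports(lines):
    """Return [(irq, sorted unique handler names)] for each report found.

    Hypothetical helper: assumes each report starts with the
    "nobody cared" line, lists handlers after a "handlers:" line,
    and ends with a "Disabling IRQ" line.
    """
    reports = []
    current = None
    in_handlers = False
    for line in lines:
        irq = IRQ_RE.search(line)
        if irq:
            # Start of a new report.
            current = (int(irq.group(1)), set())
            reports.append(current)
            in_handlers = False
        elif current is None:
            # Not inside a report; ignore unrelated log lines.
            continue
        elif line.rstrip().endswith("handlers:"):
            in_handlers = True
        elif "Disabling IRQ" in line:
            # End of the current report.
            current = None
            in_handlers = False
        elif in_handlers:
            handler = HANDLER_RE.search(line)
            if handler:
                current[1].add(handler.group(1))
    return [(irq, sorted(names)) for irq, names in reports]
```

Passing it an open log file (for example the lines of `/var/log/messages`, if that is where your syslog writes kernel messages) gives one entry per report, which makes it easy to see whether the same IRQ and handler keep showing up.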
Happens with 2.6.5-1.339smp too. This seems to happen with warm reboots only; if the boxes are power cycled, it never occurs. These are Supermicro 6013P-T systems (2x 2.x GHz Xeon, 3 GB memory, E7501 MoBo, MCH + ICH3-S + P64H2, Si3112A, 4x SATA drives).
This got fixed on its own somewhere before 2.6.6-1.403. Unable to reproduce anymore.