Bug 471162
Summary: | kernel-2.6.27.5-100.fc10.x86_64 causes intel 945GME to loose irq
---|---
Product: | [Fedora] Fedora
Component: | kernel
Version: | rawhide
Hardware: | All
OS: | Linux
Status: | CLOSED RAWHIDE
Severity: | medium
Priority: | medium
Reporter: | Kevin Fenzi <kevin>
Assignee: | Dave Airlie <airlied>
QA Contact: | Fedora Extras Quality Assurance <extras-qa>
CC: | airlied, bgamari, djaara, edneymatias, fedora, joe, kernel-maint, marinaz, moneta.mace, overholt, quintela, selinux, wcohen, wwoods
Doc Type: | Bug Fix
Last Closed: | 2008-11-22 08:29:12 UTC

Description
Kevin Fenzi
2008-11-12 03:57:27 UTC
Created attachment 323292
/proc//dri/0/i915_gem_interrupt
Created attachment 323293
i915_gem_interrupt after the crash.
Created attachment 323294
Xorg.0.log
New dmesg output:

CE: hpet increasing min_delta_ns to 50624 nsec
irq 16: nobody cared (try booting with the "irqpoll" option)
Pid: 0, comm: swapper Not tainted 2.6.27.5-100.fc10.x86_64 #1
Call Trace:
 <IRQ> [<ffffffff810832a7>] __report_bad_irq+0x38/0x7c
 [<ffffffff810834f3>] note_interrupt+0x208/0x26d
 [<ffffffff81083c20>] handle_fasteoi_irq+0xbb/0xeb
 [<ffffffff8101309e>] do_IRQ+0xf7/0x169
 [<ffffffff81010933>] ret_from_intr+0x0/0x2e
 <EOI> [<ffffffff8105e5a5>] ? tick_nohz_stop_sched_tick+0x2ec/0x301
 [<ffffffff8105e81c>] ? tick_nohz_restart_sched_tick+0x171/0x179
 [<ffffffff8100f1f1>] ? cpu_idle+0x2a/0x10b
 [<ffffffff8131f3dd>] ? rest_init+0x61/0x63
handlers:
[<ffffffffa03ec3d3>] (i915_driver_irq_handler+0x0/0x19d [i915])
Disabling IRQ #16

This seems to be all solved with the -101 kernel. Been running here for about 10 hours without seeing this issue.

It's back. I've just encountered this several times with kernel-2.6.27.5-109.fc10.x86_64:

Nov 15 19:35:40 slayer kernel:irq 16: nobody cared (try booting with the "irqpoll" option)
Nov 15 19:35:40 slayer kernel:Pid: 3063, comm: Xorg Not tainted 2.6.27.5-109.fc10.x86_64 #1
Nov 15 19:35:40 slayer kernel:
Nov 15 19:35:40 slayer kernel:Call Trace:
Nov 15 19:35:40 slayer kernel: <IRQ> [<ffffffff8108320f>] __report_bad_irq+0x38/0x7c
Nov 15 19:35:40 slayer kernel: [<ffffffff8108345b>] note_interrupt+0x208/0x26d
Nov 15 19:35:40 slayer kernel: [<ffffffff81083b88>] handle_fasteoi_irq+0xbb/0xeb
Nov 15 19:35:40 slayer kernel: [<ffffffff8101309e>] do_IRQ+0xf7/0x169
Nov 15 19:35:40 slayer kernel: [<ffffffff81010933>] ret_from_intr+0x0/0x2e
Nov 15 19:35:40 slayer kernel: <EOI>
Nov 15 19:35:40 slayer kernel:handlers:
Nov 15 19:35:40 slayer kernel:[<ffffffff8123b3f6>] (usb_hcd_irq+0x0/0xb3)
Nov 15 19:35:40 slayer kernel:[<ffffffffa02c646f>] (i915_driver_irq_handler+0x0/0x199 [i915])
Nov 15 19:35:40 slayer kernel:Disabling IRQ #16

Supermicro C2SEA, G45 X4500HD.

(In reply to comment #6)
> It's back. I've just encountered this several times with
> kernel-2.6.27.5-109.fc10.x86_64:

Confirmed here on a Dell Latitude D630 with GM965:
- touchpad had issues once as well
- seems it can be triggered by watching a video with xine
- 2.6.27.5-101.fc10.x86_64 seems to work fine
- bug 471756 looks like a dupe of this one
- I'd tend to say this should be a F10Blocker

Feel free to grab me as knurd on #fedora-devel or #fedora-kernel if you need more information to debug the problem

irq 16: nobody cared (try booting with the "irqpoll" option)
Pid: 0, comm: swapper Not tainted 2.6.27.5-109.fc10.x86_64 #1
Call Trace:
 <IRQ> [<ffffffff8108320f>] __report_bad_irq+0x38/0x7c
 [<ffffffff8108345b>] note_interrupt+0x208/0x26d
 [<ffffffff81083b88>] handle_fasteoi_irq+0xbb/0xeb
 [<ffffffff8101309e>] do_IRQ+0xf7/0x169
 [<ffffffff81010933>] ret_from_intr+0x0/0x2e
 <EOI> [<ffffffff811bcc5a>] ? acpi_idle_enter_simple+0x175/0x1b4
 [<ffffffff811bcc52>] ? acpi_idle_enter_simple+0x16d/0x1b4
 [<ffffffff81286063>] ? cpuidle_idle_call+0x95/0xc9
 [<ffffffff8100f279>] ? cpu_idle+0xb2/0x10b
 [<ffffffff8131f33d>] ? rest_init+0x61/0x63
handlers:
[<ffffffffa038946f>] (i915_driver_irq_handler+0x0/0x199 [i915])
Disabling IRQ #16
CE: hpet increasing min_delta_ns to 15000 nsec

(In reply to comment #7)
> (In reply to comment #6)
> > It's back. I've just encountered this several times with
> > kernel-2.6.27.5-109.fc10.x86_64:
> Confirmed here on a Dell Latitude D630 with GM965:
> [...]
> - 2.6.27.5-101.fc10.x86_64 seems to work fine

Scratch that, seems it's just a bit harder to trigger:

CE: hpet increasing min_delta_ns to 15000 nsec
irq 16: nobody cared (try booting with the "irqpoll" option)
Pid: 2685, comm: Xorg Not tainted 2.6.27.5-101.fc10.x86_64 #1
Call Trace:
 <IRQ> [<ffffffff810832a7>] __report_bad_irq+0x38/0x7c
 [<ffffffff810834f3>] note_interrupt+0x208/0x26d
 [<ffffffff81083c20>] handle_fasteoi_irq+0xbb/0xeb
 [<ffffffff81011bcc>] ? call_softirq+0x1c/0x28
 [<ffffffff8101309e>] do_IRQ+0xf7/0x169
 [<ffffffff81010933>] ret_from_intr+0x0/0x2e
 <EOI> [<ffffffffa0315d6d>] ? drm_clflush_pages+0x6d/0xa4 [drm]
 [<ffffffffa034a9c8>] ? i915_gem_clflush_object+0x21/0x23 [i915]
 [<ffffffffa034ae2b>] ? i915_gem_object_set_domain+0x94/0xef [i915]
 [<ffffffffa034c0b8>] ? i915_gem_execbuffer+0x4f9/0x9c6 [i915]
 [<ffffffff81330c92>] ? _cond_resched+0x9/0x38
 [<ffffffffa0316d1d>] ? drm_ioctl+0x1d6/0x25e [drm]
 [<ffffffffa034bbbf>] ? i915_gem_execbuffer+0x0/0x9c6 [i915]
 [<ffffffff810cb833>] ? vfs_ioctl+0x5f/0x78
 [<ffffffff810cba99>] ? do_vfs_ioctl+0x24d/0x26a
 [<ffffffff810cbb0b>] ? sys_ioctl+0x55/0x7a
 [<ffffffff8101024a>] ? system_call_fastpath+0x16/0x1b
handlers:
[<ffffffffa034843b>] (i915_driver_irq_handler+0x0/0x199 [i915])
Disabling IRQ #16

/me goes back to run kernel-2.6.27.4-79.fc10.x86_64, which had worked fine for the past few days before I tried to boot into 2.6.27.5-109.fc10 earlier today

I've reported a similar (dup?) bug here: https://bugzilla.redhat.com/show_bug.cgi?id=471756

The setup is similar to above: kernel-2.6.27.5-109.fc10.x86_64, Thinkpad X61 with 965:

00:00.0 Host bridge: Intel Corporation Mobile PM965/GM965/GL960 Memory Controller Hub (rev 0c)
00:02.0 VGA compatible controller: Intel Corporation Mobile GM965/GL960 Integrated Graphics Controller (rev 0c)
00:02.1 Display controller: Intel Corporation Mobile GM965/GL960 Integrated Graphics Controller (rev 0c)

I believe I can reproduce this by running "padsp audacity".

Yeah, I can trigger this by running "padsp audacity" and editing some files....

I tried rebooting with "irqpoll" as suggested in the message spew, and then triggering with "padsp audacity". True to form, this caused really sluggish behavior, but no crash and no messages.

The system was pretty loaded, but I did capture the output of /proc/interrupts twice, roughly five seconds apart. Here is the output of diff. If I read this correctly, I got about 20K interrupts on IRQ #16 in that period. Is that to be expected? Or is some device wedged? Or ....

[tbl@tlondon ~]$ diff intr.*
2c2
< 0: 192144 84661 IO-APIC-edge timer
---
> 0: 193220 84669 IO-APIC-edge timer
11,12c11,12
< 16: 2527 712203 IO-APIC-fasteoi ahci, uhci_hcd:usb5, yenta, i915@pci:0000:00:02.0
< 17: 187 9314 IO-APIC-fasteoi uhci_hcd:usb6, firewire_ohci, HDA Intel, iwlagn
---
> 16: 2527 732742 IO-APIC-fasteoi ahci, uhci_hcd:usb5, yenta, i915@pci:0000:00:02.0
> 17: 187 9359 IO-APIC-fasteoi uhci_hcd:usb6, firewire_ohci, HDA Intel, iwlagn
15c15
< 20: 20854 76 IO-APIC-fasteoi uhci_hcd:usb3, eth0
---
> 20: 21026 76 IO-APIC-fasteoi uhci_hcd:usb3, eth0
17c17
< 22: 54 7432 IO-APIC-fasteoi ehci_hcd:usb1
---
> 22: 54 7438 IO-APIC-fasteoi ehci_hcd:usb1
19,20c19,20
< LOC: 109740 217026 Local timer interrupts
< RES: 65349 34322 Rescheduling interrupts
---
> LOC: 109804 217900 Local timer interrupts
> RES: 65756 34353 Rescheduling interrupts
22c22
< TLB: 146 199 TLB shootdowns
---
> TLB: 146 200 TLB shootdowns
[tbl@tlondon ~]$

A similar run on a "behaving" system (i.e., capturing "/proc/interrupts" 5 seconds apart) yields:
11,12c11,12
< 16: 2445 86654 IO-APIC-fasteoi ahci, uhci_hcd:usb5, yenta, i915@pci:0000:00:02.0
< 17: 44659 223 IO-APIC-fasteoi uhci_hcd:usb6, firewire_ohci, HDA Intel, iwlagn
---
> 16: 2445 86752 IO-APIC-fasteoi ahci, uhci_hcd:usb5, yenta, i915@pci:0000:00:02.0
> 17: 44741 223 IO-APIC-fasteoi uhci_hcd:usb6, firewire_ohci, HDA Intel, iwlagn
or only 98 interrupts.
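
The comparison above amounts to subtracting per-IRQ totals between two /proc/interrupts snapshots. A minimal Python sketch of that arithmetic follows. The function names `parse_interrupts` and `irq_deltas` are hypothetical helpers made up for illustration, and the parsing assumes the layout shown in the diff output (label, colon, per-CPU counts, then a description).

```python
def parse_interrupts(text):
    """Map each IRQ label to its count summed over all CPU columns."""
    totals = {}
    for line in text.splitlines():
        label, sep, rest = line.partition(":")
        if not sep:
            continue  # the CPU header line has no colon
        count = 0
        seen_number = False
        for token in rest.split():
            if not token.isdigit():
                break  # first non-numeric token starts the description
            count += int(token)
            seen_number = True
        if seen_number:
            totals[label.strip()] = count
    return totals


def irq_deltas(before, after):
    """Return {irq: increase} for every IRQ whose total changed."""
    old = parse_interrupts(before)
    new = parse_interrupts(after)
    return {
        irq: new[irq] - old[irq]
        for irq in new
        if irq in old and new[irq] != old[irq]
    }


# IRQ 16 lines taken from the two diffs in this report.
bad_before = "16: 2527 712203 IO-APIC-fasteoi ahci, uhci_hcd:usb5, yenta, i915@pci:0000:00:02.0"
bad_after = "16: 2527 732742 IO-APIC-fasteoi ahci, uhci_hcd:usb5, yenta, i915@pci:0000:00:02.0"
good_before = "16: 2445 86654 IO-APIC-fasteoi ahci, uhci_hcd:usb5, yenta, i915@pci:0000:00:02.0"
good_after = "16: 2445 86752 IO-APIC-fasteoi ahci, uhci_hcd:usb5, yenta, i915@pci:0000:00:02.0"

print(irq_deltas(bad_before, bad_after))    # {'16': 20539}
print(irq_deltas(good_before, good_after))  # {'16': 98}
```

Feeding it the full contents of two snapshot files instead of single lines should work the same way, since lines without a numeric count are skipped.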
Created attachment 323769
Another "screenshot" w/ "Disabling IRQ #16"
Got this one with kernel-2.6.27.5-113.fc10.x86_64 on my Thinkpad X61 (Intel 965).
I was listening to music (rhythmbox), and using Rhythmbox to update an id3v2 tag on a file in an ntfs-3g file system.
After the messages shown in the attachment, the system appears to have "turned off" the SATA interface, so I got lots of disk error messages.
*** Bug 471848 has been marked as a duplicate of this bug. ***

For the record, this has happened so far with -109, -100, and -113 for me (which are all of the kernels past -94 that I have tested extensively)

*** Bug 471756 has been marked as a duplicate of this bug. ***

After a few hours of running -116, I have yet to see the issue recur. I'll post back if anything changes but it looks like this might be fixed.

FWIW, 2.6.27.5-116.fc10.x86_64 seems to work for me as well. At least until now -- I suppose problem will show up once I submit this comment ;-)

Some people on #intel-gfx still seem to be having issues. If this is you, leave a comment on the freedesktop bug (https://bugs.freedesktop.org/show_bug.cgi?id=18609)

It seems like i945GM and friends are now fixed, but Cantiga or "Integrated Graphics Controller" devices are now broken. See bug 471937.

*** Bug 471937 has been marked as a duplicate of this bug. ***

seems to be fixed afaics, hence closing this

There are still people in #intel-gfx who are seeing this issue and the upstream bug is still open. This should be reopened.

I agree. It reoccurred for me as well.

(In reply to comment #22)
> There are still people in #intel-gfx who are seeing this issue and the upstream
> bug is still open. This should be reopened.

(In reply to comment #23)
> I agree. It reoccurred for me as well.

FWIW, 2.6.27.5-117.fc10.x86_64 worked fine for me all the time; quickly after updating to 2.6.27.7-134.fc10.x86_64 the problem showed up again:

irq 16: nobody cared (try booting with the "irqpoll" option)
Pid: 0, comm: swapper Not tainted 2.6.27.7-134.fc10.x86_64 #1
Call Trace:
 <IRQ> [<ffffffff81083207>] __report_bad_irq+0x38/0x7c
 [<ffffffff81083453>] note_interrupt+0x208/0x26d
 [<ffffffff81083b80>] handle_fasteoi_irq+0xbb/0xeb
 [<ffffffff8101309e>] do_IRQ+0xf7/0x169
 [<ffffffff81010933>] ret_from_intr+0x0/0x2e
 <EOI> [<ffffffff811bcac6>] ? acpi_idle_enter_simple+0x175/0x1b4
 [<ffffffff811bcabe>] ? acpi_idle_enter_simple+0x16d/0x1b4
 [<ffffffff81285f8b>] ? cpuidle_idle_call+0x95/0xc9
 [<ffffffff8100f279>] ? cpu_idle+0xb2/0x10b
 [<ffffffff8132cd35>] ? start_secondary+0x16e/0x173
handlers:
[<ffffffffa03ae465>] (i915_driver_irq_handler+0x0/0x207 [i915])
Disabling IRQ #16

But it happened only once over the last few hours; I'll open a new bug in case it happens more often again

*** Bug 472161 has been marked as a duplicate of this bug. ***

How is an open bug being marked as a duplicate of a closed bug, if the problem is still existing?

(In reply to comment #24)
> But it happened only once over the last few hours; I'll open a new bug in case
> it happens more often again

Happened again; filed Bug 474624, as this problem was fixed for me with 2.6.27.5-117.fc10.x86_64

(In reply to comment #26)
> How is an open bug being marked as a duplicate of a closed bug, if the problem
> is still existing?

As commented in that bug already: if you think it's not a dupe then just reopen it

Only the bug creator or developer can re-open a bug. I guess everyone that experiences the problem can just jump on Bug 474624 instead.

well, I would be happy to re-open this if you guys want... but I am not seeing it at all here anymore. ;(

I guess bug 474624 might be better...

irq 16: nobody cared (try booting with the "irqpoll" option)
Pid: 18370, comm: X Not tainted 2.6.27.12-170.2.5.fc10.x86_64 #1
Call Trace:
 <IRQ> [<ffffffff810834b3>] __report_bad_irq+0x38/0x7c
 [<ffffffff810836ff>] note_interrupt+0x208/0x26d
 [<ffffffff81083e2c>] handle_fasteoi_irq+0xbb/0xeb
 [<ffffffff81011bcc>] ? call_softirq+0x1c/0x28
 [<ffffffff8101309e>] do_IRQ+0xf7/0x169
 [<ffffffff81010933>] ret_from_intr+0x0/0x2e
 <EOI>
handlers:
[<ffffffff8122a8a6>] (ahci_interrupt+0x0/0x4aa)
[<ffffffff8123bec6>] (usb_hcd_irq+0x0/0xb3)
[<ffffffffa01aa54f>] (yenta_interrupt+0x0/0xc0 [yenta_socket])
Disabling IRQ #16

I am running 2.6.27.12-170.2.5.fc10.x86_64.

Jaroslav, this problem afaics was fixed; you better join us in bug 474624

Thank you Thorsten.