Bug 471162

Summary: kernel-2.6.27.5-100.fc10.x86_64 causes Intel 945GME to lose IRQ
Product: [Fedora] Fedora
Reporter: Kevin Fenzi <kevin>
Component: kernel
Assignee: Dave Airlie <airlied>
Status: CLOSED RAWHIDE
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: medium
Priority: medium
Version: rawhide
CC: airlied, bgamari, djaara, edneymatias, fedora, joe, kernel-maint, marinaz, moneta.mace, overholt, quintela, selinux, wcohen, wwoods
Hardware: All
OS: Linux
Doc Type: Bug Fix
Last Closed: 2008-11-22 08:29:12 UTC
Attachments (description, flags):
- /proc//dri/0/i915_gem_interrupt (flags: none)
- i915_gem_interrupt after the crash. (flags: none)
- Xorg.0.log (flags: none)
- Another "screenshot" w/ "Disabling IRQ #16" (flags: none)

Description Kevin Fenzi 2008-11-12 03:57:27 UTC
A few minutes of use after booting this kernel (2.6.27.5-100.fc10.x86_64), I get the following in dmesg: 

irq 16: nobody cared (try booting with the "irqpoll" option)
Pid: 0, comm: swapper Not tainted 2.6.27.5-100.fc10.x86_64 #1

Call Trace:
 <IRQ>  [<ffffffff810832a7>] __report_bad_irq+0x38/0x7c
 [<ffffffff810834f3>] note_interrupt+0x208/0x26d
 [<ffffffff81083c20>] handle_fasteoi_irq+0xbb/0xeb
 [<ffffffff8101309e>] do_IRQ+0xf7/0x169
 [<ffffffff81010933>] ret_from_intr+0x0/0x2e
 <EOI>  [<ffffffff81332650>] ? _spin_unlock_irqrestore+0x33/0x3e
 [<ffffffff8105d675>] ? tick_broadcast_oneshot_control+0xf4/0xfd
 [<ffffffff8105cf53>] ? tick_notify+0x22a/0x37b
 [<ffffffff813354e6>] ? notifier_call_chain+0x33/0x5b
 [<ffffffff81058bb0>] ? raw_notifier_call_chain+0xf/0x11
 [<ffffffff8105c95d>] ? clockevents_notify+0x2b/0x63
 [<ffffffff811bc4e2>] ? acpi_state_timer_broadcast+0x41/0x43
 [<ffffffff811bcd1c>] ? acpi_idle_enter_simple+0x197/0x1b4
 [<ffffffff81286103>] ? cpuidle_idle_call+0x95/0xc9
 [<ffffffff8100f279>] ? cpu_idle+0xb2/0x10b
 [<ffffffff8131f3dd>] ? rest_init+0x61/0x63

handlers:
[<ffffffffa03d73d3>] (i915_driver_irq_handler+0x0/0x19d [i915])
Disabling IRQ #16

After that X is pretty unusable. I must move the mouse around or type on the keyboard to get it to refresh at all. 

The only thing in Xorg.0.log.old that looks out of the ordinary is: 
exaCopyDirty: Pending damage region empty!

Happy to try again or provide more info.

Comment 1 Kevin Fenzi 2008-11-12 04:26:59 UTC
Created attachment 323292 [details]
/proc//dri/0/i915_gem_interrupt

Comment 2 Kevin Fenzi 2008-11-12 04:30:02 UTC
Created attachment 323293 [details]
i915_gem_interrupt after the crash.

Comment 3 Kevin Fenzi 2008-11-12 04:32:28 UTC
Created attachment 323294 [details]
Xorg.0.log

Comment 4 Kevin Fenzi 2008-11-12 04:35:14 UTC
New dmesg output: 

CE: hpet increasing min_delta_ns to 50624 nsec
irq 16: nobody cared (try booting with the "irqpoll" option)
Pid: 0, comm: swapper Not tainted 2.6.27.5-100.fc10.x86_64 #1

Call Trace:
 <IRQ>  [<ffffffff810832a7>] __report_bad_irq+0x38/0x7c
 [<ffffffff810834f3>] note_interrupt+0x208/0x26d
 [<ffffffff81083c20>] handle_fasteoi_irq+0xbb/0xeb
 [<ffffffff8101309e>] do_IRQ+0xf7/0x169
 [<ffffffff81010933>] ret_from_intr+0x0/0x2e
 <EOI>  [<ffffffff8105e5a5>] ? tick_nohz_stop_sched_tick+0x2ec/0x301
 [<ffffffff8105e81c>] ? tick_nohz_restart_sched_tick+0x171/0x179
 [<ffffffff8100f1f1>] ? cpu_idle+0x2a/0x10b
 [<ffffffff8131f3dd>] ? rest_init+0x61/0x63

handlers:
[<ffffffffa03ec3d3>] (i915_driver_irq_handler+0x0/0x19d [i915])
Disabling IRQ #16

Comment 5 Kevin Fenzi 2008-11-12 17:23:36 UTC
This seems to be all solved with the -101 kernel. 

Been running here for about 10 hours without seeing this issue.

Comment 6 Mace Moneta 2008-11-16 00:43:14 UTC
It's back. I've just encountered this several times with
kernel-2.6.27.5-109.fc10.x86_64:

Nov 15 19:35:40 slayer kernel:irq 16: nobody cared (try booting with the "irqpoll" option)
Nov 15 19:35:40 slayer kernel:Pid: 3063, comm: Xorg Not tainted 2.6.27.5-109.fc10.x86_64 #1
Nov 15 19:35:40 slayer kernel:
Nov 15 19:35:40 slayer kernel:Call Trace:
Nov 15 19:35:40 slayer kernel: <IRQ>  [<ffffffff8108320f>] __report_bad_irq+0x38/0x7c
Nov 15 19:35:40 slayer kernel: [<ffffffff8108345b>] note_interrupt+0x208/0x26d
Nov 15 19:35:40 slayer kernel: [<ffffffff81083b88>] handle_fasteoi_irq+0xbb/0xeb
Nov 15 19:35:40 slayer kernel: [<ffffffff8101309e>] do_IRQ+0xf7/0x169
Nov 15 19:35:40 slayer kernel: [<ffffffff81010933>] ret_from_intr+0x0/0x2e
Nov 15 19:35:40 slayer kernel: <EOI> 
Nov 15 19:35:40 slayer kernel:handlers:
Nov 15 19:35:40 slayer kernel:[<ffffffff8123b3f6>] (usb_hcd_irq+0x0/0xb3)
Nov 15 19:35:40 slayer kernel:[<ffffffffa02c646f>] (i915_driver_irq_handler+0x0/0x199 [i915])
Nov 15 19:35:40 slayer kernel:Disabling IRQ #16

Supermicro C2SEA, G45 X4500HD.
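
The handler list in the report above shows that IRQ 16 is shared on this machine (usb_hcd_irq and i915_driver_irq_handler are both registered). For comparing reports across machines, a minimal sketch along these lines could pull the IRQ number and handler names out of a pasted log. The function name and the regular expressions are assumptions fitted to the log lines quoted in this bug, not a general kernel log parser.

```python
import re

# Fitted to the lines quoted in this bug; works with or without a syslog prefix.
IRQ_RE = re.compile(r"irq (\d+): nobody cared")
# e.g. "[<ffffffff8123b3f6>] (usb_hcd_irq+0x0/0xb3)" or
#      "[<ffffffffa02c646f>] (i915_driver_irq_handler+0x0/0x199 [i915])"
HANDLER_RE = re.compile(
    r"\[<[0-9a-f]+>\] \((\w+)\+0x[0-9a-f]+/0x[0-9a-f]+(?: \[(\w+)\])?\)"
)


def parse_nobody_cared(log_text):
    """Return (irq, [(handler, module_or_None), ...]) for the first report, or None."""
    irq = None
    handlers = []
    in_handlers = False
    for line in log_text.splitlines():
        if irq is None:
            match = IRQ_RE.search(line)
            if match:
                irq = int(match.group(1))
            continue
        if not in_handlers:
            # The handler list starts after a line ending in "handlers:".
            in_handlers = line.rstrip().endswith("handlers:")
            continue
        handler = HANDLER_RE.search(line)
        if handler is None:
            break  # e.g. "Disabling IRQ #16" ends the list
        handlers.append((handler.group(1), handler.group(2)))
    if irq is None:
        return None
    return irq, handlers


# Excerpt of the log lines quoted in comment 6.
SAMPLE = """\
Nov 15 19:35:40 slayer kernel:irq 16: nobody cared (try booting with the "irqpoll" option)
Nov 15 19:35:40 slayer kernel:Call Trace:
Nov 15 19:35:40 slayer kernel: <IRQ>  [<ffffffff8108320f>] __report_bad_irq+0x38/0x7c
Nov 15 19:35:40 slayer kernel:handlers:
Nov 15 19:35:40 slayer kernel:[<ffffffff8123b3f6>] (usb_hcd_irq+0x0/0xb3)
Nov 15 19:35:40 slayer kernel:[<ffffffffa02c646f>] (i915_driver_irq_handler+0x0/0x199 [i915])
Nov 15 19:35:40 slayer kernel:Disabling IRQ #16
"""

if __name__ == "__main__":
    print(parse_nobody_cared(SAMPLE))
```

A handler with no bracketed module name is reported with `None` as its module, which here distinguishes built-in code from the i915 module.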

Comment 7 Thorsten Leemhuis 2008-11-16 07:32:11 UTC
(In reply to comment #6)
> It's back. I've just encountered this several times with
> kernel-2.6.27.5-109.fc10.x86_64:

Confirmed here on a Dell Latitude D630 with GM965:

- touchpad had issues once as well
- seems it can be triggered by watching a video with xine
- 2.6.27.5-101.fc10.x86_64 seems to work fine
- bug 471756 looks like a dupe of this one
- I'd tend to say this should be an F10Blocker

Feel free to grab me as knurd on #fedora-devel or #fedora-kernel if you need more information to debug the problem

irq 16: nobody cared (try booting with the "irqpoll" option)
Pid: 0, comm: swapper Not tainted 2.6.27.5-109.fc10.x86_64 #1

Call Trace:
 <IRQ>  [<ffffffff8108320f>] __report_bad_irq+0x38/0x7c
 [<ffffffff8108345b>] note_interrupt+0x208/0x26d
 [<ffffffff81083b88>] handle_fasteoi_irq+0xbb/0xeb
 [<ffffffff8101309e>] do_IRQ+0xf7/0x169
 [<ffffffff81010933>] ret_from_intr+0x0/0x2e
 <EOI>  [<ffffffff811bcc5a>] ? acpi_idle_enter_simple+0x175/0x1b4
 [<ffffffff811bcc52>] ? acpi_idle_enter_simple+0x16d/0x1b4
 [<ffffffff81286063>] ? cpuidle_idle_call+0x95/0xc9
 [<ffffffff8100f279>] ? cpu_idle+0xb2/0x10b
 [<ffffffff8131f33d>] ? rest_init+0x61/0x63

handlers:
[<ffffffffa038946f>] (i915_driver_irq_handler+0x0/0x199 [i915])
Disabling IRQ #16
CE: hpet increasing min_delta_ns to 15000 nsec

Comment 8 Thorsten Leemhuis 2008-11-16 07:56:30 UTC
(In reply to comment #7)
> (In reply to comment #6)
> > It's back. I've just encountered this several times with
> > kernel-2.6.27.5-109.fc10.x86_64:
> Confirmed here on a Dell Latitude D630 with GM965:
> [...]
> - 2.6.27.5-101.fc10.x86_64 seems to work fine

Scratch that; it seems it's just a bit harder to trigger:

CE: hpet increasing min_delta_ns to 15000 nsec
irq 16: nobody cared (try booting with the "irqpoll" option)
Pid: 2685, comm: Xorg Not tainted 2.6.27.5-101.fc10.x86_64 #1

Call Trace:
 <IRQ>  [<ffffffff810832a7>] __report_bad_irq+0x38/0x7c
 [<ffffffff810834f3>] note_interrupt+0x208/0x26d
 [<ffffffff81083c20>] handle_fasteoi_irq+0xbb/0xeb
 [<ffffffff81011bcc>] ? call_softirq+0x1c/0x28
 [<ffffffff8101309e>] do_IRQ+0xf7/0x169
 [<ffffffff81010933>] ret_from_intr+0x0/0x2e
 <EOI>  [<ffffffffa0315d6d>] ? drm_clflush_pages+0x6d/0xa4 [drm]
 [<ffffffffa034a9c8>] ? i915_gem_clflush_object+0x21/0x23 [i915]
 [<ffffffffa034ae2b>] ? i915_gem_object_set_domain+0x94/0xef [i915]
 [<ffffffffa034c0b8>] ? i915_gem_execbuffer+0x4f9/0x9c6 [i915]
 [<ffffffff81330c92>] ? _cond_resched+0x9/0x38
 [<ffffffffa0316d1d>] ? drm_ioctl+0x1d6/0x25e [drm]
 [<ffffffffa034bbbf>] ? i915_gem_execbuffer+0x0/0x9c6 [i915]
 [<ffffffff810cb833>] ? vfs_ioctl+0x5f/0x78
 [<ffffffff810cba99>] ? do_vfs_ioctl+0x24d/0x26a
 [<ffffffff810cbb0b>] ? sys_ioctl+0x55/0x7a
 [<ffffffff8101024a>] ? system_call_fastpath+0x16/0x1b

handlers:
[<ffffffffa034843b>] (i915_driver_irq_handler+0x0/0x199 [i915])
Disabling IRQ #16

/me goes back to run kernel-2.6.27.4-79.fc10.x86_64, which had worked fine for the past few days before I tried to boot into 2.6.27.5-109.fc10 earlier today

Comment 9 Tom London 2008-11-16 17:31:34 UTC
I've reported a similar (dup?) bug here: https://bugzilla.redhat.com/show_bug.cgi?id=471756

The setup is similar to above: kernel-2.6.27.5-109.fc10.x86_64, Thinkpad X61 with 965:
00:00.0 Host bridge: Intel Corporation Mobile PM965/GM965/GL960 Memory Controller Hub (rev 0c)
00:02.0 VGA compatible controller: Intel Corporation Mobile GM965/GL960 Integrated Graphics Controller (rev 0c)
00:02.1 Display controller: Intel Corporation Mobile GM965/GL960 Integrated Graphics Controller (rev 0c)


I believe I can reproduce this by running "padsp audacity".

Comment 10 Tom London 2008-11-16 21:08:50 UTC
Yeah, I can trigger this by running "padsp audacity" and editing some files....

I tried rebooting with "irqpoll" as suggested in the message spew, and then triggering with "padsp audacity".

True to form, this caused really sluggish behavior, but no crash and no messages. The system was pretty loaded, but I did capture the output of /proc/interrupts twice, roughly five seconds apart.

Here is the output of diff.  If I read this correctly, I got about 20K interrupts on IRQ #16 in that period.

Is that to be expected? Or is some device wedged?  Or ....

[tbl@tlondon ~]$ diff intr.*
2c2
<   0:     192144      84661   IO-APIC-edge      timer
---
>   0:     193220      84669   IO-APIC-edge      timer
11,12c11,12
<  16:       2527     712203   IO-APIC-fasteoi   ahci, uhci_hcd:usb5, yenta, i915@pci:0000:00:02.0
<  17:        187       9314   IO-APIC-fasteoi   uhci_hcd:usb6, firewire_ohci, HDA Intel, iwlagn
---
>  16:       2527     732742   IO-APIC-fasteoi   ahci, uhci_hcd:usb5, yenta, i915@pci:0000:00:02.0
>  17:        187       9359   IO-APIC-fasteoi   uhci_hcd:usb6, firewire_ohci, HDA Intel, iwlagn
15c15
<  20:      20854         76   IO-APIC-fasteoi   uhci_hcd:usb3, eth0
---
>  20:      21026         76   IO-APIC-fasteoi   uhci_hcd:usb3, eth0
17c17
<  22:         54       7432   IO-APIC-fasteoi   ehci_hcd:usb1
---
>  22:         54       7438   IO-APIC-fasteoi   ehci_hcd:usb1
19,20c19,20
< LOC:     109740     217026   Local timer interrupts
< RES:      65349      34322   Rescheduling interrupts
---
> LOC:     109804     217900   Local timer interrupts
> RES:      65756      34353   Rescheduling interrupts
22c22
< TLB:        146        199   TLB shootdowns
---
> TLB:        146        200   TLB shootdowns
[tbl@tlondon ~]$
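
The same comparison can be done without reading the diff by eye. The following is a minimal sketch, assuming two saved snapshots of /proc/interrupts as text; the function names are hypothetical, and the sample data is only an excerpt of the lines shown in the diff above. Counts are summed over all CPU columns.

```python
def parse_interrupts(text):
    """Map each IRQ label to its count summed over all CPU columns."""
    totals = {}
    for line in text.splitlines():
        label, sep, rest = line.partition(":")
        if not sep:
            continue  # the "CPU0 CPU1 ..." header line has no colon
        counts = []
        for field in rest.split():
            if not field.isdigit():
                break  # first non-numeric field starts the description
            counts.append(int(field))
        if counts:
            totals[label.strip()] = sum(counts)
    return totals


def irq_deltas(before_text, after_text):
    """Return {irq: increase} for every IRQ whose total count changed."""
    before = parse_interrupts(before_text)
    after = parse_interrupts(after_text)
    return {
        irq: after[irq] - before[irq]
        for irq in after
        if irq in before and after[irq] != before[irq]
    }


# Excerpt of the two snapshots from the diff above.
BEFORE = """\
  0:     192144      84661   IO-APIC-edge      timer
 16:       2527     712203   IO-APIC-fasteoi   ahci, uhci_hcd:usb5, yenta, i915@pci:0000:00:02.0
 17:        187       9314   IO-APIC-fasteoi   uhci_hcd:usb6, firewire_ohci, HDA Intel, iwlagn
"""
AFTER = """\
  0:     193220      84669   IO-APIC-edge      timer
 16:       2527     732742   IO-APIC-fasteoi   ahci, uhci_hcd:usb5, yenta, i915@pci:0000:00:02.0
 17:        187       9359   IO-APIC-fasteoi   uhci_hcd:usb6, firewire_ohci, HDA Intel, iwlagn
"""

if __name__ == "__main__":
    for irq, delta in sorted(irq_deltas(BEFORE, AFTER).items()):
        print(f"IRQ {irq}: +{delta}")
```

On this excerpt, IRQ 16 grows by 20539 between the two snapshots, which matches the "about 20K" estimate.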

Comment 11 Tom London 2008-11-16 21:19:07 UTC
A similar run on a "behaving" system (i.e., capturing "/proc/interrupts" 5 seconds apart) yields:

11,12c11,12
<  16:       2445      86654   IO-APIC-fasteoi   ahci, uhci_hcd:usb5, yenta, i915@pci:0000:00:02.0
<  17:      44659        223   IO-APIC-fasteoi   uhci_hcd:usb6, firewire_ohci, HDA Intel, iwlagn
---
>  16:       2445      86752   IO-APIC-fasteoi   ahci, uhci_hcd:usb5, yenta, i915@pci:0000:00:02.0
>  17:      44741        223   IO-APIC-fasteoi   uhci_hcd:usb6, firewire_ohci, HDA Intel, iwlagn

or only 98 interrupts.
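
To put the two runs side by side, here is a short worked calculation using the IRQ 16 counts quoted in comments 10 and 11. The five-second interval is the approximate figure given in the comments, so the per-second rates are rough estimates only.

```python
INTERVAL_S = 5.0  # "roughly five seconds apart" per the comments; approximate

# IRQ 16, second CPU column, from the two diffs.
wedged = 732742 - 712203   # run with irqpoll while triggering the problem
healthy = 86752 - 86654    # run on a "behaving" system

wedged_rate = wedged / INTERVAL_S    # interrupts per second, rough
healthy_rate = healthy / INTERVAL_S
ratio = wedged / healthy             # how many times more interrupts

if __name__ == "__main__":
    print(f"wedged:  {wedged} interrupts, ~{wedged_rate:.0f}/s")
    print(f"healthy: {healthy} interrupts, ~{healthy_rate:.1f}/s")
    print(f"ratio:   ~{ratio:.0f}x")
```

The problem run sees roughly two hundred times as many interrupts on IRQ 16 as the behaving one over a comparable interval.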

Comment 12 Tom London 2008-11-17 15:18:10 UTC
Created attachment 323769 [details]
Another "screenshot" w/ "Disabling IRQ #16"

Got this one with kernel-2.6.27.5-113.fc10.x86_64 on my Thinkpad X61 (Intel 965).

I was listening to music (rhythmbox), and using Rhythmbox to update an id3v2 tag on a file in an ntfs-3g file system.

After the messages shown on the attachment, the system appears to have "turned off" the SATA interface, so I got lots of disk error messages.

Comment 13 Paul W. Frields 2008-11-17 15:36:08 UTC
*** Bug 471848 has been marked as a duplicate of this bug. ***

Comment 14 Ben Gamari 2008-11-18 07:03:55 UTC
For the record, this has happened so far with -109, -100, and -113 for me (which are all of the kernels past -94 that I have tested extensively)

Comment 15 Tom London 2008-11-18 15:38:06 UTC
*** Bug 471756 has been marked as a duplicate of this bug. ***

Comment 16 Ben Gamari 2008-11-18 18:27:15 UTC
After a few hours of running -116, I have yet to see the issue recur. I'll post back if anything changes but it looks like this might be fixed.

Comment 17 Thorsten Leemhuis 2008-11-18 18:38:15 UTC
FWIW, 2.6.27.5-116.fc10.x86_64 seems to work for me as well. At least until now -- I suppose the problem will show up once I submit this comment ;-)

Comment 18 Ben Gamari 2008-11-18 22:47:16 UTC
Some people on #intel-gfx still seem to be having issues. If this is you, leave a comment on the freedesktop bug (https://bugs.freedesktop.org/show_bug.cgi?id=18609)

Comment 19 Will Woods 2008-11-19 01:38:59 UTC
It seems like i945GM and friends are now fixed, but Cantiga or "Integrated Graphics Controller" devices are now broken. 

See bug 471937.

Comment 20 Thorsten Leemhuis 2008-11-22 08:27:59 UTC
*** Bug 471937 has been marked as a duplicate of this bug. ***

Comment 21 Thorsten Leemhuis 2008-11-22 08:29:12 UTC
seems to be fixed afaics, hence closing this

Comment 22 Ben Gamari 2008-12-02 19:25:00 UTC
There are still people in #intel-gfx who are seeing this issue and the upstream bug is still open. This should be reopened.

Comment 23 Mace Moneta 2008-12-02 19:41:32 UTC
I agree.  It reoccurred for me as well.

Comment 24 Thorsten Leemhuis 2008-12-04 16:23:10 UTC
(In reply to comment #22)
> There are still people in #intel-gfx who are seeing this issue and the upstream
> bug is still open. This should be reopened.

(In reply to comment #23)
> I agree.  It reoccurred for me as well.

FWIW, 2.6.27.5-117.fc10.x86_64 worked fine for me all the time; quickly after updating to 2.6.27.7-134.fc10.x86_64 the problem showed up again:

irq 16: nobody cared (try booting with the "irqpoll" option)
Pid: 0, comm: swapper Not tainted 2.6.27.7-134.fc10.x86_64 #1

Call Trace:
 <IRQ>  [<ffffffff81083207>] __report_bad_irq+0x38/0x7c
 [<ffffffff81083453>] note_interrupt+0x208/0x26d
 [<ffffffff81083b80>] handle_fasteoi_irq+0xbb/0xeb
 [<ffffffff8101309e>] do_IRQ+0xf7/0x169
 [<ffffffff81010933>] ret_from_intr+0x0/0x2e
 <EOI>  [<ffffffff811bcac6>] ? acpi_idle_enter_simple+0x175/0x1b4
 [<ffffffff811bcabe>] ? acpi_idle_enter_simple+0x16d/0x1b4
 [<ffffffff81285f8b>] ? cpuidle_idle_call+0x95/0xc9
 [<ffffffff8100f279>] ? cpu_idle+0xb2/0x10b
 [<ffffffff8132cd35>] ? start_secondary+0x16e/0x173

handlers:
[<ffffffffa03ae465>] (i915_driver_irq_handler+0x0/0x207 [i915])
Disabling IRQ #16

But it happened only once over the last few hours; I'll open a new bug in case it happens more often again

Comment 25 Thorsten Leemhuis 2008-12-04 17:04:58 UTC
*** Bug 472161 has been marked as a duplicate of this bug. ***

Comment 26 Mace Moneta 2008-12-04 17:25:33 UTC
How is an open bug being marked as a duplicate of a closed bug, if the problem still exists?

Comment 27 Thorsten Leemhuis 2008-12-04 17:41:58 UTC
(In reply to comment #24)
> But it happened only once over the last few hours; I'll open a new bug in case
> it happens more often again

Happened again; filed Bug 474624, as this problem was fixed for me with 2.6.27.5-117.fc10.x86_64

(In reply to comment #26)
> How is an open bug being marked as a duplicate of a closed bug, if the problem
> is still existing?

As commented in that bug already: if you think it's not a dupe then just reopen it

Comment 28 Mace Moneta 2008-12-04 17:57:17 UTC
Only the bug creator or developer can re-open a bug.  I guess everyone that experiences the problem can just jump on Bug 474624 instead.

Comment 29 Kevin Fenzi 2008-12-04 18:31:04 UTC
well, I would be happy to re-open this if you guys want... but I am not seeing it at all here anymore. ;( 

I guess bug 474624 might be better...

Comment 30 Jaroslav Barton 2009-01-27 17:23:51 UTC
irq 16: nobody cared (try booting with the "irqpoll" option)
Pid: 18370, comm: X Not tainted 2.6.27.12-170.2.5.fc10.x86_64 #1

Call Trace:
<IRQ>  [<ffffffff810834b3>] __report_bad_irq+0x38/0x7c
[<ffffffff810836ff>] note_interrupt+0x208/0x26d
[<ffffffff81083e2c>] handle_fasteoi_irq+0xbb/0xeb
[<ffffffff81011bcc>] ? call_softirq+0x1c/0x28
[<ffffffff8101309e>] do_IRQ+0xf7/0x169
[<ffffffff81010933>] ret_from_intr+0x0/0x2e
<EOI>
handlers:
[<ffffffff8122a8a6>] (ahci_interrupt+0x0/0x4aa)
[<ffffffff8123bec6>] (usb_hcd_irq+0x0/0xb3)
[<ffffffffa01aa54f>] (yenta_interrupt+0x0/0xc0 [yenta_socket])
Disabling IRQ #16

I am running 2.6.27.12-170.2.5.fc10.x86_64.

Comment 31 Thorsten Leemhuis 2009-01-27 17:54:48 UTC
Jaroslav, this problem afaics was fixed; you'd better join us in bug 474624

Comment 32 Jaroslav Barton 2009-01-27 18:15:08 UTC
Thank you Thorsten.