Bug 457648

Summary:	irq 17: nobody cared Disabling IRQ #17
Product:	[Fedora] Fedora	Reporter:	Stefano Tognon <ice00>
Component:	kernel	Assignee:	Chris Snook <csnook>
Status:	CLOSED WONTFIX	QA Contact:	Fedora Extras Quality Assurance <extras-qa>
Severity:	medium	Docs Contact:
Priority:	low
Version:	8	CC:	csnook
Target Milestone:	---
Target Release:	---
Hardware:	x86_64
OS:	Linux
Whiteboard:
Fixed In Version:		Doc Type:	Bug Fix
Doc Text:		Story Points:	---
Clone Of:		Environment:
Last Closed:	2009-01-09 07:51:48 UTC	Type:	---

Description Stefano Tognon 2008-08-02 10:36:09 UTC

From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686 (x86_64); it; rv:1.9.0.1) Gecko/2008070206 Firefox/3.0.1

Description of problem:
Since I moved from kernel 2.6.24 to 2.6.25 (I have tested every sub-version up to the latest), I get, apparently at random, a kernel message on all consoles that says: 'Disabling IRQ #17'
The error can occur after 30 minutes or after as long as 12 hours, and on some days it does not appear at all.
Most of the time the error occurs when there is a lot of network traffic (for example, playing a network game or updating the system with yum).
The problem is that afterwards the network no longer works, even if I try to restart the network card.


Version-Release number of selected component (if applicable):
2.6.25.11-60.fc8

How reproducible:
Sometimes


Steps to Reproduce:
1. Use the machine normally while there is network activity

Actual Results:


Expected Results:


Additional info:
Aug  2 11:11:21 localhost kernel: irq 17: nobody cared (try booting with the "irqpoll" option)
Aug  2 11:11:21 localhost kernel: Pid: 3952, comm: setiathome-5.28 Tainted: P         2.6.25.11-60.fc8 #1
Aug  2 11:11:21 localhost kernel:
Aug  2 11:11:21 localhost kernel: Call Trace:
Aug  2 11:11:21 localhost kernel:  <IRQ>  [<ffffffff81070b23>] __report_bad_irq+0x38/0x7c
Aug  2 11:11:21 localhost kernel:  [<ffffffff81070d62>] note_interrupt+0x1fb/0x241
Aug  2 11:11:21 localhost kernel:  [<ffffffff8107165c>] handle_fasteoi_irq+0xab/0xd0
Aug  2 11:11:21 localhost kernel:  [<ffffffff8100e9ad>] do_IRQ+0xf6/0x169
Aug  2 11:11:21 localhost kernel:  [<ffffffff8100c371>] ret_from_intr+0x0/0xa
Aug  2 11:11:21 localhost kernel:  <EOI>
Aug  2 11:11:21 localhost kernel: handlers:
Aug  2 11:11:21 localhost kernel: [<ffffffff88089886>] (ata_interrupt+0x0/0x1e5 [libata])
Aug  2 11:11:21 localhost kernel: [<ffffffff88396de1>] (azx_interrupt+0x0/0xd1 [snd_hda_intel])
Aug  2 11:11:21 localhost kernel: [<ffffffff882d57ff>] (atl1_intr+0x0/0xbd0 [atl1])
Aug  2 11:11:21 localhost kernel: Disabling IRQ #17
Aug  2 11:11:52 localhost kernel: ata5.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen
Aug  2 11:11:52 localhost kernel: ata5.00: cmd a0/00:00:00:00:00/00:00:00:00:00/a0 tag 0
Aug  2 11:11:52 localhost kernel:          cdb 00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00
Aug  2 11:11:52 localhost kernel:          res 40/00:03:00:08:00/00:00:00:00:00/b0 Emask 0x4 (timeout)
Aug  2 11:11:52 localhost kernel: ata5.00: status: { DRDY }
Aug  2 11:11:52 localhost kernel: ata5: soft resetting link
Aug  2 11:11:53 localhost kernel: ata5.00: configured for UDMA/33
Aug  2 11:11:53 localhost kernel: ata5.01: configured for MWDMA2
Aug  2 11:11:53 localhost kernel: ata5: EH complete

more /proc/interrupts
           CPU0       CPU1       CPU2       CPU3
  0:         85          1          0          0   IO-APIC-edge      timer
  1:       2791       1751       5627       2489   IO-APIC-edge      i8042
  4:          1          0          1          0   IO-APIC-edge
  8:          0          0          1          0   IO-APIC-edge      rtc
  9:          0          0          0          0   IO-APIC-fasteoi   acpi
 16:     259658     186700     114036      70981   IO-APIC-fasteoi   uhci_hcd:usb3, ahci, firewire_ohci, fglrx[0]@PCI:1:0:0
 17:       9927     309787       9923       9923   IO-APIC-fasteoi   pata_jmicron, HDA Intel, eth0
 18:      22676       4991      23097       5210   IO-APIC-fasteoi   ehci_hcd:usb1, uhci_hcd:usb5, uhci_hcd:usb8
 19:          0          0          0          0   IO-APIC-fasteoi   uhci_hcd:usb7
 21:          0          0          0          0   IO-APIC-fasteoi   uhci_hcd:usb4
 22:      53373      11202     168871      50731   IO-APIC-fasteoi   ata_piix, ata_piix, HDA Intel
 23:          0          0          0          0   IO-APIC-fasteoi   ehci_hcd:usb2, uhci_hcd:usb6
NMI:          0          0          0          0   Non-maskable interrupts
LOC:    6703547    6339294    6519046    6342927   Local timer interrupts
RES:     107623      67585     142442     127438   Rescheduling interrupts
CAL:      29088      25030      12689      20775   function call interrupts
TLB:        850       1208       1404       1617   TLB shootdowns
TRM:          0          0          0          0   Thermal event interrupts
THR:          0          0          0          0   Threshold APIC interrupts
SPU:          0          0          0          0   Spurious interrupts
ERR:          0
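
The table above shows that IRQ 17 is shared by three devices, which matches the three handlers listed in the "nobody cared" trace. As a rough sketch of how that could be checked automatically, the Python below parses text in the /proc/interrupts layout shown in this report and lists the IRQ lines with more than one device. The function names and the SAMPLE constant are illustrative assumptions, and the parser assumes the column layout seen here (per-CPU counts, one controller field, then a comma-separated device list); other kernels may print a different layout.

```python
# Excerpt copied from the /proc/interrupts dump in this report.
SAMPLE = """
           CPU0       CPU1       CPU2       CPU3
  0:         85          1          0          0   IO-APIC-edge      timer
  4:          1          0          1          0   IO-APIC-edge
 16:     259658     186700     114036      70981   IO-APIC-fasteoi   uhci_hcd:usb3, ahci, firewire_ohci, fglrx[0]@PCI:1:0:0
 17:       9927     309787       9923       9923   IO-APIC-fasteoi   pata_jmicron, HDA Intel, eth0
NMI:          0          0          0          0   Non-maskable interrupts
ERR:          0
"""


def parse_interrupts(text):
    """Map each numeric IRQ to (per-CPU counts, controller, device list)."""
    lines = text.strip("\n").splitlines()
    ncpus = len(lines[0].split())  # header row: CPU0 CPU1 ...
    table = {}
    for line in lines[1:]:
        label, _, rest = line.partition(":")
        label = label.strip()
        if not label.isdigit():
            continue  # skip summary rows such as NMI, LOC, ERR
        fields = rest.split()
        counts = [int(c) for c in fields[:ncpus]]
        controller = fields[ncpus] if len(fields) > ncpus else ""
        names = " ".join(fields[ncpus + 1:]).split(",")
        devices = [n.strip() for n in names if n.strip()]
        table[int(label)] = (counts, controller, devices)
    return table


def shared_irqs(table):
    """Return only the IRQs that have more than one device attached."""
    return {irq: devs for irq, (_, _, devs) in table.items() if len(devs) > 1}
```

On a live system the same functions could be fed the contents of /proc/interrupts, for example shared_irqs(parse_interrupts(open("/proc/interrupts").read())), provided the layout matches the assumption above.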

Comment 1 Stefano Tognon 2008-08-15 13:29:19 UTC

For the past week the problem has not reappeared, but only when I pass the option 'noirqdebug' to the kernel at boot.

Comment 2 Chris Snook 2008-09-10 20:30:20 UTC

See if adding this line to /etc/sysconfig/network-scripts/ifcfg-eth0 makes the problem go away:

ETHTOOL_OPTS="tso off"

If so, it's a known bug, which we work around in newer kernels by disabling TSO by default.  If not, it's probably something new.
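
To confirm that the setting actually took effect on the interface, one option is to inspect the output of `ethtool -k eth0`. The Python below is a minimal sketch of that check, not part of the workaround itself. The helper names are hypothetical, and it assumes the TSO line is labelled either "tcp segmentation offload" or "tcp-segmentation-offload", since the spelling differs between ethtool versions.

```python
import subprocess


def tso_enabled(ethtool_output):
    """Return True/False for the TSO line in `ethtool -k` output, or None if absent."""
    for line in ethtool_output.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue
        # Normalise both spellings of the label to one form.
        name = key.strip().lower().replace("-", " ")
        if name == "tcp segmentation offload":
            words = value.split()
            return words[0] == "on" if words else None
    return None


def query_tso(iface="eth0"):
    """Run `ethtool -k <iface>` and report the TSO state (needs ethtool installed)."""
    result = subprocess.run(
        ["ethtool", "-k", iface],
        capture_output=True, text=True, check=True,
    )
    return tso_enabled(result.stdout)
```

With the workaround applied, query_tso("eth0") would be expected to return False.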

Comment 3 Stefano Tognon 2008-09-14 18:41:42 UTC

With kernel 2.6.25.14-69.fc8, ETHTOOL_OPTS="tso off" set, and without the 'noirqdebug' boot option, the error has not appeared after 3 days of intense use of the PC. I think the problem is now resolved. Thanks.

Comment 4 Chris Snook 2008-09-15 15:11:36 UTC

The workaround should be unnecessary in 2.6.27-based kernels, as it will be the default behavior.  We're still working with Atheros to figure out the root cause of the TSO bug, but in the meantime we've gotten NAPI working upstream (should be in F9 soon), with 930 Mbps sustained application-level throughput, so there's less need for TSO.

Comment 5 Bug Zapper 2008-11-26 11:03:56 UTC

This message is a reminder that Fedora 8 is nearing its end of life.
Approximately 30 (thirty) days from now Fedora will stop maintaining
and issuing updates for Fedora 8.  It is Fedora's policy to close all
bug reports from releases that are no longer maintained.  At that time
this bug will be closed as WONTFIX if it remains open with a Fedora 
'version' of '8'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version prior to Fedora 8's end of life.

Bug Reporter: Thank you for reporting this issue and we are sorry that 
we may not be able to fix it before Fedora 8 is end of life.  If you 
would still like to see this bug fixed and are able to reproduce it 
against a later version of Fedora please change the 'version' of this 
bug to the applicable version.  If you are unable to change the version, 
please add a comment here and someone will do it for you.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events.  Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

The process we are following is described here: 
http://fedoraproject.org/wiki/BugZappers/HouseKeeping

Comment 6 Bug Zapper 2009-01-09 07:51:48 UTC

Fedora 8 changed to end-of-life (EOL) status on 2009-01-07. Fedora 8 is 
no longer maintained, which means that it will not receive any further 
security or bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of 
Fedora please feel free to reopen this bug against that version.

Thank you for reporting this bug and we are sorry it could not be fixed.