Bug 839733

Summary: "IRQ 19 might be stuck. Polling" entries in /var/log/messages
Product: [Fedora] Fedora
Reporter: Paweł Brodacki <ofbugsandmen>
Component: kernel
Assignee: Kernel Maintainer List <kernel-maint>
Status: CLOSED DUPLICATE
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: low
Docs Contact:
Priority: unspecified
Version: 17
CC: dwmw2, gansalmon, itamar, jonathan, kernel-maint, madhu.chinakonda
Target Milestone: ---   
Target Release: ---   
Hardware: x86_64   
OS: Linux   
Doc Type: Bug Fix
Last Closed: 2012-07-12 18:16:54 UTC
Type: Bug
Attachments (Description: Flags):
Output of dmesg: none
Output of cat /proc/interrupts: none
Output of lspci: none

Description Paweł Brodacki 2012-07-12 16:52:21 UTC
Created attachment 597846
Output of dmesg

Description of problem:
It seems that I have been bitten by the ASM1083 bug (described e.g. in this thread: https://lkml.org/lkml/2012/1/30/216). I was unable to find a Bugzilla entry for this chip or for the Asus E45M1-M Pro board, which I bought and which carries this chip.

Within an hour of boot, lines like the following start appearing in /var/log/messages:
IRQ 19 might be stuck.  Polling
After the first occurrence they reappear at intervals ranging from a couple of seconds to a couple of hours. The frequency of log entries seems to correlate with the amount of network traffic, which is plausible, as IRQ 19 services a network card.
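
To put a number on how often the message shows up, a minimal sketch along these lines could count the entries per IRQ. It assumes only that each log line contains the message text exactly as shown above; the function name count_stuck_irqs is made up for illustration, and nothing here depends on the syslog timestamp format.

```python
import re
from collections import Counter

# Matches the kernel message quoted in this report, for any IRQ number.
# The message has two spaces after the period, so allow any whitespace run.
STUCK_RE = re.compile(r"IRQ (\d+) might be stuck\.\s+Polling")


def count_stuck_irqs(lines):
    """Return a Counter mapping IRQ number -> number of 'might be stuck' lines.

    `lines` is any iterable of log lines, e.g. an open file object.
    """
    counts = Counter()
    for line in lines:
        match = STUCK_RE.search(line)
        if match:
            counts[int(match.group(1))] += 1
    return counts
```

Passing an open handle on /var/log/messages (usually readable only by root) gives the totals since the last log rotation.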

Version-Release number of selected component (if applicable):


How reproducible:
For the last week the bug has occurred at least every couple of hours.


Steps to Reproduce:
1. Install a NIC into the PCI slot of an Asus E45M1-M Pro board.
2. Pass traffic through the NIC.
  
Actual results:
IRQ 19 might be stuck.  Polling
entries in /var/log/messages

Expected results:
No stuck interrupts reported.

Additional info:

The LKML thread points at the problem within the ASM1083 chip itself, so I do not expect miracles, but I'm going to wait for one anyhow. ;)

I would also like to request confirmation that ditching the PCI NIC and replacing it with one using the PCI Express bus should eliminate the problem.

I'm creating this Bugzilla entry also to help people decide when choosing hardware to buy. Voting with money works, and Asus E45M1-M Pro currently uses a faulty chip. My recommendation is to avoid this board and any other that uses the problematic ASM1083 chip.

Comment 1 Paweł Brodacki 2012-07-12 16:53:20 UTC
Created attachment 597851
Output of cat /proc/interrupts

Comment 2 Paweł Brodacki 2012-07-12 16:53:47 UTC
Created attachment 597853
Output of lspci

Comment 3 Josh Boyer 2012-07-12 18:16:54 UTC
(In reply to comment #0)
> Actual results:
> IRQ 19 might be stuck.  Polling
> entries in /var/log/messages
> 
> Expected results:
> No stuck interrupts reported.
> 
> Additional info:
> 
> The LKML thread points at the problem within the ASM1083 chip itself, so I
> do not expect miracles, but I'm going to wait for one anyhow. ;)

You already have the closest thing we've come to a fix for the issue.  We carry a patch called unhandled-irqs-switch-to-polling.patch which does the automatic switching of stuck interrupts to polling mode for just a bit and then goes back to regular operation.  That is why you see polling messages.  Without that patch, the kernel would mark IRQ 19 as stuck entirely and render everything that has that interrupt assigned to it useless.
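
The behaviour described above can be pictured with a toy model. This is only an illustrative sketch, not the code in unhandled-irqs-switch-to-polling.patch: the class name IrqLine, the unhandled_limit threshold, the poll_ticks window, and the rule that a handled interrupt resets the counter are all assumptions made up for the example.

```python
class IrqLine:
    """Toy model of an interrupt line that falls back to polling for a while.

    Not kernel code. Thresholds and timing are hypothetical.
    """

    def __init__(self, unhandled_limit=3, poll_ticks=5):
        self.unhandled_limit = unhandled_limit  # assumed threshold
        self.poll_ticks = poll_ticks            # assumed length of polling window
        self.unhandled = 0
        self.polling_left = 0
        self.log = []

    @property
    def mode(self):
        return "polling" if self.polling_left > 0 else "interrupt"

    def interrupt(self, handled):
        """Record one interrupt; switch to polling after too many unhandled ones."""
        if self.mode == "polling":
            return  # the line is being polled, interrupts are ignored
        if handled:
            self.unhandled = 0
            return
        self.unhandled += 1
        if self.unhandled >= self.unhandled_limit:
            self.log.append("IRQ might be stuck.  Polling")
            self.polling_left = self.poll_ticks
            self.unhandled = 0

    def tick(self):
        """Advance time by one tick; leave polling mode when the window ends."""
        if self.polling_left > 0:
            self.polling_left -= 1
```

In this model each trip into polling mode produces one log entry, and the line returns to normal interrupt delivery once the window expires, which matches the repeated messages followed by continued operation described in this report.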

Not a miracle, but at least your box remains reasonably functional.

> I would also like to request confirmation, that ditching the PCI NIC and
> replacing it with one using PCI Express bus should eliminate the problem.

Quite possibly, yes.  I don't see anything else in your dmesg that would be behind the ASM bridge.

> I'm creating this Bugzilla entry also to help people decide when choosing
> hardware to buy. Voting with money works, and Asus E45M1-M Pro currently
> uses a faulty chip. My recommendation is to avoid this board and any other
> that uses the problematic ASM1083 chip.

We already have a bug that covered this and leaving this one open isn't really going to change anything.  We'll duplicate this bug to the original.  We appreciate the report though.

*** This bug has been marked as a duplicate of bug 755956 ***

Comment 4 Paweł Brodacki 2012-07-12 18:40:35 UTC
Thanks for the explanation.