Bug 755956 (ASM108x)

Summary: [abrt] kernel: [348158.080600] irq 16: nobody cared (try booting with the "irqpoll" option) ASM108x
Product: [Fedora] Fedora
Component: kernel
Hardware: x86_64
OS: Unspecified
Status: CLOSED ERRATA
Severity: unspecified
Priority: unspecified
Reporter: Gilberto "Velenux" Ficara <g.ficara>
Assignee: Kernel Maintainer List <kernel-maint>
QA Contact: Fedora Extras Quality Assurance <extras-qa>
CC: anrobin, banane2k, bjorn.norrliden, eli, gansalmon, gcndavidmn, hk-vndr, itamar, jamescape777, jonathan, kernel-maint, letfid, madhu.chinakonda, mihanit, ofbugsandmen, paul.lipps, philippe.noroy, processor2001, swedishelk, torgeir, zombu3
Whiteboard: abrt_hash:7e08d890effc89ca1d6b913761a7ab502029dcee
Doc Type: Bug Fix
Last Closed: 2012-07-12 15:45:43 UTC

Attachments:

Description	Flags
File: smolt_data	none
dmesg as requested	none
contents of /proc/interrupts	none
lspci -vvv output	none
lspci -vvvnn output	none
commented kernel debug info	none

Description Gilberto "Velenux" Ficara 2011-11-22 14:31:35 UTC

libreport version: 2.0.7
abrt_version:   2.0.6
cmdline:        BOOT_IMAGE=/vmlinuz-3.1.1-1.fc16.x86_64 root=/dev/mapper/VGsystem-LVroot ro LANG=en_US.UTF-8 rd.dm=0 KEYTABLE=us quiet SYSFONT=latarcyrheb-sun16 rhgb rd.lvm.lv=VGsystem/LVswap rd.md.uuid=c5bcc704:aad5bdaf:07ae3b56:fc61f7dc rd.lvm.lv=VGsystem/LVroot rd.luks=0 irqpoll
comment:        At random intervals. Can't get a fix on what is causing this.
kernel:         3.1.1-1.fc16.x86_64
reason:         [348158.080600] irq 16: nobody cared (try booting with the "irqpoll" option)
time:           Tue 22 Nov 2011 15:20:34 CET

smolt_data:     Text file, 3136 bytes

backtrace:
:[348158.080600] irq 16: nobody cared (try booting with the "irqpoll" option)
:[348158.080603] Pid: 0, comm: swapper Tainted: G        W   3.1.1-1.fc16.x86_64 #1
:[348158.080605] Call Trace:
:[348158.080606]  <IRQ>  [<ffffffff810b2222>] __report_bad_irq+0x38/0xc3
:[348158.080613]  [<ffffffff810b24bc>] note_interrupt+0x176/0x1fa
:[348158.080615]  [<ffffffff810b0a0f>] handle_irq_event_percpu+0x15d/0x1a5
:[348158.080617]  [<ffffffff810b0a92>] handle_irq_event+0x3b/0x59
:[348158.080619]  [<ffffffff81078268>] ? sched_clock_cpu+0x42/0xc6
:[348158.080621]  [<ffffffff810b2c7c>] handle_fasteoi_irq+0x80/0xa4
:[348158.080624]  [<ffffffff81010af9>] handle_irq+0x88/0x8e
:[348158.080626]  [<ffffffff814c03cd>] do_IRQ+0x4d/0xa5
:[348158.080628]  [<ffffffff814b752e>] common_interrupt+0x6e/0x6e
:[348158.080629]  <EOI>  [<ffffffff813a5c92>] ? poll_idle+0x2f/0x65
:[348158.080634]  [<ffffffff813a5c7e>] ? poll_idle+0x1b/0x65
:[348158.080636]  [<ffffffff813a5fae>] cpuidle_idle_call+0xe8/0x182
:[348158.080638]  [<ffffffff8100e2e3>] cpu_idle+0xa4/0xe8
:[348158.080641]  [<ffffffff81494a5e>] rest_init+0x72/0x74
:[348158.080643]  [<ffffffff81b76b7d>] start_kernel+0x3ab/0x3b6
:[348158.080645]  [<ffffffff81b762c4>] x86_64_start_reservations+0xaf/0xb3
:[348158.080647]  [<ffffffff81b76140>] ? early_idt_handlers+0x140/0x140
:[348158.080648]  [<ffffffff81b763ca>] x86_64_start_kernel+0x102/0x111
:[348158.080649] handlers:
:[348158.080652] [<ffffffffa0338df3>] rtl8139_interrupt
:[348158.080653] Disabling IRQ #16

event_log:
:2011-11-22-15:22:57> Smolt profile successfully saved
:2011-11-22-15:30:22> Sending oops notification to http://submit.kerneloops.org/submitoops.php
:2011-11-22-15:31:30  Kernel oops has not been sent due to Couldn't connect to server
:2011-11-22-15:31:30* (exited with 1)

Comment 1 Gilberto "Velenux" Ficara 2011-11-22 14:31:39 UTC

Created attachment 535047 [details]
File: smolt_data

Comment 2 Gilberto "Velenux" Ficara 2011-11-25 16:54:14 UTC

This got even worse with kernel 3.1.2-1.fc16.x86_64: now it's triggered every 10 minutes or so, and it disables my Internet-connected LAN card. Also, I think it's related to bug 717211, since they appear one after another in messages.

About irqpoll, it seems that option doesn't work due to another bug (I couldn't find the reference in RH bugzilla; I found out about it on Launchpad: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/855199 )

Comment 3 Josh Boyer 2011-12-05 18:17:58 UTC

(In reply to comment #2)
> This got even worse with kernel 3.1.2-1.fc16.x86_64, now it's triggered every
> 10 minutes or so and it disables my Internet-connected LAN card. Also, I think
> it's related to bug 717211, since they appear one after another in messages.

Could you attach the dmesg and /proc/interrupts output?

Also, if you boot with pcie_aspm=off, does it help?

> About irqpoll, it seems that option doesn't work due to another bug (I couldn't
> find the reference in RH bugzilla; I found out about it on Launchpad:
> https://bugs.launchpad.net/ubuntu/+source/linux/+bug/855199 )

There are two patches to fix irqpoll.  One should already be in the latest f15/f16 kernels, and the other should be included when we rebase to 3.1.5.

Comment 4 Gilberto "Velenux" Ficara 2011-12-06 10:31:19 UTC

Created attachment 541300 [details]
dmesg as requested

Attached dmesg (with some error messages, at about 1 day, 19 hours uptime).

Comment 5 Gilberto "Velenux" Ficara 2011-12-06 10:32:41 UTC

Created attachment 541301 [details]
contents of /proc/interrupts

contents of /proc/interrupts at about 1 day, 19 hours uptime

Comment 6 Josh Boyer 2011-12-06 15:27:39 UTC

This isn't related to 717211, as that is for the atl1c driver.  Your issue is coming from the 8139too driver.  It seems its interrupt is triggering, but the interrupt handler doesn't see an interrupt in the status register, so it bails.

It would be good to know a few things.

1) Does pcie_aspm=off help on the kernel command line?

2) Have there been previous kernels when you did not see this issue, if so what versions?

3) Have you always had to pass the irqpoll command line parameter, or does your dmesg just show that because you tried the suggestion?

If the option from #1 doesn't help, it might be beneficial to get some debug data from the driver.  You can do:

echo -n 'module 8139too +p' > /sys/kernel/debug/dynamic_debug/control

and it will enable all debug messages from the 8139too driver.  This includes a printk for the interrupt status register (which may result in a lot of printks).
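
The dynamic debug step described above could be wrapped in a small helper so that it is easy to switch on and off again. This is only a sketch: the function name `dyndbg_set` and its optional third argument (an alternate control file, handy for a dry run) are illustrative and not part of any kernel interface. It assumes debugfs is mounted at /sys/kernel/debug and that the kernel was built with dynamic debug support.

```shell
# Sketch: toggle dynamic debug output for one kernel module.
# Usage: dyndbg_set MODULE FLAGS [CONTROL_FILE]
#   FLAGS is "+p" to enable or "-p" to disable the module's debug messages.
#   CONTROL_FILE is a hypothetical override for dry runs; it defaults to
#   the usual debugfs location (assumed to be mounted).
dyndbg_set() {
    module=$1
    flags=$2
    control=${3:-/sys/kernel/debug/dynamic_debug/control}
    # Write a single query of the form "module <name> <flags>".
    printf 'module %s %s' "$module" "$flags" > "$control"
}
```

Run as root, `dyndbg_set 8139too +p` would enable the messages and `dyndbg_set 8139too -p` would turn them off again once enough data has been collected.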

Comment 7 Gilberto "Velenux" Ficara 2011-12-06 17:16:31 UTC

1) It doesn't seem so; the error appeared again after I booted with pcie_aspm=off.

2) I tested some other distros. It *seems* that the 2.6.32 kernel on the Scientific Linux 6.1 LiveCD isn't affected: the bug didn't trigger in several hours. 2.6.38 and above did show the bug (I didn't test anything between .32 and .38).

3) No, I didn't pass irqpoll before.

I enabled the debug messages; I'll post as soon as there are some in the logs.

I referred to bug 717211 because abrt connected another bug I have ("WARNING: at net/sched/sch_generic.c:255 dev_watchdog+0xf0/0x150()", which has, as far as I can see, the same effect of killing the card) to that one.

Maybe I should create another bug for 8139too and mark it as a duplicate of bug 702723?

Comment 8 Josh Boyer 2011-12-06 18:07:57 UTC

(In reply to comment #7)
> 1) It doesn't seem so; the error appeared again after I booted with
> pcie_aspm=off.

Bummer, ok.

> 2) I did test some other distros, it *seems* that 2.6.32 kernel found in
> Scientific Linux 6.1 LiveCD isn't affected, the bug didn't trigger in several
> hours. 2.6.38 and above (but I didn't test anything between .32 and .38) did
> show the bug.

OK.

> 3) No, I didn't pass irqpoll before

OK.

> I enabled the debug messages, I'll post as soon as there are some in the logs.

Thank you.  They should show up as KERN_DEBUG messages in dmesg.

> I did refer to bug 717211 because abrt did connect another bug I have
> ("WARNING: at net/sched/sch_generic.c:255 dev_watchdog+0xf0/0x150()" that has,
> as I can see, the same effects, killing the card) to that one.
> 
> Maybe I should create another bug for 8139too marking it as duplicate of bug
> 702723 ?

No, I don't think we need to do that.  The dev_watchdog error seems to be a direct side-effect of the interrupt being disabled so if that gets fixed the other error should go away as well.

Comment 9 Chuck Ebbert 2011-12-07 15:53:52 UTC

It's possible that some other device you don't have a driver for is generating those interrupts. Does booting with the "noirqdebug" option help? (That will just ignore the extra interrupts.)

Also, please attach the output of the command 'lspci -vvv' (run as root to get the full output.)

Comment 10 Gilberto "Velenux" Ficara 2011-12-07 16:30:46 UTC

Created attachment 542030 [details]
lspci -vvv output

Comment 11 Gilberto "Velenux" Ficara 2011-12-07 16:34:36 UTC

Created attachment 542035 [details]
lspci -vvvnn output

(just in case)

Comment 12 Gilberto "Velenux" Ficara 2011-12-07 16:39:42 UTC

I'll try noirqdebug at next boot. 

Still haven't got any debug message from the driver, but I have a cronjob to reload it when the network is down, so maybe it was lost on reload. I changed the script, so it will reconfigure the debugging properly after reloading the module.

Comment 13 Gilberto "Velenux" Ficara 2011-12-07 23:51:43 UTC

Created attachment 542268 [details]
commented kernel debug info

I configured rsyslog to log kern.* to a separate file. 

The message repeated several thousand times:

for second in 129684 129685 129686 129687 ; do grep -c "$second\." /var/log/kernel.messages ; done
0
91967
64845
2
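
The per-second counts above could also be produced in a single pass over the log, without listing each second by hand. The following is a rough sketch, not the reporter's method: the function name `per_second_counts` is made up for illustration, and it assumes each line carries a kernel timestamp of the form `[seconds.microseconds]`.

```shell
# Sketch: count log lines per whole second of kernel uptime.
# Usage: per_second_counts FILE...   (reads stdin when no file is given)
per_second_counts() {
    awk '
        match($0, /\[ *[0-9]+\.[0-9]+\]/) {
            # Drop the brackets, then the padding and the fractional part.
            stamp = substr($0, RSTART + 1, RLENGTH - 2)
            sub(/^ +/, "", stamp)
            sub(/\..*/, "", stamp)
            count[stamp]++
        }
        END { for (s in count) print s, count[s] }
    ' "$@" | sort -n
}
```

For example, `per_second_counts /var/log/kernel.messages | sort -k2,2nr | head` would list the busiest seconds first.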

Comment 14 Gilberto "Velenux" Ficara 2011-12-13 11:10:19 UTC

It seems that noirqdebug works well as a workaround: now at 5 days of uptime and the bug hasn't shown up.
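
When comparing results between boots, it helps to confirm which workaround option the running kernel was actually started with. A minimal sketch follows; the function name `has_kernel_param` and the optional file argument (for testing against a saved command line) are illustrative assumptions.

```shell
# Sketch: succeed if a boot option is present on the kernel command line.
# Usage: has_kernel_param NAME [CMDLINE_FILE]
#   CMDLINE_FILE defaults to /proc/cmdline.
has_kernel_param() {
    param=$1
    file=${2:-/proc/cmdline}
    # Put one option per line, then match a bare flag or NAME=value.
    tr -s ' \t' '\n\n' < "$file" | grep -q -E "^${param}(=.*)?\$"
}
```

For instance, `has_kernel_param noirqdebug && echo "noirqdebug is active"` reports whether the workaround is in effect for the current boot.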

Comment 15 Josh Boyer 2012-02-02 20:37:08 UTC

Your motherboard is using the ASM108x PCI bridge.  There is a problem identified with this particular chip upstream that might be causing this issue:

http://thread.gmane.org/gmane.linux.kernel/1245767
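
Other readers may want to check whether their own board carries this bridge. The sketch below filters `lspci -nn` output for it. The function name `find_asm108x` is illustrative, and the PCI ID `1b21:1080` is an assumption: the device number 1080 matches the lspci extract later in this report, while the ASMedia vendor ID 1b21 does not appear in this bug and should be verified against your own lspci output.

```shell
# Sketch: print lspci lines that look like the ASMedia ASM108x bridge.
# Reads `lspci -nn` output on stdin.
# Assumption: the bridge reports the PCI ID 1b21:1080.
find_asm108x() {
    grep -i -F '[1b21:1080]'
}
```

Typical use would be `lspci -nn | find_asm108x`; the exit status is non-zero when no matching line is found.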

Comment 16 Josh Boyer 2012-03-05 21:57:46 UTC

We're going to consolidate all of these bugs with the impacted hardware into a single bug.  The latest F15 and F16 kernel updates that should hit the mirrors soon have a patch to at least fall back to the irqpoll method when this happens.  Hopefully it results in a bit better experience for you.

Comment 17 Josh Boyer 2012-03-05 21:58:33 UTC

*** Bug 770210 has been marked as a duplicate of this bug. ***

Comment 18 Josh Boyer 2012-03-05 21:59:08 UTC

*** Bug 799106 has been marked as a duplicate of this bug. ***

Comment 19 Josh Boyer 2012-03-05 21:59:16 UTC

*** Bug 770866 has been marked as a duplicate of this bug. ***

Comment 20 Josh Boyer 2012-03-05 21:59:30 UTC

*** Bug 784050 has been marked as a duplicate of this bug. ***

Comment 21 Josh Boyer 2012-03-05 21:59:35 UTC

*** Bug 773438 has been marked as a duplicate of this bug. ***

Comment 22 Josh Boyer 2012-03-05 21:59:42 UTC

*** Bug 761699 has been marked as a duplicate of this bug. ***

Comment 23 Josh Boyer 2012-03-05 21:59:48 UTC

*** Bug 785339 has been marked as a duplicate of this bug. ***

Comment 24 Josh Boyer 2012-03-05 22:00:11 UTC

*** Bug 756540 has been marked as a duplicate of this bug. ***

Comment 25 Josh Boyer 2012-03-05 22:00:17 UTC

*** Bug 784751 has been marked as a duplicate of this bug. ***

Comment 26 Davoid 2012-03-07 08:58:34 UTC

Hello,

I am writing here to ask for information. I don't know whether I should comment here or file a new bug, but I think I have the same problem. Can you help me confirm?
I just bought an Asus P8P67 Evo and I get the "irq 16: nobody cared" message too!
I run kernel 3.2.7-1.fc16.x86_64

I have this ASMedia chip:
extract of lspci -v:
06:00.0 PCI bridge: ASMedia Technology Inc. Device 1080 (rev 01) (prog-if 01 [Subtractive decode])
	Flags: bus master, fast devsel, latency 0
	Bus: primary=06, secondary=07, subordinate=07, sec-latency=32
	I/O behind bridge: 0000d000-0000dfff
	Memory behind bridge: fb100000-fb1fffff
	Capabilities: [c0] Subsystem: ASMedia Technology Inc. Device 1080

extract of /proc/interrupts:
16:      46710          0          0          0   IO-APIC-fasteoi   p6p1, nvidia

There is a point I don't understand: my irq 16 seems to be linked to both my graphics card (nvidia) and my motherboard's Realtek Ethernet card (p6p1).
Is it the same problem? I am not really familiar with looking at these things.

I found this on the internet: http://www.gossamer-threads.com/lists/linux/kernel/1466185, so last evening I removed my wifi PCI card and added the irqpoll option, but irq 16 still gets disabled...

Josh, are you speaking about the kernel update 3.2.7-1.fc16.x86_64?
It is available this morning. I will try it.

Thanks for your help, and tell me if I need to post any other log or trace...
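
Several drivers can share one interrupt line, which is why both p6p1 and nvidia show up on irq 16 above. A small sketch for pulling the sharers of a given IRQ out of /proc/interrupts follows. The function name `irq_users` and the optional file argument are illustrative; the parsing assumes the usual layout of an IRQ label, one counter per CPU, then the controller and driver names.

```shell
# Sketch: show the interrupt controller and drivers attached to one IRQ.
# Usage: irq_users IRQ [INTERRUPTS_FILE]
#   INTERRUPTS_FILE defaults to /proc/interrupts.
irq_users() {
    irq=$1
    file=${2:-/proc/interrupts}
    awk -v irq="$irq" '
        $1 == irq ":" {
            # Skip the per-CPU counters, which are purely numeric fields.
            i = 2
            while (i <= NF && $i ~ /^[0-9]+$/) i++
            out = ""
            for (; i <= NF; i++) out = out (out == "" ? "" : " ") $i
            print out
        }
    ' "$file"
}
```

With the extract above, `irq_users 16` would print `IO-APIC-fasteoi p6p1, nvidia`.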

Comment 27 Davoid 2012-03-08 10:11:35 UTC

Running kernel 3.2.9-1 since yesterday.
IRQ 16 is no longer disabled, but now it's IRQ 17.

[38142.742404] irq 17: nobody cared (try booting with the "irqpoll" option)
[38142.742407] Pid: 0, comm: swapper/0 Tainted: P           O 3.2.9-1.fc16.x86_64 #1
[38142.742408] Call Trace:
[38142.742409]  <IRQ>  [<ffffffff810e11ad>] __report_bad_irq+0x3d/0xe0
[38142.742415]  [<ffffffff810e146d>] note_interrupt+0x16d/0x220
[38142.742417]  [<ffffffff8101b9c9>] ? sched_clock+0x9/0x10
[38142.742419]  [<ffffffff810dec39>] handle_irq_event_percpu+0xa9/0x220
[38142.742421]  [<ffffffff810dedf4>] handle_irq_event+0x44/0x70
[38142.742422]  [<ffffffff810e1edf>] handle_fasteoi_irq+0x5f/0xf0
[38142.742425]  [<ffffffff81016226>] handle_irq+0x46/0xb0
[38142.742427]  [<ffffffff815ed5da>] do_IRQ+0x5a/0xe0
[38142.742430]  [<ffffffff815e2f2e>] common_interrupt+0x6e/0x6e
[38142.742431]  <EOI>  [<ffffffff81093fa9>] ? enqueue_hrtimer+0x39/0xc0
[38142.742435]  [<ffffffff81310b2d>] ? intel_idle+0xed/0x150
[38142.742437]  [<ffffffff81310b0f>] ? intel_idle+0xcf/0x150
[38142.742440]  [<ffffffff81493671>] cpuidle_idle_call+0xc1/0x280
[38142.742441]  [<ffffffff8101322a>] cpu_idle+0xca/0x120
[38142.742443]  [<ffffffff815bffce>] rest_init+0x72/0x74
[38142.742446]  [<ffffffff81aebbfe>] start_kernel+0x3ba/0x3c5
[38142.742448]  [<ffffffff81aeb347>] x86_64_start_reservations+0x132/0x136
[38142.742450]  [<ffffffff81aeb140>] ? early_idt_handlers+0x140/0x140
[38142.742452]  [<ffffffff81aeb44d>] x86_64_start_kernel+0x102/0x111
[38142.742453] handlers:
[38142.742461] [<ffffffffa002a270>] irq_handler
[38142.742463] [<ffffffffa010c810>] azx_interrupt
[38142.742464] Disabling IRQ #17


          CPU0       CPU1       CPU2       CPU3       
 17:     200003          0          0          0   IO-APIC-fasteoi   firewire_ohci, snd_hda_intel

Comment 28 zombu2 2012-03-11 06:49:25 UTC

Happens during normal boot; takes about 30 seconds to clear.

Sometimes it's irq 16 and sometimes irq 18.


Package: kernel
OS Release: Fedora release 16 (Verne)

Comment 29 Eli Wapniarski 2012-03-14 05:41:28 UTC

Now getting "Disabling IRQ 16" messages all the time.

Kernel 3.2.9-2.fc16.x86_64 on 2 machines

Comment 30 Kayvan Amirahmadi 2012-03-14 12:44:38 UTC

Maybe this patch has been applied:
http://www.gossamer-threads.com/lists/linux/kernel/1466185?do=post_view_threaded#1466185

And we get messages every few minutes as IRQ 16 is disabled, polled, and re-enabled:

[root@server:~] $ dmesg | grep "IRQ 16"
[ 3434.929999] Disabling IRQ 16
[ 3434.939447] Polling IRQ 16
[ 3435.939812] Reenabling IRQ 16
[ 3441.763261] Disabling IRQ 16
[ 3441.773108] Polling IRQ 16
[ 3442.773471] Reenabling IRQ 16
[ 3560.440243] Disabling IRQ 16
[ 3560.449707] Polling IRQ 16
[ 3561.449072] Reenabling IRQ 16
[ 3685.798417] Disabling IRQ 16

This is annoying, so I disabled emergency messages in /etc/rsyslog.conf:
#*.emerg                                                 *
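
To judge how often the disable/poll/re-enable cycle fires without reading the whole log, the events can be tallied. This is a sketch under the assumption that the messages look like the dmesg extract above; the function name `irq_event_counts` is illustrative.

```shell
# Sketch: tally "Disabling/Polling/Reenabling IRQ n" messages.
# Usage: dmesg | irq_event_counts      (or pass log files as arguments)
irq_event_counts() {
    awk '
        match($0, /(Disabling|Polling|Reenabling) IRQ #?[0-9]+/) {
            # Key on the matched message, e.g. "Disabling IRQ 16".
            event = substr($0, RSTART, RLENGTH)
            count[event]++
        }
        END { for (e in count) print count[e], e }
    ' "$@" | sort -k2
}
```

The optional `#` in the pattern also catches the older "Disabling IRQ #16" wording seen in the original backtrace.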

Comment 31 Josh Boyer 2012-03-14 13:38:26 UTC

(In reply to comment #30)
> Maybe this patch has been applied:
> http://www.gossamer-threads.com/lists/linux/kernel/1466185?do=post_view_threaded#1466185
> 

Yes.

> This is annoying, I disabled emergency messages from /etc/rsyslog.conf:
> #*.emerg                                                 *

I've made it less verbose in the 3.2.10-1 kernel in updates-testing.

Comment 32 Davoid 2012-03-14 13:44:53 UTC

Now I have kernel 3.2.9-2.fc16.x86_64 too, and I see the same things in my log
(polling/reenabling).

I would like to know whether this is a serious problem.
These log messages seem to be related to this patch, but is it the final
solution? I have removed my wifi PCI card, but this irq problem appears
anyway. Why?

I didn't find any recent news about this problem
(http://www.kernelhub.org/?p=2&msg=14224).
Does someone know whether this is a bug in the chip or a bug in the kernel?
In either case, do we know if a solution will be possible?
I don't understand all of the very specific discussion on this topic...

Maybe I should change my motherboard? I bought it a few days ago, so I am
not really enthusiastic about that :)

I imagine it is not easy (Asus might not give all the required information), and
maybe users could help with a little pressure ;) Whatever happens, I would like
to thank all the kernel developers and maintainers for the great job.

And sorry for my bad English (but I am French, lol).

Comment 33 Josh Boyer 2012-03-15 11:25:58 UTC

Could you try this kernel and see if it functions better for you:

http://koji.fedoraproject.org/koji/buildinfo?buildID=307357

Comment 34 Davoid 2012-03-15 20:18:35 UTC

Tried.
It's worse here. I am spammed with "Disabling IRQ 16" now, more than 2 times per minute.

Mar 15 20:58:57 pulsar kernel: [78682.553094] Disabling IRQ 16
Mar 15 21:02:02 pulsar kernel: [   18.424895] Disabling lock debugging due to kernel taint
Mar 15 21:02:10 pulsar kernel: [   30.024146] Disabling IRQ 16
Mar 15 21:02:32 pulsar kernel: [   50.353518] Disabling IRQ 16
Mar 15 21:02:49 pulsar kernel: [   67.787900] Disabling IRQ 16
Mar 15 21:03:09 pulsar kernel: [   87.310420] Disabling IRQ 16
Mar 15 21:03:15 pulsar kernel: [   93.539826] Disabling IRQ 16
Mar 15 21:03:56 pulsar kernel: [  134.335987] Disabling IRQ 16
Mar 15 21:03:57 pulsar kernel: [  135.681313] Disabling IRQ 16
Mar 15 21:03:59 pulsar kernel: [  137.358162] Disabling IRQ 16
Mar 15 21:04:00 pulsar kernel: [  138.432437] Disabling IRQ 16
Mar 15 21:04:01 pulsar kernel: [  139.636273] Disabling IRQ 16
Mar 15 21:04:14 pulsar kernel: [  152.793792] Disabling IRQ 16
Mar 15 21:04:17 pulsar kernel: [  155.577896] Disabling IRQ 16
Mar 15 21:04:18 pulsar kernel: [  156.706745] Disabling IRQ 16
Mar 15 21:04:20 pulsar kernel: [  158.823092] Disabling IRQ 16
Mar 15 21:04:42 pulsar kernel: [  181.102471] Disabling IRQ 16
Mar 15 21:04:45 pulsar kernel: [  183.478238] Disabling IRQ 16
Mar 15 21:05:25 pulsar kernel: [  223.968318] Disabling IRQ 16
Mar 15 21:06:16 pulsar kernel: [  274.571748] Disabling IRQ 16
Mar 15 21:06:42 pulsar kernel: [  300.462351] Disabling IRQ 16
Mar 15 21:06:56 pulsar kernel: [  314.898798] Disabling IRQ 16
Mar 15 21:06:59 pulsar kernel: [  317.150231] Disabling IRQ 16
Mar 15 21:07:07 pulsar kernel: [  325.163127] Disabling IRQ 16
Mar 15 21:07:12 pulsar kernel: [  330.431224] Disabling IRQ 16
Mar 15 21:07:50 pulsar kernel: [  368.301938] Disabling IRQ 16
Mar 15 21:07:58 pulsar kernel: [  376.880142] Disabling IRQ 16
Mar 15 21:10:00 pulsar kernel: [  498.048971] Disabling IRQ 16
Mar 15 21:10:20 pulsar kernel: [  518.163637] Disabling IRQ 16
Mar 15 21:10:30 pulsar kernel: [  527.913334] Disabling IRQ 16
Mar 15 21:11:34 pulsar kernel: [  592.695034] Disabling IRQ 16
Mar 15 21:12:02 pulsar kernel: [  619.925098] Disabling IRQ 16
Mar 15 21:12:13 pulsar kernel: [  631.051405] Disabling IRQ 16
Mar 15 21:12:17 pulsar kernel: [  635.180600] Disabling IRQ 16
Mar 15 21:12:36 pulsar kernel: [  654.045670] Disabling IRQ 16
Mar 15 21:12:57 pulsar kernel: [  674.964864] Disabling IRQ 16
Mar 15 21:13:02 pulsar kernel: [  680.530997] Disabling IRQ 16
Mar 15 21:13:18 pulsar kernel: [  695.938079] Disabling IRQ 16
Mar 15 21:13:43 pulsar kernel: [  721.401916] Disabling IRQ 16
Mar 15 21:13:48 pulsar kernel: [  726.409863] Disabling IRQ 16
Mar 15 21:13:54 pulsar kernel: [  732.522519] Disabling IRQ 16
Mar 15 21:14:07 pulsar kernel: [  745.176289] Disabling IRQ 16
Mar 15 21:14:12 pulsar kernel: [  750.312007] Disabling IRQ 16
Mar 15 21:14:15 pulsar kernel: [  752.763942] Disabling IRQ 16
Mar 15 21:14:18 pulsar kernel: [  755.924274] Disabling IRQ 16
Mar 15 21:14:20 pulsar kernel: [  757.834736] Disabling IRQ 16
Mar 15 21:15:17 pulsar kernel: [  815.277900] Disabling IRQ 16
Mar 15 21:15:22 pulsar kernel: [  820.473100] Disabling IRQ 16

Comment 35 Josh Boyer 2012-03-15 20:27:50 UTC

(In reply to comment #34)
> Tried.
> It's worse here. I am spammed with "Disabling IRQ 16" now, more than 2 times
> per minute.

Erm... that's confusing.  The kernel I linked to shouldn't disable repeatedly like that...  what does 'uname -a' say?

Anyway, there's a scratch build going here that has another revision:

http://koji.fedoraproject.org/koji/taskinfo?taskID=3898618

testing that would be appreciated.

Comment 36 Davoid 2012-03-15 20:35:32 UTC

I took it from the F16 updates-testing repository...
uname -a:
Linux pulsar 3.2.10-1.fc16.x86_64 #1 SMP Mon Mar 12 22:34:35 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux

Comment 37 Davoid 2012-03-15 20:38:05 UTC

Oops, OK, your link was for 3.2.10-2.
I'll check...

Comment 38 Davoid 2012-03-15 21:23:51 UTC

Tested!
Fewer messages. But it seems that my screen becomes laggy just before the "IRQ 16 might be stuck" message appears. Then it gets better... and then it lags again 2 or 3 minutes after...
So I would say it was better with 3.2.9-2, despite more log messages.

Mar 15 22:10:29 pulsar kernel: [   62.896171] IRQ 16 might be stuck.  Polling
Mar 15 22:11:45 pulsar kernel: [  138.405784] IRQ 16 might be stuck.  Polling
Mar 15 22:13:17 pulsar kernel: [  230.227593] IRQ 16 might be stuck.  Polling
Mar 15 22:15:30 pulsar kernel: [  363.701856] IRQ 16 might be stuck.  Polling
Mar 15 22:16:41 pulsar kernel: [  434.726120] IRQ 16 might be stuck.  Polling
Mar 15 22:16:51 pulsar kernel: [  444.816592] IRQ 16 might be stuck.  Polling
Mar 15 22:17:02 pulsar kernel: [  454.907226] IRQ 16 might be stuck.  Polling
Mar 15 22:17:12 pulsar kernel: [  464.997802] IRQ 16 might be stuck.  Polling
Mar 15 22:19:49 pulsar kernel: [  622.362209] IRQ 16 might be stuck.  Polling
Mar 15 22:20:35 pulsar kernel: [  667.997578] IRQ 16 might be stuck.  Polling
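
To check whether the lag really recurs every couple of minutes, the gaps between consecutive "might be stuck" messages can be computed from the kernel timestamps. This is a sketch only; the function name `stuck_gaps` is illustrative, and it assumes the bracketed `[seconds.microseconds]` timestamp shown in the lines above.

```shell
# Sketch: print the time in seconds between consecutive
# "IRQ n might be stuck" messages, based on the kernel timestamps.
# Usage: stuck_gaps FILE...   (reads stdin when no file is given)
stuck_gaps() {
    awk '
        /IRQ [0-9]+ might be stuck/ && match($0, /\[ *[0-9]+\.[0-9]+\]/) {
            # Take the text between the brackets and force it to a number.
            t = substr($0, RSTART + 1, RLENGTH - 2) + 0
            if (seen) printf "%.1f\n", t - prev
            prev = t
            seen = 1
        }
    ' "$@"
}
```

For example, `grep kernel /var/log/messages | stuck_gaps` would print one gap per line. Timestamps restart at a reboot, so it is best to feed it the log of a single boot.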

Comment 39 Josh Boyer 2012-03-15 23:44:49 UTC

(In reply to comment #38)
> Tested!
> Fewer messages. But it seems that my screen becomes laggy just before the
> "IRQ 16 might be stuck" message appears. Then it gets better... and then it lags
> again 2 or 3 minutes after...

The lag is somewhat to be expected.  The CPU is in polling mode, which means it's eating CPU while looking for interrupts.

If you're willing to test again, I can have another scratch build tomorrow that tries to poll a bit quicker but I'm not sure how much that would improve the lag.

> So I would say it was better with 3.2.9-2, despite more log messages.
> 
> Mar 15 22:10:29 pulsar kernel: [   62.896171] IRQ 16 might be stuck.  Polling
> Mar 15 22:11:45 pulsar kernel: [  138.405784] IRQ 16 might be stuck.  Polling
> Mar 15 22:13:17 pulsar kernel: [  230.227593] IRQ 16 might be stuck.  Polling
> Mar 15 22:15:30 pulsar kernel: [  363.701856] IRQ 16 might be stuck.  Polling
> Mar 15 22:16:41 pulsar kernel: [  434.726120] IRQ 16 might be stuck.  Polling
> Mar 15 22:16:51 pulsar kernel: [  444.816592] IRQ 16 might be stuck.  Polling
> Mar 15 22:17:02 pulsar kernel: [  454.907226] IRQ 16 might be stuck.  Polling
> Mar 15 22:17:12 pulsar kernel: [  464.997802] IRQ 16 might be stuck.  Polling
> Mar 15 22:19:49 pulsar kernel: [  622.362209] IRQ 16 might be stuck.  Polling
> Mar 15 22:20:35 pulsar kernel: [  667.997578] IRQ 16 might be stuck.  Polling

Are these being printed to the console, or do you need to run 'dmesg' to see them?  If they're still going to the console I'll reduce the severity again.

Comment 40 Davoid 2012-03-16 20:33:12 UTC

I'm reading these messages from /var/log/messages (I used grep to show only the "stuck" messages).

I am about to try 3.2.10-2.2.

Comment 41 Davoid 2012-03-16 21:24:10 UTC

OK, I have been running 3.2.10-2.2 for 30 minutes, with no IRQ messages at all so far!
I didn't see any lag.
So it's better than 3.2.10-1.

I'll keep it running and check again tomorrow.

Comment 42 Davoid 2012-03-17 13:06:24 UTC

Hi,

No IRQ messages since the last reboot with the 3.2.10-2.2 build from the link you gave.
Uptime is 15h.

Comment 43 Davoid 2012-03-19 15:19:40 UTC

I have 3.2.10-3.fc16 available with yum in the updates repo.
Is your latest change included in it?

Comment 44 Josh Boyer 2012-03-20 01:07:42 UTC

*** Bug 804725 has been marked as a duplicate of this bug. ***

Comment 45 Josh Boyer 2012-03-20 01:09:57 UTC

(In reply to comment #43)
> I have 3.2.10-3.fc16 available with yum in the updates repo.
> Is your latest change included in it?

No, not really.  3.2.10-3 has the patch dropped entirely as it was broken for a number of users.  The next submitted update should have it included.

Comment 46 zombu2 2012-03-20 01:28:53 UTC

(In reply to comment #45)
> (In reply to comment #43)
> > I have 3.2.10-3.fc16 available with yum in the updates repo.
> > Is your latest change included in it?
> 
> No, not really.  3.2.10-3 has the patch dropped entirely as it was broken for a
> number of users.  The next submitted update should have it included.

How much more broken can it get??

It seems that 3.2.10-3 also broke the Atheros NIC for me: it works on install with 3.0.1.7, and as soon as the update runs it's gone.

But on the bright side, the irq errors went away with 3.2.10-3.

Comment 47 Dave Jones 2012-03-22 16:38:52 UTC

[mass update]
kernel-3.3.0-4.fc16 has been pushed to the Fedora 16 stable repository.
Please retest with this update.

Comment 48 Dave Jones 2012-03-22 16:43:57 UTC

[mass update]
kernel-3.3.0-4.fc16 has been pushed to the Fedora 16 stable repository.
Please retest with this update.

Comment 49 Dave Jones 2012-03-22 16:52:22 UTC

[mass update]
kernel-3.3.0-4.fc16 has been pushed to the Fedora 16 stable repository.
Please retest with this update.

Comment 50 Davoid 2012-03-26 09:56:45 UTC

I updated to kernel-3.3.0-4.fc16.
All seems OK; I don't see any IRQ log messages...

Do we still need to keep the irqpoll option?

Comment 51 Torgeir Veimo 2012-04-06 16:03:44 UTC

Any idea when this patch will go into mainline kernel source?

Comment 52 Josh Boyer 2012-04-09 13:11:51 UTC

(In reply to comment #51)
> Any idea when this patch will go into mainline kernel source?

It probably won't be.  At least not the current version of it.  We're still deciding if it's worth carrying.

Comment 53 Torgeir Veimo 2012-04-10 13:07:38 UTC

Are there any other approaches elsewhere to work around the issues with the ASM108x bridge? From the postings on LKML it appears that Linus is interested in having a better workaround.

Comment 54 Josh Boyer 2012-04-23 14:14:01 UTC

*** Bug 815119 has been marked as a duplicate of this bug. ***

Comment 55 Dave Jones 2012-07-12 15:45:43 UTC

If you can still reproduce this in 3.4, please reopen. We believe this should be fixed with the current updates.

Comment 56 Josh Boyer 2012-07-12 18:16:54 UTC

*** Bug 839733 has been marked as a duplicate of this bug. ***

Comment 57 Davoid 2012-07-13 19:09:46 UTC

The problem is back in the latest update, 3.4.4-4.
I updated yesterday, and today I see many messages:
"IRQ 16 might be stuck.  Polling"

Comment 58 Kayvan Amirahmadi 2012-07-14 03:45:59 UTC

The problem came back, and ping times from other hosts on the internal network change with each polling cycle.

#######

[root@localhost:/var/log] $ tail -f messages
Jul 14 08:04:25 localhost kernel: [432091.477327] IRQ 16 might be stuck.  Polling
Jul 14 08:04:36 localhost kernel: [432102.159299] IRQ 16 might be stuck.  Polling
Jul 14 08:04:48 localhost kernel: [432114.489476] IRQ 16 might be stuck.  Polling
Jul 14 08:05:06 localhost kernel: [432132.336522] IRQ 16 might be stuck.  Polling
Jul 14 08:05:24 localhost kernel: [432150.356137] IRQ 16 might be stuck.  Polling
Jul 14 08:05:35 localhost kernel: [432161.292643] IRQ 16 might be stuck.  Polling
Jul 14 08:05:46 localhost kernel: [432172.527401] IRQ 16 might be stuck.  Polling

###############

C:\Program Files\Support Tools>ping 192.168.0.110 -t

Pinging 192.168.0.110 with 32 bytes of data:

Reply from 192.168.0.110: bytes=32 time=23ms TTL=64
Reply from 192.168.0.110: bytes=32 time=30ms TTL=64
Reply from 192.168.0.110: bytes=32 time=32ms TTL=64
Reply from 192.168.0.110: bytes=32 time=32ms TTL=64
Reply from 192.168.0.110: bytes=32 time=32ms TTL=64
Reply from 192.168.0.110: bytes=32 time=32ms TTL=64
Reply from 192.168.0.110: bytes=32 time<1ms TTL=64
Reply from 192.168.0.110: bytes=32 time<1ms TTL=64
Reply from 192.168.0.110: bytes=32 time<1ms TTL=64
Reply from 192.168.0.110: bytes=32 time<1ms TTL=64
Reply from 192.168.0.110: bytes=32 time<1ms TTL=64
Reply from 192.168.0.110: bytes=32 time<1ms TTL=64
Reply from 192.168.0.110: bytes=32 time<1ms TTL=64
Reply from 192.168.0.110: bytes=32 time<1ms TTL=64
Reply from 192.168.0.110: bytes=32 time=59ms TTL=64
Reply from 192.168.0.110: bytes=32 time=59ms TTL=64
Reply from 192.168.0.110: bytes=32 time=59ms TTL=64
Reply from 192.168.0.110: bytes=32 time=59ms TTL=64
Reply from 192.168.0.110: bytes=32 time=59ms TTL=64
Reply from 192.168.0.110: bytes=32 time=59ms TTL=64
Reply from 192.168.0.110: bytes=32 time=59ms TTL=64
Reply from 192.168.0.110: bytes=32 time=59ms TTL=64
Reply from 192.168.0.110: bytes=32 time=59ms TTL=64
Reply from 192.168.0.110: bytes=32 time=59ms TTL=64
Reply from 192.168.0.110: bytes=32 time<1ms TTL=64
Reply from 192.168.0.110: bytes=32 time=3ms TTL=64
Reply from 192.168.0.110: bytes=32 time=3ms TTL=64
Reply from 192.168.0.110: bytes=32 time=4ms TTL=64

Comment 59 Dave Jones 2012-07-16 15:51:04 UTC

I think that particular case is about as good as it's going to get.

The hardware doesn't behave correctly, and the vendor is uncooperative in telling us how to work around it, so this is the best we can do.

Comment 60 Davoid 2012-07-17 09:01:16 UTC

So what can we do?
Do we have to replace our motherboards? :(

Comment 61 Valeriy Chkalov 2012-07-17 11:01:10 UTC

I have the same problem.
Last updated 5 minutes ago.

# uname -a
Linux linuxlocal 3.4.4-5.fc17.x86_64 #1 SMP Thu Jul 5 20:20:59 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux


# cat /proc/cmdline
BOOT_IMAGE=/vmlinuz-3.4.4-5.fc17.x86_64 root=UUID=1d687255-fffd-46d0-b774-f6852fd48c9e ro quiet rhgb nouveau.modeset=0 rd.driver.blacklist=nouveau SYSFONT=False LANG=ru_RU.UTF-8 KEYTABLE=ru irqpoll


# dmesg
[  314.460745] IRQ 16 might be stuck.  Polling
[  795.077515] IRQ 16 might be stuck.  Polling
[  827.721361] IRQ 16 might be stuck.  Polling
[  852.817196] IRQ 16 might be stuck.  Polling
[  975.831648] IRQ 16 might be stuck.  Polling
[ 1106.381880] IRQ 16 might be stuck.  Polling
[ 1244.393381] IRQ 16 might be stuck.  Polling
[ 1279.808228] IRQ 16 might be stuck.  Polling

Comment 62 Davoid 2012-07-18 11:40:15 UTC

The new Asus P8Z77-V Pro seems to have the ASMedia ASM1083 bridge chip too...
What can we do? No more PCI?

Comment 63 freemarket 2012-11-30 05:04:04 UTC

Folks,
I reproduced this error, but inadvertently. I stuck a 32 GB USB 2.0 SanDisk Cruzer flash drive into my desktop (Shuttle XPC XS58H7 PRO) and instantly received this IRQ #16 error on all my terminal windows.

# uname -a
Linux zurich.homelinux.org 3.6.7-4.fc16.x86_64 #1 SMP Tue Nov 20 20:33:31 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux
# BOOT_IMAGE=/boot/vmlinuz-3.6.7-4.fc16.x86_64 root=/dev/mapper/vg_zurich-lv_root ro rd.lvm.lv=vg_zurich/lv_swap rd.md=0 rd.dm=0 KEYTABLE=us rd.lvm.lv=vg_zurich/lv_root quiet SYSFONT=latarcyrheb-sun16 rhgb rd.luks=0 LANG=en_US.UTF-8 pci=nomsi
# less /var/log/messages
Nov 29 23:34:15 zurich kernel: [50576.198354] irq 16: nobody cared (try booting with the "irqpoll" option)
Nov 29 23:34:15 zurich kernel: [50576.198358] Pid: 0, comm: swapper/0 Tainted: P           O 3.6.7-4.fc16.x86_64 #1
Nov 29 23:34:15 zurich kernel: [50576.198359] Call Trace:
Nov 29 23:34:15 zurich kernel: [50576.198360]  <IRQ>  [<ffffffff810eb09d>] __report_bad_irq+0x3d/0xe0
Nov 29 23:34:15 zurich kernel: [50576.198368]  [<ffffffff810eb355>] note_interrupt+0x165/0x220
Nov 29 23:34:15 zurich kernel: [50576.198370]  [<ffffffff810e8b29>] handle_irq_event_percpu+0xa9/0x210
Nov 29 23:34:15 zurich kernel: [50576.198373]  [<ffffffff8101baf9>] ? sched_clock+0x9/0x10
Nov 29 23:34:15 zurich kernel: [50576.198374]  [<ffffffff810e8cd2>] handle_irq_event+0x42/0x70
Nov 29 23:34:15 zurich kernel: [50576.198376]  [<ffffffff810ebed9>] handle_fasteoi_irq+0x59/0x100
Nov 29 23:34:15 zurich kernel: [50576.198379]  [<ffffffff81016150>] handle_irq+0x60/0x150
Nov 29 23:34:15 zurich kernel: [50576.198382]  [<ffffffff810656d4>] ? irq_enter+0x54/0x90
Nov 29 23:34:15 zurich kernel: [50576.198384]  [<ffffffff816236ca>] do_IRQ+0x5a/0xe0
Nov 29 23:34:15 zurich kernel: [50576.198387]  [<ffffffff81619eea>] common_interrupt+0x6a/0x6a
Nov 29 23:34:15 zurich kernel: [50576.198388]  <EOI>  [<ffffffff814c6ad9>] ? poll_idle+0x49/0x90
Nov 29 23:34:15 zurich kernel: [50576.198392]  [<ffffffff814c6aac>] ? poll_idle+0x1c/0x90
Nov 29 23:34:15 zurich kernel: [50576.198394]  [<ffffffff814c6669>] cpuidle_enter+0x19/0x20
Nov 29 23:34:15 zurich kernel: [50576.198396]  [<ffffffff814c6cfc>] cpuidle_idle_call+0xac/0x290
Nov 29 23:34:15 zurich kernel: [50576.198398]  [<ffffffff8101d74f>] cpu_idle+0xcf/0x120
Nov 29 23:34:15 zurich kernel: [50576.198400]  [<ffffffff815f611e>] rest_init+0x72/0x74
Nov 29 23:34:15 zurich kernel: [50576.198404]  [<ffffffff81cfcc31>] start_kernel+0x3c7/0x3d4
Nov 29 23:34:15 zurich kernel: [50576.198406]  [<ffffffff81cfc66a>] ? repair_env_string+0x5a/0x5a
Nov 29 23:34:15 zurich kernel: [50576.198407]  [<ffffffff81cfc356>] x86_64_start_reservations+0x131/0x135
Nov 29 23:34:15 zurich kernel: [50576.198409]  [<ffffffff81cfc120>] ? early_idt_handlers+0x120/0x120
Nov 29 23:34:15 zurich kernel: [50576.198411]  [<ffffffff81cfc45c>] x86_64_start_kernel+0x102/0x111

The interesting thing is that the nvidia driver happens to share IRQ 16 with a USB controller (uhci_hcd:usb3) and ahci:

# cat /proc/interrupts 
           CPU0       CPU1       CPU2       CPU3       CPU4       CPU5       CPU6       CPU7       
  0:     345880          0          0          0          0          0          0          0   IO-APIC-edge      timer
  1:          2          0          0          0          0          0          0          0   IO-APIC-edge      i8042
  8:          1          0          0          0          0          0          0          0   IO-APIC-edge      rtc0
  9:          0          0          0          0          0          0          0          0   IO-APIC-fasteoi   acpi
 12:          0          0          0          0          0          0          0          0   IO-APIC-fasteoi   xhci_hcd:usb9
 16:      45593          0          0          0          0          0          0          0   IO-APIC-fasteoi   ahci, uhci_hcd:usb3, nvidia
 17:     546704          0          0          0          0          0          0          0   IO-APIC-fasteoi   snd_hda_intel, eth0
 18:          3          0          0          0          0          0          0          0   IO-APIC-fasteoi   ehci_hcd:usb1, uhci_hcd:usb8, i801_smbus, eth1
 19:      23052          0          0          0          0          0          0          0   IO-APIC-fasteoi   ahci, uhci_hcd:usb5, uhci_hcd:usb7
 21:          0          0          0          0          0          0          0          0   IO-APIC-fasteoi   uhci_hcd:usb4
 22:        626          0          0          0          0          0          0          0   IO-APIC-fasteoi   snd_hda_intel
 23:          0          0          0          0          0          0          0          0   IO-APIC-fasteoi   ehci_hcd:usb2, uhci_hcd:usb6
NMI:        174        123        111        116         25         21         22         21   Non-maskable interrupts
LOC:     160158     194809     173356     174282      59352      54094      53428      48579   Local timer interrupts
SPU:          0          0          0          0          0          0          0          0   Spurious interrupts
PMI:        174        123        111        116         25         21         22         21   Performance monitoring interrupts
IWI:          0          0          0          0          0          0          0          0   IRQ work interrupts
RTR:          5          0          0          0          0          0          0          0   APIC ICR read retries
RES:     148485      68071      14284       5637       2939       1589       1274       1324   Rescheduling interrupts
CAL:      16316      13558       9978      10295       7119       6619       6255       4802   Function call interrupts
TLB:          0          0          0          0          0          0          0          0   TLB shootdowns
TRM:          0          0          0          0          0          0          0          0   Thermal event interrupts
THR:          0          0          0          0          0          0          0          0   Threshold APIC interrupts
MCE:          0          0          0          0          0          0          0          0   Machine check exceptions
MCP:          7          7          7          7          7          7          7          7   Machine check polls
ERR:          0
MIS:          0

The only observable consequence of this event was that the mouse was limping, but I still had to reboot.
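
For anyone who wants to check which IRQ lines are shared on their own machine, here is a minimal sketch that parses /proc/interrupts text and lists the IRQs with more than one handler attached. The function name `shared_irqs` is hypothetical, and the parsing assumes the layout shown in the output above: one count per CPU, then a single chip/type column such as `IO-APIC-fasteoi`, then a comma-separated handler list. Other kernels may lay out the columns differently, so treat it as a starting point only.

```python
def shared_irqs(interrupts_text):
    """Map IRQ number -> handler names, keeping only IRQs with several handlers."""
    lines = interrupts_text.splitlines()
    if not lines:
        return {}
    ncpus = len(lines[0].split())  # header row: CPU0 CPU1 ...
    shared = {}
    for line in lines[1:]:
        label, _, rest = line.partition(":")
        label = label.strip()
        if not label.isdigit():  # skip NMI, LOC, ERR and similar rows
            continue
        fields = rest.split()
        # Assumed layout: ncpus counters, one chip/type token, then handlers.
        handlers = [h.rstrip(",") for h in fields[ncpus + 1:]]
        if len(handlers) > 1:
            shared[int(label)] = handlers
    return shared


# Header and three rows copied from the /proc/interrupts output above.
SAMPLE = """\
           CPU0       CPU1       CPU2       CPU3       CPU4       CPU5       CPU6       CPU7
 16:      45593          0          0          0          0          0          0          0   IO-APIC-fasteoi   ahci, uhci_hcd:usb3, nvidia
 22:        626          0          0          0          0          0          0          0   IO-APIC-fasteoi   snd_hda_intel
NMI:        174        123        111        116         25         21         22         21   Non-maskable interrupts
"""

if __name__ == "__main__":
    for irq, handlers in sorted(shared_irqs(SAMPLE).items()):
        print(irq, handlers)
```

On a real system, pass in the contents of /proc/interrupts instead of the sample string.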