My colleague uses 100% identical hardware and has reported this issue multiple times. Since I had never faced it, we assumed I was simply lucky because my WLAN card sits in a different slot. A short while ago, with 3.11.2-200.fc19.x86_64, it happened to me for the first time. Because my colleague has had this problem more often, and for months, it is very unlikely that the new kernel build introduced it. I am unsure what triggers it: I was not doing anything special, there was no load, and the WLAN was not in use at that moment.

[ 8254.358785] irq 16: nobody cared (try booting with the "irqpoll" option)
[ 8254.358789] CPU: 7 PID: 0 Comm: swapper/7 Tainted: GF O 3.11.2-200.fc19.x86_64 #1
[ 8254.358790] Hardware name: Hewlett-Packard HP Compaq Elite 8300 CMT/3396, BIOS K01 v02.57 11/16/2012
[ 8254.358791] ffff88040801298c ffff88041ebc3e50 ffffffff816476ef ffff880408012900
[ 8254.358793] ffff88041ebc3e78 ffffffff810f80c2 ffff880408012900 0000000000000010
[ 8254.358794] 0000000000000000 ffff88041ebc3eb8 ffffffff810f84d8 ffffffff81500292
[ 8254.358795] Call Trace:
[ 8254.358796] <IRQ> [<ffffffff816476ef>] dump_stack+0x45/0x56
[ 8254.358804] [<ffffffff810f80c2>] __report_bad_irq+0x32/0xd0
[ 8254.358805] [<ffffffff810f84d8>] note_interrupt+0x138/0x1f0
[ 8254.358808] [<ffffffff81500292>] ? cpuidle_enter_state+0x52/0xc0
[ 8254.358809] [<ffffffff810f5ee1>] handle_irq_event_percpu+0xe1/0x1e0
[ 8254.358811] [<ffffffff810f6016>] handle_irq_event+0x36/0x60
[ 8254.358812] [<ffffffff810f9015>] handle_fasteoi_irq+0x55/0xf0
[ 8254.358815] [<ffffffff8101459f>] handle_irq+0xbf/0x150
[ 8254.358816] [<ffffffff8165220a>] ? atomic_notifier_call_chain+0x1a/0x20
[ 8254.358819] [<ffffffff81658a4d>] do_IRQ+0x4d/0xc0
[ 8254.358820] [<ffffffff8164e3ed>] common_interrupt+0x6d/0x6d
[ 8254.358821] <EOI> [<ffffffff81500292>] ? cpuidle_enter_state+0x52/0xc0
[ 8254.358823] [<ffffffff815003c9>] cpuidle_idle_call+0xc9/0x210
[ 8254.358825] [<ffffffff8101b5fe>] arch_cpu_idle+0xe/0x30
[ 8254.358827] [<ffffffff810b66ae>] cpu_startup_entry+0xce/0x280
[ 8254.358829] [<ffffffff8103ed77>] start_secondary+0x217/0x2c0
[ 8254.358830] handlers:
[ 8254.358832] [<ffffffff81469e90>] usb_hcd_irq
[ 8254.358838] [<ffffffffa018e640>] ath_isr [ath9k]
[ 8254.358839] Disabling IRQ #16
______________________________________________

[root@srv-rhsoft:~]$ lspci
00:00.0 Host bridge: Intel Corporation Xeon E3-1200 v2/3rd Gen Core processor DRAM Controller (rev 09)
00:02.0 VGA compatible controller: Intel Corporation Xeon E3-1200 v2/3rd Gen Core processor Graphics Controller (rev 09)
00:14.0 USB controller: Intel Corporation 7 Series/C210 Series Chipset Family USB xHCI Host Controller (rev 04)
00:19.0 Ethernet controller: Intel Corporation 82579LM Gigabit Network Connection (rev 04)
00:1a.0 USB controller: Intel Corporation 7 Series/C210 Series Chipset Family USB Enhanced Host Controller #2 (rev 04)
00:1b.0 Audio device: Intel Corporation 7 Series/C210 Series Chipset Family High Definition Audio Controller (rev 04)
00:1c.0 PCI bridge: Intel Corporation 7 Series/C210 Series Chipset Family PCI Express Root Port 1 (rev c4)
00:1c.4 PCI bridge: Intel Corporation 7 Series/C210 Series Chipset Family PCI Express Root Port 5 (rev c4)
00:1d.0 USB controller: Intel Corporation 7 Series/C210 Series Chipset Family USB Enhanced Host Controller #1 (rev 04)
00:1e.0 PCI bridge: Intel Corporation 82801 PCI Bridge (rev a4)
00:1f.0 ISA bridge: Intel Corporation Q77 Express Chipset LPC Controller (rev 04)
00:1f.2 SATA controller: Intel Corporation 7 Series/C210 Series Chipset Family 6-port SATA Controller [AHCI mode] (rev 04)
00:1f.3 SMBus: Intel Corporation 7 Series/C210 Series Chipset Family SMBus Controller (rev 04)
01:00.0 Network controller: Qualcomm Atheros AR5418 Wireless Network Adapter [AR5008E 802.11(a)bgn] (PCI-Express) (rev 01)
02:00.0 Ethernet controller: Intel Corporation 82574L Gigabit Network Connection
______________________________________________

[root@srv-rhsoft:~]$ cat /proc/interrupts
        CPU0    CPU1    CPU2    CPU3    CPU4    CPU5    CPU6    CPU7
  0:      29       0       0       0       0       0       0       0  IO-APIC-edge timer
  1:       3       0       0       0       0       0       0       0  IO-APIC-edge i8042
  4:       0       0       0       0       2       0       0       0  IO-APIC-edge
  8:       1       0       0       0       0       0       0       0  IO-APIC-edge rtc0
  9:       0       0       0       0       0       0       0       0  IO-APIC-fasteoi acpi
 12:       4       0       0       0       0       0       0       0  IO-APIC-edge i8042
 16:    5358    2547    1504    3127   24399   24620   17373   31193  IO-APIC-fasteoi ehci_hcd:usb1, ath9k
 23:     106     266      95     183     471    2456    1353    3010  IO-APIC-fasteoi ehci_hcd:usb2
 40:   11033    3846    8593    1285   25317   10203   17852    4605  PCI-MSI-edge ahci
 41:       0       0       0       0       0       1       0       0  PCI-MSI-edge xhci_hcd
 42:   37913    3392    3034    2220   21068    3995    4589    2807  PCI-MSI-edge i915
 43:      42      24      35       2      82     100      55     145  PCI-MSI-edge eth0
 44:      38     122      19       0     302      57      13     107  PCI-MSI-edge snd_hda_intel
 45:      17    1736    2279     554    4069   36987    6979     924  PCI-MSI-edge eth1-rx-0
 46:       6     197     308    8160    4888    1013     658    3510  PCI-MSI-edge eth1-tx-0
 47:       0       0       0       0       1       0       1       0  PCI-MSI-edge eth1
NMI:       0       0       0       0       0       0       0       0  Non-maskable interrupts
LOC:  111363   95999   96334   98733   40733   44101   45915   54367  Local timer interrupts
SPU:       0       0       0       0       0       0       0       0  Spurious interrupts
PMI:       0       0       0       0       0       0       0       0  Performance monitoring interrupts
IWI:    5387    3250    2150    2076    1663    1687    2401    1568  IRQ work interrupts
RTR:       5       0       0       0       0       0       0       0  APIC ICR read retries
RES:  127300  117662  109438  114419   50639   54077   56062   50163  Rescheduling interrupts
CAL:    1140     886     945     999     758     904     910     836  Function call interrupts
TLB:     303     603     353     335     697     734     398     544  TLB shootdowns
TRM:       0       0       0       0       0       0       0       0  Thermal event interrupts
THR:       0       0       0       0       0       0       0       0  Threshold APIC interrupts
MCE:       0       0       0       0       0       0       0       0  Machine check exceptions
MCP:       3       3       3       3       3       3       3       3  Machine check polls
ERR:       0
MIS:       0
Sorry, I forgot to describe the effect of this problem: while everything still more or less works, the desktop becomes unusably slow and the mouse pointer lags. I guess it was only thanks to the power of the machine that I could save everything and reboot more or less smoothly.
OK, now I would say 3.11.2 makes things worse: this is the second time I have seen this problem here, while my colleague has had it regularly on F18 for months.
_____________________________________

I guess the bug fixes for ath9k are making things worse:
https://www.kernel.org/pub/linux/kernel/v3.x/ChangeLog-3.11.2

[harry@srv-rhsoft:~/Desktop]$ cat ChangeLog-3.11.2 | grep ath9
ath9k: avoid accessing MRC registers on single-chain devices
ath9k: fix rx descriptor related race condition
ath9k: always clear ps filter bit on new assoc
_____________________________________

[65998.741193] irq 16: nobody cared (try booting with the "irqpoll" option)
[65998.741205] CPU: 3 PID: 0 Comm: swapper/3 Tainted: GF O 3.11.2-200.fc19.x86_64 #1
[65998.741206] Hardware name: Hewlett-Packard HP Compaq Elite 8300 CMT/3396, BIOS K01 v02.57 11/16/2012
[65998.741207] ffff88040801298c ffff88041eac3e50 ffffffff816476ef ffff880408012900
[65998.741208] ffff88041eac3e78 ffffffff810f80c2 ffff880408012900 0000000000000010
[65998.741210] 0000000000000000 ffff88041eac3eb8 ffffffff810f84d8 ffffffff81500292
[65998.741211] Call Trace:
[65998.741212] <IRQ> [<ffffffff816476ef>] dump_stack+0x45/0x56
[65998.741220] [<ffffffff810f80c2>] __report_bad_irq+0x32/0xd0
[65998.741221] [<ffffffff810f84d8>] note_interrupt+0x138/0x1f0
[65998.741223] [<ffffffff81500292>] ? cpuidle_enter_state+0x52/0xc0
[65998.741225] [<ffffffff810f5ee1>] handle_irq_event_percpu+0xe1/0x1e0
[65998.741226] [<ffffffff810f6016>] handle_irq_event+0x36/0x60
[65998.741228] [<ffffffff810f9015>] handle_fasteoi_irq+0x55/0xf0
[65998.741230] [<ffffffff8101459f>] handle_irq+0xbf/0x150
[65998.741232] [<ffffffff8165220a>] ? atomic_notifier_call_chain+0x1a/0x20
[65998.741235] [<ffffffff81658a4d>] do_IRQ+0x4d/0xc0
[65998.741236] [<ffffffff8164e3ed>] common_interrupt+0x6d/0x6d
[65998.741237] <EOI> [<ffffffff81500292>] ? cpuidle_enter_state+0x52/0xc0
[65998.741239] [<ffffffff815003c9>] cpuidle_idle_call+0xc9/0x210
[65998.741241] [<ffffffff8101b5fe>] arch_cpu_idle+0xe/0x30
[65998.741243] [<ffffffff810b66ae>] cpu_startup_entry+0xce/0x280
[65998.741245] [<ffffffff8103ed77>] start_secondary+0x217/0x2c0
[65998.741246] handlers:
[65998.741248] [<ffffffff81469e90>] usb_hcd_irq
[65998.741254] [<ffffffffa0436640>] ath_isr [ath9k]
[65998.741255] Disabling IRQ #16
Interesting: look at the time of my initial report. This second occurrence came exactly 24 hours later.
Please try and recreate this without loading whatever out-of-tree modules you have loaded.
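For readers wondering how the out-of-tree modules show up in the traces: the "Tainted:" field encodes them as single letters. The following is a minimal sketch that decodes only the letters seen in this report (G, P, F, O); the kernel defines more flags than these, and the helper name decode_taint is made up for illustration.

```python
# Meanings of the taint letters that appear in the traces in this report.
# Only the letters seen here are listed; the kernel defines further flags.
TAINT_FLAGS = {
    "G": "all loaded modules have a GPL-compatible license",
    "P": "a proprietary module was loaded",
    "F": "a module was force-loaded",
    "O": "an out-of-tree module was loaded",
}


def decode_taint(flags):
    """Translate a 'Tainted:' string such as 'GF O' into readable reasons.

    decode_taint is a hypothetical helper name, not a kernel interface.
    Whitespace between the letters is ignored.
    """
    return [
        TAINT_FLAGS.get(ch, "unknown flag %r" % ch)
        for ch in flags
        if not ch.isspace()
    ]


if __name__ == "__main__":
    # The two taint strings quoted in this report.
    for sample in ("GF O", "P O"):
        print(sample, "->", decode_taint(sample))
```

In both reports the "O" flag is what prompts the request to retest without out-of-tree modules.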
Sorry, I can't shut down VMware Workstation on this machine; it hosts all internal services and build environments. My other machine has no WLAN card, and the problem has happened exactly 2 times so far. Maybe the changes in 3.11.2-201.fc19.x86_64 fix it for now, but given that my colleague has had the problem randomly for months on F18 with identical hardware, and given the number of ath9k entries in the kernel changelogs over those months, something is wrong that is not related to the VMware modules.
Hi, this also happens on a Debian system with the aptosid kernel 3.12-5.slh. It has been happening since I installed Chrome and started using heavy Flash apps.

Dec 22 13:43:22 osiris kernel: [ 321.852928] irq 16: nobody cared (try booting with the "irqpoll" option)
Dec 22 13:43:22 osiris kernel: [ 321.852937] CPU: 2 PID: 0 Comm: swapper/2 Tainted: P O 3.12-0.slh.2-aptosid-amd64 #1
Dec 22 13:43:22 osiris kernel: [ 321.852940] Hardware name: System manufacturer Maximus II Formula/Maximus II Formula, BIOS 2302 04/15/2010
Dec 22 13:43:22 osiris kernel: [ 321.852942] 0000000000000006 ffffffff813722d2 ffff88022439d600 ffffffff8107647a
Dec 22 13:43:22 osiris kernel: [ 321.852946] ffff88022439d600 0000000000000000 0000000000000010 ffffffff81076804
Dec 22 13:43:22 osiris kernel: [ 321.852950] 0000000000000000 ffff88022439d600 0000000000000010 0000000000000000
Dec 22 13:43:22 osiris kernel: [ 321.852954] Call Trace:
Dec 22 13:43:22 osiris kernel: [ 321.852957] <IRQ> [<ffffffff813722d2>] ? dump_stack+0x50/0x89
Dec 22 13:43:22 osiris kernel: [ 321.852968] [<ffffffff8107647a>] ? __report_bad_irq+0x2c/0xb4
Dec 22 13:43:22 osiris kernel: [ 321.852971] [<ffffffff81076804>] ? note_interrupt+0x145/0x1c5
Dec 22 13:43:22 osiris kernel: [ 321.852976] [<ffffffff81074c4c>] ? handle_irq_event_percpu+0x104/0x112
Dec 22 13:43:22 osiris kernel: [ 321.852980] [<ffffffff81074c8e>] ? handle_irq_event+0x34/0x51
Dec 22 13:43:22 osiris kernel: [ 321.852984] [<ffffffff81077019>] ? handle_fasteoi_irq+0x75/0xa6
Dec 22 13:43:22 osiris kernel: [ 321.852988] [<ffffffff8100bf90>] ? handle_irq+0x15/0x1d
Dec 22 13:43:22 osiris kernel: [ 321.852992] [<ffffffff8100bc5e>] ? do_IRQ+0x40/0x95
Dec 22 13:43:22 osiris kernel: [ 321.852996] [<ffffffff81376bed>] ? common_interrupt+0x6d/0x6d
Dec 22 13:43:22 osiris kernel: [ 321.852998] <EOI> [<ffffffff8128ea01>] ? arch_local_irq_enable+0x4/0x8
Dec 22 13:43:22 osiris kernel: [ 321.853007] [<ffffffff8128ecd1>] ? cpuidle_enter_state+0x50/0xa9
Dec 22 13:43:22 osiris kernel: [ 321.853019] [<ffffffff8128edf9>] ? cpuidle_idle_call+0xcf/0x119
Dec 22 13:43:22 osiris kernel: [ 321.853023] [<ffffffff81011c87>] ? arch_cpu_idle+0x5/0x17
Dec 22 13:43:22 osiris kernel: [ 321.853027] [<ffffffff8107448a>] ? cpu_startup_entry+0xed/0x146
Dec 22 13:43:22 osiris kernel: [ 321.853031] [<ffffffff8102b54d>] ? start_secondary+0x1ed/0x1f0
Dec 22 13:43:22 osiris kernel: [ 321.853033] handlers:
Dec 22 13:43:22 osiris kernel: [ 321.853047] [<ffffffffa0009f87>] usb_hcd_irq [usbcore]
Dec 22 13:43:22 osiris kernel: [ 321.853056] [<ffffffffa00ca3d7>] ata_bmdma_interrupt [libata]
Dec 22 13:43:22 osiris kernel: [ 321.853142] [<ffffffffa052c21c>] nv_kern_isr [nvidia]
Dec 22 13:43:22 osiris kernel: [ 321.853144] Disabling IRQ #16

.. updated kernel, with IRQPOLL:

Dec 25 13:03:34 osiris kernel: [ 329.462408] irq 16: nobody cared (try booting with the "irqpoll" option)
Dec 25 13:03:34 osiris kernel: [ 329.462415] CPU: 2 PID: 0 Comm: swapper/2 Tainted: P O 3.12-5.slh.2-aptosid-amd64 #1
Dec 25 13:03:34 osiris kernel: [ 329.462418] Hardware name: System manufacturer Maximus II Formula/Maximus II Formula, BIOS 2302 04/15/2010
Dec 25 13:03:34 osiris kernel: [ 329.462420] 0000000000000006 ffffffff813728c6 ffff88022439d800 ffffffff81076551
Dec 25 13:03:34 osiris kernel: [ 329.462425] ffff88022439d800 0000000000000000 00000000000002c8 ffffffff810768db
Dec 25 13:03:34 osiris kernel: [ 329.462429] 0000000000000000 ffff88022439d800 0000000000000010 0000000000000000
Dec 25 13:03:34 osiris kernel: [ 329.462433] Call Trace:
Dec 25 13:03:34 osiris kernel: [ 329.462435] <IRQ> [<ffffffff813728c6>] ? dump_stack+0x50/0x89
Dec 25 13:03:34 osiris kernel: [ 329.462447] [<ffffffff81076551>] ? __report_bad_irq+0x2c/0xb4
Dec 25 13:03:34 osiris kernel: [ 329.462451] [<ffffffff810768db>] ? note_interrupt+0x145/0x1c5
Dec 25 13:03:34 osiris kernel: [ 329.462456] [<ffffffff81074d23>] ? handle_irq_event_percpu+0x104/0x112
Dec 25 13:03:34 osiris kernel: [ 329.462460] [<ffffffff81074d65>] ? handle_irq_event+0x34/0x51
Dec 25 13:03:34 osiris kernel: [ 329.462464] [<ffffffff810770f0>] ? handle_fasteoi_irq+0x75/0xa6
Dec 25 13:03:34 osiris kernel: [ 329.462468] [<ffffffff8100bf90>] ? handle_irq+0x15/0x1d
Dec 25 13:03:34 osiris kernel: [ 329.462472] [<ffffffff8100bc5e>] ? do_IRQ+0x40/0x95
Dec 25 13:03:34 osiris kernel: [ 329.462476] [<ffffffff813771ad>] ? common_interrupt+0x6d/0x6d
Dec 25 13:03:34 osiris kernel: [ 329.462478] <EOI> [<ffffffff8128ed26>] ? arch_local_irq_enable+0x4/0x8
Dec 25 13:03:34 osiris kernel: [ 329.462486] [<ffffffff8128eff6>] ? cpuidle_enter_state+0x50/0xa9
Dec 25 13:03:34 osiris kernel: [ 329.462500] [<ffffffff8128f11e>] ? cpuidle_idle_call+0xcf/0x119
Dec 25 13:03:34 osiris kernel: [ 329.462505] [<ffffffff81011c91>] ? arch_cpu_idle+0x5/0x17
Dec 25 13:03:34 osiris kernel: [ 329.462508] [<ffffffff8107455f>] ? cpu_startup_entry+0x109/0x164
Dec 25 13:03:34 osiris kernel: [ 329.462512] [<ffffffff8102b54d>] ? start_secondary+0x1ed/0x1f0
Dec 25 13:03:34 osiris kernel: [ 329.462515] handlers:
Dec 25 13:03:34 osiris kernel: [ 329.462528] [<ffffffffa000a013>] usb_hcd_irq [usbcore]
Dec 25 13:03:34 osiris kernel: [ 329.462537] [<ffffffffa014d3d4>] ata_bmdma_interrupt [libata]
Dec 25 13:03:34 osiris kernel: [ 329.462623] [<ffffffffa062c21e>] nv_kern_isr [nvidia]
Dec 25 13:03:34 osiris kernel: [ 329.462625] Disabling IRQ #16

IRQ 16:
$ grep '16:' /proc/interrupts
 16: 9379585 9251522 252 199 IO-APIC-fasteoi uhci_hcd:usb2, pata_marvell, nvidia
Well, that means we have different systems and different hardware, and IRQ 16 is involved every time. It looks like a deeper kernel problem. I am glad it has happened to me only twice, but as I said, my co-developer has hardware 100% identical to mine and it happens to him far too often. He has the same machine at the office without the WLAN card, and it has not happened there once, so I suspect that the more PCI/PCI-X cards a machine has, the more likely the problem is to be triggered.
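To compare reports from different machines more easily, the IRQ number and the registered handlers can be pulled out of a pasted "nobody cared" trace. This is only a rough sketch that assumes the dmesg layout shown in this report; parse_report is a hypothetical helper, and the "built-in" label for handlers without a [module] tag is this sketch's own wording.

```python
import re

# Matches the first line of a report, e.g. 'irq 16: nobody cared ...'.
IRQ_RE = re.compile(r"irq (\d+): nobody cared")
# Matches a handler line, e.g. '[<ffffffffa018e640>] ath_isr [ath9k]'.
HANDLER_RE = re.compile(r"\[<[0-9a-f]+>\]\s+(\w+)(?:\s+\[(\w+)\])?\s*$")


def parse_report(log):
    """Return (irq, [(handler, module), ...]) for the last report in log.

    Handlers printed without a [module] tag are labelled 'built-in'.
    """
    irq = None
    handlers = []
    in_handlers = False
    for line in log.splitlines():
        match = IRQ_RE.search(line)
        if match:
            # A new report starts: reset the collected state.
            irq = int(match.group(1))
            handlers = []
            in_handlers = False
            continue
        if line.rstrip().endswith("handlers:"):
            in_handlers = True
            continue
        if "Disabling IRQ" in line:
            in_handlers = False
            continue
        if in_handlers:
            match = HANDLER_RE.search(line)
            if match:
                handlers.append((match.group(1), match.group(2) or "built-in"))
    return irq, handlers


# Excerpt of the first trace in this report (stack frames left out).
SAMPLE = """\
[ 8254.358785] irq 16: nobody cared (try booting with the "irqpoll" option)
[ 8254.358830] handlers:
[ 8254.358832] [<ffffffff81469e90>] usb_hcd_irq
[ 8254.358838] [<ffffffffa018e640>] ath_isr [ath9k]
[ 8254.358839] Disabling IRQ #16
"""

if __name__ == "__main__":
    print(parse_report(SAMPLE))
```

Run on the Debian trace in this thread, the same function would list usb_hcd_irq, ata_bmdma_interrupt and nv_kern_isr on IRQ 16, which makes the common factor (a shared, level-triggered line with a USB host controller on it) easier to see.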
PS, more info: what is new is that I use eSATA via Marvell, and I moved the nVidia card to the other slot. Does anyone have an idea why it sticks to IRQ 16?
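To see which devices sit on the same line as IRQ 16, the /proc/interrupts output can be scanned for rows that list more than one device. A minimal sketch, assuming the column layout of the 3.11/3.12 kernels quoted in this report (per-CPU counters, then the interrupt chip type, then the device names); newer kernels may print an extra column, and shared_irqs is a hypothetical helper name.

```python
def shared_irqs(text):
    """Return {irq_number: [device, ...]} for lines shared by several devices.

    Assumes the layout shown in this report: a header with one column per
    CPU, then rows of 'IRQ: counters... chip-type device[, device...]'.
    """
    lines = text.strip().splitlines()
    ncpu = len(lines[0].split())  # header row: CPU0 CPU1 ...
    result = {}
    for line in lines[1:]:
        label, _, rest = line.partition(":")
        label = label.strip()
        if not label.isdigit():
            continue  # skip NMI, LOC, ERR and similar summary rows
        fields = rest.split()
        # Skip the per-CPU counters and the chip type; the rest are names.
        names = " ".join(fields[ncpu + 1:])
        devices = [d.strip() for d in names.split(",") if d.strip()]
        if len(devices) > 1:
            result[int(label)] = devices
    return result


# Excerpt of the /proc/interrupts output from the first comment.
SAMPLE = """\
        CPU0    CPU1    CPU2    CPU3    CPU4    CPU5    CPU6    CPU7
  4:       0       0       0       0       2       0       0       0  IO-APIC-edge
 16:    5358    2547    1504    3127   24399   24620   17373   31193  IO-APIC-fasteoi ehci_hcd:usb1, ath9k
 23:     106     266      95     183     471    2456    1353    3010  IO-APIC-fasteoi ehci_hcd:usb2
 40:   11033    3846    8593    1285   25317   10203   17852    4605  PCI-MSI-edge ahci
ERR:       0
"""

if __name__ == "__main__":
    print(shared_irqs(SAMPLE))  # {16: ['ehci_hcd:usb1', 'ath9k']}
```

On a live system the same function can be fed the contents of /proc/interrupts read from disk.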
https://www.google.at/search?q=+irq+16%3A+nobody+cared
* Fedora
* CentOS
* Debian
* Arch Linux
* SuSE
................
Have you tried booting with the irqpoll option, as the trace suggests? Any relief with that?
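When testing this, it helps to confirm that the option really reached the running kernel, since the "nobody cared" message prints the same hint either way. A small sketch, assuming a Linux system where /proc/cmdline is readable; has_boot_option is a hypothetical helper and the sample command line in the comment is made up.

```python
import os


def has_boot_option(cmdline, option):
    """Return True if option appears as a whole word on the kernel command line.

    Example (made-up command line):
        has_boot_option("root=/dev/sda1 ro irqpoll quiet", "irqpoll") is True
    """
    return option in cmdline.split()


def read_cmdline(path="/proc/cmdline"):
    """Read the command line the running kernel was booted with."""
    with open(path) as handle:
        return handle.read()


if __name__ == "__main__":
    if os.path.exists("/proc/cmdline"):
        active = has_boot_option(read_cmdline(), "irqpoll")
        print("irqpoll active:", active)
    else:
        print("/proc/cmdline not available on this system")
```

The parameter is spelled in lower case (irqpoll) on the command line; a word such as noirqpoll would not match because the check compares whole words.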
I have faced this issue exactly 2 times, so it is hard for me to nail down. I only know that others are affected again and again, and I opened this bug report after the first time my machine went down. One of the other, more heavily affected users stated that irqpoll does not help. What is really interesting is the large number of Google hits.
Hi, irqpoll did not fix it; however, I think I found the reason. Mainboard: Maximus II Formula. IRQ 16 is in use by nVidia, .. and Marvell; pata_marvell is also responsible for eSATA. After switching from eSATA to a (new) USB3 adapter, there were no more lost IRQs. I assume that either the card or the Marvell driver has a bug and the kernel disables unhandled IRQs.
(In reply to wheiss from comment #12)
> Hi,
> IRQPoll did not fix it, however I think I found the reason.
> MoBo Maximus II Formula
> IRQ16 in use by: nVidia,.. and Marvell. pata_marvell is also responsible for
> eSATA.
> After switching from eSATA to a (new USB3-adapter, no more lost IRQs.
> I do assume either the card or the Marvell driver have a bug and the kernel
> disables unhandled IRQs.

Yes, that might do it. Unless they both fully support shared IRQs, chaos ensues. Even then, my experience has been to avoid it if at all possible, latency problems, etc. Harald, does this help you as well?
My colleague doesn't want to replace his WLAN AP with USB3 :-)

We have both compared our machines multiple times; they are 100% identical, including the IRQ sharing. No idea why it has affected me only twice in 2 years and him much more often.

Funnier still, of the two times it happened to me, the second was exactly 24 hours after the first:
https://bugzilla.redhat.com/show_bug.cgi?id=1013054#c3

I really don't get it :-(
(In reply to Harald Reindl from comment #14)
> my colleague don't want to replace his WLAN-AP with USB3 :-)
>
> we both compared our machines mutliple times
> they are 100% identical including the IRQ sharing
> no idea why it affected my only twice in 2 years and him much more often
>
> more funny that the two times it happened for me the second one was exactly
> 24 hours after the first
> https://bugzilla.redhat.com/show_bug.cgi?id=1013054#c3
>
> i really don't get it :-(

Perhaps the applications you run differ from what he runs, exposing the problem with different frequency: video demand, disk access on eSATA. It may take a specific sequence to expose the timing window. Do you both run a similar application load?
His workload is more Eclipse and mine is more VMware machines with a lot of I/O. My machine has existed much longer, and because the hostapd WLAN AP worked so well, we ordered exactly the same hardware again with a nearly identical config (LAN/WAN/bridges/routing/VPN). I fear there is not much more to debug, given that for him it sometimes takes 8 days and sometimes happens daily :-(
I don't see much more to debug either at this point, at least not from my side. Perhaps bad or just different hardware. Closing this; please feel free to reopen if you get some more information.