Bug 1013054
Summary: | irq 16: nobody cared | ||
---|---|---|---|
Product: | [Fedora] Fedora | Reporter: | Harald Reindl <h.reindl> |
Component: | kernel | Assignee: | fedora-kernel-wireless-ath |
Status: | CLOSED WONTFIX | QA Contact: | Fedora Extras Quality Assurance <extras-qa> |
Severity: | unspecified | Docs Contact: | |
Priority: | unspecified | ||
Version: | 19 | CC: | gansalmon, h.reindl, itamar, jogreene, jonathan, kernel-maint, madhu.chinakonda, marcelo.barbosa, wheiss |
Target Milestone: | --- | ||
Target Release: | --- | ||
Hardware: | Unspecified | ||
OS: | Unspecified | ||
Whiteboard: | |||
Fixed In Version: | Doc Type: | Bug Fix | |
Doc Text: | Story Points: | --- | |
Clone Of: | Environment: | ||
Last Closed: | 2014-02-04 19:10:37 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: |
Description
Harald Reindl
2013-09-27 17:24:36 UTC
Sorry, I missed describing the result of this problem: while everything more or less keeps working, the desktop gets unusably slow and the mouse pointer turns sluggish. I guess it was only thanks to the power of the machine that I could save everything and reboot more or less smoothly.

OK, now I would say 3.11.2 makes things worse. This is the second time I have seen this problem here, while my colleague has had it regularly on F18 for months.
_____________________________________

I guess the bugfixes for ath9k are making things worse:
https://www.kernel.org/pub/linux/kernel/v3.x/ChangeLog-3.11.2

[harry@srv-rhsoft:~/Desktop]$ cat ChangeLog-3.11.2 | grep ath9
ath9k: avoid accessing MRC registers on single-chain devices
ath9k: fix rx descriptor related race condition
ath9k: always clear ps filter bit on new assoc
_____________________________________

[65998.741193] irq 16: nobody cared (try booting with the "irqpoll" option)
[65998.741205] CPU: 3 PID: 0 Comm: swapper/3 Tainted: GF O 3.11.2-200.fc19.x86_64 #1
[65998.741206] Hardware name: Hewlett-Packard HP Compaq Elite 8300 CMT/3396, BIOS K01 v02.57 11/16/2012
[65998.741207] ffff88040801298c ffff88041eac3e50 ffffffff816476ef ffff880408012900
[65998.741208] ffff88041eac3e78 ffffffff810f80c2 ffff880408012900 0000000000000010
[65998.741210] 0000000000000000 ffff88041eac3eb8 ffffffff810f84d8 ffffffff81500292
[65998.741211] Call Trace:
[65998.741212] <IRQ> [<ffffffff816476ef>] dump_stack+0x45/0x56
[65998.741220] [<ffffffff810f80c2>] __report_bad_irq+0x32/0xd0
[65998.741221] [<ffffffff810f84d8>] note_interrupt+0x138/0x1f0
[65998.741223] [<ffffffff81500292>] ? cpuidle_enter_state+0x52/0xc0
[65998.741225] [<ffffffff810f5ee1>] handle_irq_event_percpu+0xe1/0x1e0
[65998.741226] [<ffffffff810f6016>] handle_irq_event+0x36/0x60
[65998.741228] [<ffffffff810f9015>] handle_fasteoi_irq+0x55/0xf0
[65998.741230] [<ffffffff8101459f>] handle_irq+0xbf/0x150
[65998.741232] [<ffffffff8165220a>] ? atomic_notifier_call_chain+0x1a/0x20
[65998.741235] [<ffffffff81658a4d>] do_IRQ+0x4d/0xc0
[65998.741236] [<ffffffff8164e3ed>] common_interrupt+0x6d/0x6d
[65998.741237] <EOI> [<ffffffff81500292>] ? cpuidle_enter_state+0x52/0xc0
[65998.741239] [<ffffffff815003c9>] cpuidle_idle_call+0xc9/0x210
[65998.741241] [<ffffffff8101b5fe>] arch_cpu_idle+0xe/0x30
[65998.741243] [<ffffffff810b66ae>] cpu_startup_entry+0xce/0x280
[65998.741245] [<ffffffff8103ed77>] start_secondary+0x217/0x2c0
[65998.741246] handlers:
[65998.741248] [<ffffffff81469e90>] usb_hcd_irq
[65998.741254] [<ffffffffa0436640>] ath_isr [ath9k]
[65998.741255] Disabling IRQ #16

Interesting: look at the time of my initial report. Exactly 24 hours.

Please try to recreate this without loading whatever out-of-tree modules you have loaded.

Sorry, I can't shut down VMware Workstation on this machine; it hosts all internal services and build environments. My other machine has no WLAN card, and this has happened exactly 2 times until now. Maybe the changes in 3.11.2-201.fc19.x86_64 fix it for now, but given that my colleague has had the problem randomly on F18 with identical hardware for months, and given the number of ath9k entries in the kernel changelogs over those months, something is wrong that is not related to the VMware modules.

Hi, this also happens on a Debian system with the aptosid kernel 3.12-5.slh. It has happened since I installed Chrome and started using heavy Flash apps.
Dec 22 13:43:22 osiris kernel: [ 321.852928] irq 16: nobody cared (try booting with the "irqpoll" option)
Dec 22 13:43:22 osiris kernel: [ 321.852937] CPU: 2 PID: 0 Comm: swapper/2 Tainted: P O 3.12-0.slh.2-aptosid-amd64 #1
Dec 22 13:43:22 osiris kernel: [ 321.852940] Hardware name: System manufacturer Maximus II Formula/Maximus II Formula, BIOS 2302 04/15/2010
Dec 22 13:43:22 osiris kernel: [ 321.852942] 0000000000000006 ffffffff813722d2 ffff88022439d600 ffffffff8107647a
Dec 22 13:43:22 osiris kernel: [ 321.852946] ffff88022439d600 0000000000000000 0000000000000010 ffffffff81076804
Dec 22 13:43:22 osiris kernel: [ 321.852950] 0000000000000000 ffff88022439d600 0000000000000010 0000000000000000
Dec 22 13:43:22 osiris kernel: [ 321.852954] Call Trace:
Dec 22 13:43:22 osiris kernel: [ 321.852957] <IRQ> [<ffffffff813722d2>] ? dump_stack+0x50/0x89
Dec 22 13:43:22 osiris kernel: [ 321.852968] [<ffffffff8107647a>] ? __report_bad_irq+0x2c/0xb4
Dec 22 13:43:22 osiris kernel: [ 321.852971] [<ffffffff81076804>] ? note_interrupt+0x145/0x1c5
Dec 22 13:43:22 osiris kernel: [ 321.852976] [<ffffffff81074c4c>] ? handle_irq_event_percpu+0x104/0x112
Dec 22 13:43:22 osiris kernel: [ 321.852980] [<ffffffff81074c8e>] ? handle_irq_event+0x34/0x51
Dec 22 13:43:22 osiris kernel: [ 321.852984] [<ffffffff81077019>] ? handle_fasteoi_irq+0x75/0xa6
Dec 22 13:43:22 osiris kernel: [ 321.852988] [<ffffffff8100bf90>] ? handle_irq+0x15/0x1d
Dec 22 13:43:22 osiris kernel: [ 321.852992] [<ffffffff8100bc5e>] ? do_IRQ+0x40/0x95
Dec 22 13:43:22 osiris kernel: [ 321.852996] [<ffffffff81376bed>] ? common_interrupt+0x6d/0x6d
Dec 22 13:43:22 osiris kernel: [ 321.852998] <EOI> [<ffffffff8128ea01>] ? arch_local_irq_enable+0x4/0x8
Dec 22 13:43:22 osiris kernel: [ 321.853007] [<ffffffff8128ecd1>] ? cpuidle_enter_state+0x50/0xa9
Dec 22 13:43:22 osiris kernel: [ 321.853019] [<ffffffff8128edf9>] ? cpuidle_idle_call+0xcf/0x119
Dec 22 13:43:22 osiris kernel: [ 321.853023] [<ffffffff81011c87>] ? arch_cpu_idle+0x5/0x17
Dec 22 13:43:22 osiris kernel: [ 321.853027] [<ffffffff8107448a>] ? cpu_startup_entry+0xed/0x146
Dec 22 13:43:22 osiris kernel: [ 321.853031] [<ffffffff8102b54d>] ? start_secondary+0x1ed/0x1f0
Dec 22 13:43:22 osiris kernel: [ 321.853033] handlers:
Dec 22 13:43:22 osiris kernel: [ 321.853047] [<ffffffffa0009f87>] usb_hcd_irq [usbcore]
Dec 22 13:43:22 osiris kernel: [ 321.853056] [<ffffffffa00ca3d7>] ata_bmdma_interrupt [libata]
Dec 22 13:43:22 osiris kernel: [ 321.853142] [<ffffffffa052c21c>] nv_kern_isr [nvidia]
Dec 22 13:43:22 osiris kernel: [ 321.853144] Disabling IRQ #16

Updated kernel, booted with irqpoll:

Dec 25 13:03:34 osiris kernel: [ 329.462408] irq 16: nobody cared (try booting with the "irqpoll" option)
Dec 25 13:03:34 osiris kernel: [ 329.462415] CPU: 2 PID: 0 Comm: swapper/2 Tainted: P O 3.12-5.slh.2-aptosid-amd64 #1
Dec 25 13:03:34 osiris kernel: [ 329.462418] Hardware name: System manufacturer Maximus II Formula/Maximus II Formula, BIOS 2302 04/15/2010
Dec 25 13:03:34 osiris kernel: [ 329.462420] 0000000000000006 ffffffff813728c6 ffff88022439d800 ffffffff81076551
Dec 25 13:03:34 osiris kernel: [ 329.462425] ffff88022439d800 0000000000000000 00000000000002c8 ffffffff810768db
Dec 25 13:03:34 osiris kernel: [ 329.462429] 0000000000000000 ffff88022439d800 0000000000000010 0000000000000000
Dec 25 13:03:34 osiris kernel: [ 329.462433] Call Trace:
Dec 25 13:03:34 osiris kernel: [ 329.462435] <IRQ> [<ffffffff813728c6>] ? dump_stack+0x50/0x89
Dec 25 13:03:34 osiris kernel: [ 329.462447] [<ffffffff81076551>] ? __report_bad_irq+0x2c/0xb4
Dec 25 13:03:34 osiris kernel: [ 329.462451] [<ffffffff810768db>] ? note_interrupt+0x145/0x1c5
Dec 25 13:03:34 osiris kernel: [ 329.462456] [<ffffffff81074d23>] ? handle_irq_event_percpu+0x104/0x112
Dec 25 13:03:34 osiris kernel: [ 329.462460] [<ffffffff81074d65>] ? handle_irq_event+0x34/0x51
Dec 25 13:03:34 osiris kernel: [ 329.462464] [<ffffffff810770f0>] ? handle_fasteoi_irq+0x75/0xa6
Dec 25 13:03:34 osiris kernel: [ 329.462468] [<ffffffff8100bf90>] ? handle_irq+0x15/0x1d
Dec 25 13:03:34 osiris kernel: [ 329.462472] [<ffffffff8100bc5e>] ? do_IRQ+0x40/0x95
Dec 25 13:03:34 osiris kernel: [ 329.462476] [<ffffffff813771ad>] ? common_interrupt+0x6d/0x6d
Dec 25 13:03:34 osiris kernel: [ 329.462478] <EOI> [<ffffffff8128ed26>] ? arch_local_irq_enable+0x4/0x8
Dec 25 13:03:34 osiris kernel: [ 329.462486] [<ffffffff8128eff6>] ? cpuidle_enter_state+0x50/0xa9
Dec 25 13:03:34 osiris kernel: [ 329.462500] [<ffffffff8128f11e>] ? cpuidle_idle_call+0xcf/0x119
Dec 25 13:03:34 osiris kernel: [ 329.462505] [<ffffffff81011c91>] ? arch_cpu_idle+0x5/0x17
Dec 25 13:03:34 osiris kernel: [ 329.462508] [<ffffffff8107455f>] ? cpu_startup_entry+0x109/0x164
Dec 25 13:03:34 osiris kernel: [ 329.462512] [<ffffffff8102b54d>] ? start_secondary+0x1ed/0x1f0
Dec 25 13:03:34 osiris kernel: [ 329.462515] handlers:
Dec 25 13:03:34 osiris kernel: [ 329.462528] [<ffffffffa000a013>] usb_hcd_irq [usbcore]
Dec 25 13:03:34 osiris kernel: [ 329.462537] [<ffffffffa014d3d4>] ata_bmdma_interrupt [libata]
Dec 25 13:03:34 osiris kernel: [ 329.462623] [<ffffffffa062c21e>] nv_kern_isr [nvidia]
Dec 25 13:03:34 osiris kernel: [ 329.462625] Disabling IRQ #16

IRQ 16:
$ grep '16:' /proc/interrupts
16: 9379585 9251522 252 199 IO-APIC-fasteoi uhci_hcd:usb2, pata_marvell, nvidia

Well, that means we have different systems and different hardware, and IRQ16 is involved every time. It looks like a deeper kernel problem. I am glad that it has happened to me only twice, but as I said, my co-developer has hardware 100% identical to mine and it happens to him far too often. He has the same machine at the office without the WLAN card, and it has never happened there, so I suspect that the more PCI/PCI-X cards the machine has, the more likely it is to be triggered.

PS, more info: what is new is that I use eSATA via Marvell. I moved the nVidia card to the other slot. Does anyone have an idea why it sticks to IRQ16?
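
For anyone who wants to check which devices share an interrupt line, as in the grep output above, here is a minimal Python sketch. It assumes the 3.x-era /proc/interrupts layout shown in this report (per-CPU counts, then a single chip-type field such as IO-APIC-fasteoi, then comma-separated handler names); other kernel versions may print the chip column differently. The function names parse_interrupt_line and shared_irqs are made up for illustration.

```python
def parse_interrupt_line(line):
    """Parse one /proc/interrupts row into (irq, counts, chip, handlers).

    Assumption: the row looks like "IRQ: count ... chip handler, handler",
    which matches the output quoted in this report.
    """
    label, rest = line.split(":", 1)
    fields = rest.split()
    # Leading all-digit fields are the per-CPU interrupt counts.
    ncounts = 0
    while ncounts < len(fields) and fields[ncounts].isdigit():
        ncounts += 1
    counts = [int(f) for f in fields[:ncounts]]
    chip = fields[ncounts] if ncounts < len(fields) else ""
    names = " ".join(fields[ncounts + 1:])
    handlers = [n.strip() for n in names.split(",") if n.strip()]
    return label.strip(), counts, chip, handlers


def shared_irqs(text):
    """Return {irq: handlers} for numbered IRQ rows with more than one handler."""
    shared = {}
    for line in text.splitlines():
        if ":" not in line:
            continue  # header row listing the CPU names
        irq, _counts, _chip, handlers = parse_interrupt_line(line)
        # Skip named rows such as NMI or ERR; keep only numbered lines.
        if irq.isdigit() and len(handlers) > 1:
            shared[irq] = handlers
    return shared
```

To try it on a live system, read the file and pass the text in, for example `shared_irqs(open("/proc/interrupts").read())`. A line that shows up in the result is not a fault by itself; it only tells you which drivers to look at when that line gets disabled.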
https://www.google.at/search?q=+irq+16%3A+nobody+cared
* Fedora
* CentOS
* Debian
* Arch Linux
* SuSE
* ...

Have you tried booting with irqpoll as the trace suggests? Any relief with that?

I have faced this issue exactly 2 times, so for me it is hard to nail down. I only know that others are affected again and again, and I opened this bug report after the first time my machine went down. One of the other, more heavily affected users stated that irqpoll does not help. What is really interesting is the large number of Google hits.

Hi,
irqpoll did not fix it, however I think I found the reason.
Mainboard: Maximus II Formula.
IRQ16 is in use by nVidia, USB and Marvell. pata_marvell is also responsible for eSATA.
After switching from eSATA to a (new) USB3 adapter, no more lost IRQs.
I assume either the card or the Marvell driver has a bug, and the kernel disables unhandled IRQs.

(In reply to wheiss from comment #12)
> Hi,
> IRQPoll did not fix it, however I think I found the reason.
> MoBo Maximus II Formula
> IRQ16 in use by: nVidia,.. and Marvell. pata_marvell is also responsible for
> eSATA.
> After switching from eSATA to a (new USB3-adapter, no more lost IRQs.
> I do assume either the card or the Marvell driver have a bug and the kernel
> disables unhandled IRQs.

Yes, that might do it. Unless both devices fully support shared IRQs, chaos ensues. Even then, my experience has been to avoid sharing if at all possible because of latency problems and the like.

Harald, does this help you as well?
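
As background to the remark that the kernel disables unhandled IRQs: the "nobody cared" message is printed from the spurious-interrupt accounting in note_interrupt, which appears in the traces above. The Python sketch below is a simplified model of that accounting, not kernel code. The window of 100,000 interrupts and the limit of 99,900 unhandled ones are what I recall from kernel/irq/spurious.c and should be treated as assumptions to verify against your kernel source; the class name is hypothetical, and the model leaves out details such as the time-based reset of the unhandled counter.

```python
class SpuriousIrqModel:
    """Simplified model of per-IRQ spurious-interrupt accounting."""

    WINDOW = 100_000     # assumed: interrupts counted before each check
    UNHANDLED_LIMIT = 99_900  # assumed: more unhandled than this means "stuck"

    def __init__(self):
        self.irq_count = 0
        self.irqs_unhandled = 0
        self.disabled = False

    def note_interrupt(self, handled):
        """Record one interrupt; return True if this call disables the line."""
        if self.disabled:
            return False
        if not handled:
            # No handler on the shared line claimed this interrupt.
            self.irqs_unhandled += 1
        self.irq_count += 1
        if self.irq_count < self.WINDOW:
            return False
        # End of a window: decide, then start counting from zero again.
        stuck = self.irqs_unhandled > self.UNHANDLED_LIMIT
        self.irq_count = 0
        self.irqs_unhandled = 0
        if stuck:
            self.disabled = True  # corresponds to "Disabling IRQ #16"
        return stuck
```

In this model a line is only shut off when almost every interrupt in a window goes unclaimed, which fits a device that keeps asserting the shared line while none of the registered handlers recognises the interrupt as its own. Once the line is disabled, every other device sharing it loses its interrupts too, which would explain the sluggish desktop described at the top.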
My colleague doesn't want to replace his WLAN-AP with USB3 :-)

We have compared our machines multiple times; they are 100% identical, including the IRQ sharing. I have no idea why it has affected me only twice in 2 years and him much more often.

What is even stranger: of the two times it happened to me, the second was exactly 24 hours after the first.
https://bugzilla.redhat.com/show_bug.cgi?id=1013054#c3

I really don't get it :-(

(In reply to Harald Reindl from comment #14)
> my colleague don't want to replace his WLAN-AP with USB3 :-)
>
> we both compared our machines mutliple times
> they are 100% identical including the IRQ sharing
> no idea why it affected my only twice in 2 years and him much more often
>
> more funny that the two times it happened for me the second one was exactly
> 24 hours after the first
> https://bugzilla.redhat.com/show_bug.cgi?id=1013054#c3
>
> i really don't get it :-(

Perhaps the applications you run differ from what he runs and expose the problem at a different frequency: video demand, disk access on eSATA. It may take a specific sequence to hit the timing window. Do you both run a similar application load?

His workload is more Eclipse, and mine is more VMware machines with a lot of I/O. My machine has existed much longer, and because the hostapd WLAN-AP worked so well we ordered exactly the same hardware again, with a nearly identical config (LAN/WAN/bridges/routing/VPN).

I fear there is not much more to debug, given that it sometimes takes 8 days and sometimes happens daily for him :-(

Nor do I see more at this point, at least not from my side. Perhaps bad or just different hardware.

Closing this; please feel free to reopen if you get some more information.