Description of problem:
Hard system hang; a power cycle is required. The system fails with this error message: "do_IRQ 0.177 No irq handler for vector". This happens only on SMP boxes. I have tested on an IBM x3550 with four cores and have also seen it on a Dell OptiPlex with a 2-core Intel processor. Both were running x86_64 installs. This has been reported on fedora-devel and fedora-list. This is an upstream kernel problem; there is a conversation about it here: http://lkml.org/lkml/2007/1/22/121

Version-Release number of selected component (if applicable):
2.6.19 kernel and above. Seen under both kernel-2.6.20-1.2922.fc7 and kernel-2.6.19-1.2895.fc6.

How reproducible:
Consistently.

Additional info:
Disabling irqbalance seems to be a workaround. I have done this and am running 2.6.20-1.2922.fc7 seemingly without problems for now, but I will keep an eye on it and run some performance tests.
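The two numbers in the message identify where the stray interrupt landed. Below is a minimal Python sketch for pulling them out of a log line, assuming the x86_64 do_IRQ warning prints the CPU number followed by the vector number, both in decimal (so "0.177" would be CPU 0, vector 177). The names parse_do_irq and SpuriousVector are hypothetical helpers, not part of any existing tool.

```python
import re
from typing import NamedTuple, Optional

# Assumed message layout: "do_IRQ: <cpu>.<vector> No irq handler for vector".
# The colon after do_IRQ is optional so the form quoted in the description also matches.
_DO_IRQ_RE = re.compile(r"do_IRQ:?\s+(\d+)\.(\d+)\s+No irq handler for vector")


class SpuriousVector(NamedTuple):
    cpu: int     # CPU that took the interrupt (assumed to be the first field)
    vector: int  # vector with no registered handler (assumed second field, decimal)


def parse_do_irq(line: str) -> Optional[SpuriousVector]:
    """Return (cpu, vector) from a do_IRQ warning, or None if the line doesn't match."""
    m = _DO_IRQ_RE.search(line)
    if m is None:
        return None
    return SpuriousVector(cpu=int(m.group(1)), vector=int(m.group(2)))


if __name__ == "__main__":
    for msg in (
        "do_IRQ 0.177 No irq handler for vector",
        "do_IRQ: 1.57 No irq handler for vector",
        "Sep 3 19:00:26 taipan kernel: do_IRQ: 0.167 No irq handler for vector",
    ):
        print(msg, "->", parse_do_irq(msg))
```

Grouping the parsed results by vector makes it easy to see whether a single vector keeps firing on one CPU, as it does in the syslog excerpts later in this thread.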
Bug 225399 is related to this one.
Same issue with 2.6.20-1.2925.fc7.
For me, disabling the irqbalance service is a 100% effective workaround; with it enabled, the system dies within a few minutes. There is an upstream patch (as discussed in the linked thread), and this ticket should stay open until that patch makes its way into the FC kernel. Sadly, 2.6.20-1.2930.fc7 has just failed to boot for me, so I am unable to test it.
Moving to 'devel' as discussed on https://www.redhat.com/archives/fedora-devel-list/2007-March/msg00095.html.
I had this very issue on an HP DL385 G1 box with two dual-core AMD processors, running Fedora Core 6 with kernel 2.6.19-1.2895.fc6 (x86_64). This was the last message at the login prompt:

do_IRQ: 1.57 No irq handler for vector

PCI devices on this box:

00:03.0 PCI bridge: Advanced Micro Devices [AMD] AMD-8111 PCI (rev 07)
00:04.0 ISA bridge: Advanced Micro Devices [AMD] AMD-8111 LPC (rev 05)
00:04.1 IDE interface: Advanced Micro Devices [AMD] AMD-8111 IDE (rev 03)
00:04.3 Bridge: Advanced Micro Devices [AMD] AMD-8111 ACPI (rev 05)
00:07.0 PCI bridge: Advanced Micro Devices [AMD] AMD-8131 PCI-X Bridge (rev 12)
00:07.1 PIC: Advanced Micro Devices [AMD] AMD-8131 PCI-X IOAPIC (rev 01)
00:08.0 PCI bridge: Advanced Micro Devices [AMD] AMD-8131 PCI-X Bridge (rev 12)
00:08.1 PIC: Advanced Micro Devices [AMD] AMD-8131 PCI-X IOAPIC (rev 01)
00:18.0 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] HyperTransport Technology Configuration
00:18.1 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] Address Map
00:18.2 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] DRAM Controller
00:18.3 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] Miscellaneous Control
00:19.0 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] HyperTransport Technology Configuration
00:19.1 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] Address Map
00:19.2 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] DRAM Controller
00:19.3 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] Miscellaneous Control
01:00.0 USB Controller: Advanced Micro Devices [AMD] AMD-8111 USB (rev 0b)
01:00.1 USB Controller: Advanced Micro Devices [AMD] AMD-8111 USB (rev 0b)
01:02.0 System peripheral: Compaq Computer Corporation Integrated Lights Out Controller (rev 01)
01:02.2 System peripheral: Compaq Computer Corporation Integrated Lights Out Processor (rev 01)
01:03.0 VGA compatible controller: ATI Technologies Inc Rage XL (rev 27)
02:04.0 RAID bus controller: Compaq Computer Corporation Smart Array 64xx (rev 01)
03:06.0 Ethernet controller: Broadcom Corporation NetXtreme BCM5704 Gigabit Ethernet (rev 10)
03:06.1 Ethernet controller: Broadcom Corporation NetXtreme BCM5704 Gigabit Ethernet (rev 10)
04:09.0 PCI bridge: Advanced Micro Devices [AMD] AMD-8131 PCI-X Bridge (rev 12)
04:09.1 PIC: Advanced Micro Devices [AMD] AMD-8131 PCI-X IOAPIC (rev 01)
04:0a.0 PCI bridge: Advanced Micro Devices [AMD] AMD-8131 PCI-X Bridge (rev 12)
04:0a.1 PIC: Advanced Micro Devices [AMD] AMD-8131 PCI-X IOAPIC (rev 01)
06:09.0 RAID bus controller: Hewlett-Packard Company Smart Array P600 t
I think this is related to this bug. I have an ASUS P5ND2 with a quad-core 2.4 GHz Intel processor and 4 GB RAM, and I have had apparent do_IRQ-related problems ever since I got this machine. I have already turned off irqbalance and tried setting smp_affinity, all to no avail. I am running Fedora 7 (not the development version), with two 500 GB disks in a RAID stripe using the motherboard's RAID hardware. The hardware runs smoothly for hours on end under Windows XP Service Pack 2, which makes me think it may not be a hardware fault.

However, the system rarely survives heavy network traffic combined with disk I/O (it acts as a file server) while it is also being used as a workstation. The symptoms are generally these: the system starts to slow down, then X stops responding to events, though the mouse still works. Network services keep running but will not spawn any processes; for example, ssh accepts the connection but never lets you log in. The system behaves as if it were under an extreme load (possibly blocked disk I/O), while SMB and NFS transfers keep working for quite some time after the event starts. Rebooting, either by pressing the power switch or by issuing the command, is rarely effective, even when the command can be executed.

It is hard to get a dump of the event, because the hard drives seem to be the first devices affected, so the logs are generally not recorded. I have managed to capture some events, though. In this particular log, I managed to start a reboot before the system got into a hard lock. It is hard to tell whether these messages are the cause of the problem, random noise, or just follow-up errors, so the logs may or may not be directly related to the hard crash. Feel free to contact me if there is anything specific you would like me to try.
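For anyone trying the smp_affinity approach mentioned above: /proc/irq/<N>/smp_affinity holds a hexadecimal CPU bitmask, where bit n set means the IRQ may be delivered to CPU n. The following is a minimal Python sketch for building and decoding such masks. The helper names and the IRQ number 16 are hypothetical, and the comma handling assumes larger masks are printed as comma-separated 32-bit hex groups.

```python
from typing import Iterable, List


def affinity_mask(cpus: Iterable[int]) -> str:
    """Build a hex CPU bitmask in the style used by /proc/irq/<N>/smp_affinity.

    Bit n set means the IRQ may be delivered to CPU n, e.g. CPUs {0, 2} -> "5".
    """
    mask = 0
    for cpu in cpus:
        if cpu < 0:
            raise ValueError(f"invalid CPU number: {cpu}")
        mask |= 1 << cpu
    if mask == 0:
        raise ValueError("at least one CPU must be selected")
    return format(mask, "x")


def cpus_from_mask(mask: str) -> List[int]:
    """Decode a mask read back from smp_affinity into a sorted list of CPUs.

    Commas are stripped on the assumption that wide masks are printed as
    comma-separated groups of 32-bit hex words, e.g. "00000000,00000003".
    """
    value = int(mask.strip().replace(",", ""), 16)
    return [bit for bit in range(value.bit_length()) if (value >> bit) & 1]


if __name__ == "__main__":
    irq = 16  # hypothetical IRQ number; look up the real one in /proc/interrupts
    mask = affinity_mask([0])
    # Print the command instead of running it: writing to /proc/irq requires root.
    print(f"echo {mask} > /proc/irq/{irq}/smp_affinity")
    print(cpus_from_mask("0000000f"))
```

Reading the file back after writing it (and decoding it with cpus_from_mask) is a quick way to confirm that the setting actually took effect.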
Boot kernel info:

Sep 3 17:36:34 taipan kernel: Linux version 2.6.21-1.3194.fc7 (kojibuilder.redhat.com) (gcc version 4.1.2 20070502 (Red Hat 4.1.2-12)) #1 SMP Wed May 23 22:47:07 EDT 2007

Event:

Sep 3 17:36:44 taipan smartd[3455]: smartd has fork()ed into background mode. New PID=3455.
Sep 3 17:36:45 taipan pcscd: winscard.c:219:SCardConnect() Reader E-Gate 0 0 Not Found
Sep 3 17:36:45 taipan last message repeated 3 times
Sep 3 17:36:57 taipan kernel: BUG: warning at kernel/softirq.c:138/local_bh_enable() (Tainted: P )
Sep 3 17:36:58 taipan kernel:
Sep 3 17:36:58 taipan kernel: Call Trace:
Sep 3 17:36:58 taipan kernel: [<ffffffff80229e7b>] local_bh_enable+0x42/0x98
Sep 3 17:36:58 taipan kernel: [<ffffffff8025c008>] cond_resched_softirq+0x35/0x4b
Sep 3 17:36:58 taipan kernel: [<ffffffff8022e9f5>] release_sock+0x59/0xaa
Sep 3 17:36:58 taipan kernel: [<ffffffff8021baab>] tcp_recvmsg+0x3d1/0xadf
Sep 3 17:36:58 taipan kernel: [<ffffffff8022f84e>] sock_common_recvmsg+0x30/0x45
Sep 3 17:36:58 taipan kernel: [<ffffffff803e75b9>] sock_aio_read+0x10c/0x124
Sep 3 17:36:58 taipan kernel: [<ffffffff8020c716>] do_sync_read+0xc9/0x10c
Sep 3 17:36:58 taipan kernel: [<ffffffff80293107>] autoremove_wake_function+0x0/0x2e
Sep 3 17:36:58 taipan kernel: [<ffffffff8020af1d>] vfs_read+0xde/0x173
Sep 3 17:36:58 taipan kernel: [<ffffffff80210606>] sys_read+0x45/0x6e
Sep 3 17:36:58 taipan kernel: [<ffffffff8025729c>] tracesys+0xdc/0xe1
Sep 3 17:36:58 taipan kernel:
Sep 3 17:38:18 taipan gconfd (mike-3810): starting (version 2.18.0.1), pid 3810 user 'mike'
Sep 3 17:38:18 taipan gconfd (mike-3810): Resolved address "xml:readonly:/etc/gconf/gconf.xml.mandatory" to a read-only configuration source at position 0
Sep 3 17:38:18 taipan gconfd (mike-3810): Resolved address "xml:readwrite:/home/mike/.gconf" to a writable configuration source at position 1
Sep 3 17:38:18 taipan gconfd (mike-3810): Resolved address "xml:readonly:/etc/gconf/gconf.xml.defaults" to a read-only configuration source at position 2
Sep 3 17:40:52 taipan ntpd[3038]: synchronized to 62.75.136.76, stratum 2
Sep 3 17:40:52 taipan ntpd[3038]: kernel time sync status change 0001
Sep 3 18:02:20 taipan ntpd[3038]: synchronized to 32.112.56.88, stratum 2
Sep 3 18:19:24 taipan ntpd[3038]: synchronized to 62.75.136.76, stratum 2
Sep 3 19:00:26 taipan kernel: do_IRQ: 0.167 No irq handler for vector
Sep 3 19:00:26 taipan last message repeated 9 times
Sep 3 19:00:31 taipan kernel: printk: 2293121 messages suppressed.
Sep 3 19:00:31 taipan kernel: do_IRQ: 0.167 No irq handler for vector
Sep 3 19:00:36 taipan kernel: printk: 2359668 messages suppressed.
Sep 3 19:00:36 taipan kernel: do_IRQ: 0.167 No irq handler for vector
Sep 3 19:00:41 taipan kernel: printk: 2241271 messages suppressed.
Sep 3 19:00:41 taipan kernel: do_IRQ: 0.167 No irq handler for vector
Sep 3 19:00:46 taipan kernel: printk: 2386358 messages suppressed.
Sep 3 19:00:46 taipan kernel: do_IRQ: 0.167 No irq handler for vector
Sep 3 19:00:51 taipan kernel: printk: 2536816 messages suppressed.
Sep 3 19:00:51 taipan kernel: do_IRQ: 0.167 No irq handler for vector
Sep 3 19:00:56 taipan kernel: printk: 2211906 messages suppressed.
Sep 3 19:00:56 taipan kernel: do_IRQ: 0.167 No irq handler for vector
Sep 3 19:01:01 taipan kernel: printk: 2203052 messages suppressed.
Sep 3 19:01:01 taipan kernel: do_IRQ: 0.167 No irq handler for vector
Sep 3 19:01:06 taipan kernel: printk: 2209702 messages suppressed.
Sep 3 19:01:06 taipan kernel: do_IRQ: 0.167 No irq handler for vector
Sep 3 19:01:10 taipan shutdown[5416]: shutting down for system reboot
Sep 3 19:01:11 taipan kernel: printk: 2192509 messages suppressed.
Sep 3 19:01:14 taipan kernel: do_IRQ: 0.167 No irq handler for vector
Sep 3 19:01:15 taipan gconfd (mike-3810): Received signal 15, shutting down cleanly
Sep 3 19:01:15 taipan gconfd (mike-3810): Exiting
Sep 3 19:01:16 taipan kernel: printk: 2233562 messages suppressed.
Sep 3 19:01:16 taipan kernel: do_IRQ: 0.167 No irq handler for vector
Sep 3 19:01:21 taipan kernel: printk: 2090653 messages suppressed.
Sep 3 19:01:21 taipan kernel: do_IRQ: 0.167 No irq handler for vector
Sep 3 19:01:21 taipan smartd[3455]: smartd received signal 15: Terminated
Sep 3 19:01:21 taipan smartd[3455]: smartd is exiting (exit status 0)
Sep 3 19:01:21 taipan avahi-daemon[3341]: Got SIGTERM, quitting.
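The "printk: N messages suppressed." lines come from the kernel's printk rate limiting. In this log they arrive about every 5 seconds, which appears to match the default rate-limit interval, so the counts give a rough idea of how fast the warning was firing. The following is a minimal Python sketch for turning those lines into a per-second figure. It assumes syslog timestamps from a single day and treats each count as covering the interval since the previous report, so the first count, whose interval start is unknown, is left out. The function names are hypothetical.

```python
import re
from typing import List, Tuple

# Assumes syslog lines like
# "Sep 3 19:00:31 taipan kernel: printk: 2293121 messages suppressed."
_SUPPRESSED_RE = re.compile(
    r"(\d{2}):(\d{2}):(\d{2}) .*printk: (\d+) messages suppressed"
)


def suppression_reports(lines: List[str]) -> List[Tuple[int, int]]:
    """Return (seconds since midnight, suppressed count) for each report line."""
    reports = []
    for line in lines:
        m = _SUPPRESSED_RE.search(line)
        if m:
            h, mi, s, count = (int(g) for g in m.groups())
            reports.append((h * 3600 + mi * 60 + s, count))
    return reports


def suppressed_per_second(lines: List[str]) -> float:
    """Rough rate of rate-limited printk calls between the first and last report."""
    reports = suppression_reports(lines)
    if len(reports) < 2:
        raise ValueError("need at least two 'messages suppressed' lines")
    span = reports[-1][0] - reports[0][0]
    if span <= 0:
        raise ValueError("reports must span a positive time interval")
    # Skip the first count: it covers an interval that started before the window.
    return sum(count for _, count in reports[1:]) / span


if __name__ == "__main__":
    sample = [
        "Sep 3 19:00:31 taipan kernel: printk: 2293121 messages suppressed.",
        "Sep 3 19:00:31 taipan kernel: do_IRQ: 0.167 No irq handler for vector",
        "Sep 3 19:00:36 taipan kernel: printk: 2359668 messages suppressed.",
    ]
    print(f"{suppressed_per_second(sample):.0f} suppressed messages/s")
```

Fed the eleven "messages suppressed" lines from the Sep 3 log above (19:00:31 to 19:01:21), this sketch gives roughly 453,000 suppressed messages per second. The suppression counter is shared by every rate-limited printk, so this is only an indication. Still, if most of those messages are the do_IRQ warning, the figure would be consistent with a flood of interrupts on vector 167.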
2.6.22 kernels for Fedora 7 are now available. Do they fix these problems?
Though I cannot be entirely certain that the cause is the same, I have had a hard freeze with 2.6.22. I upgraded to 2.6.22 immediately after the event recorded in my last message (after a crash I habitually run yum update to see whether any available patches might solve the issue):

Sep 3 19:41:19 taipan kernel: Linux version 2.6.22.4-65.fc7 (kojibuilder.redhat.com) (gcc version 4.1.2 20070502 (Red Hat 4.1.2-12)) #1 SMP Tue Aug 21 21:50:50 EDT 2007

[mike@taipan mike] > uname -a
Linux taipan 2.6.22.4-65.fc7 #1 SMP Tue Aug 21 21:50:50 EDT 2007 x86_64 x86_64 x86_64 GNU/Linux

I had a hard hang on the 11th of September (ironically) while running that kernel. Unfortunately, I was not able to capture a log of the event, for the reasons mentioned earlier. If someone can tell me a way to get a log of such an event, I'm willing to try.

However, the event seems to happen less often with the new kernel: I can run for much longer without a hang (up to several days), whereas with the older 2.6.21 kernel it tended to happen several times a day. Of course, if the event depends on system load, this may simply be because the system has not been heavily loaded during this time.
Based on the date this bug was created, it appears to have been reported against rawhide during the development of a Fedora release that is no longer maintained. In order to refocus our efforts as a project we are flagging all of the open bugs for releases which are no longer maintained. If this bug remains in NEEDINFO thirty (30) days from now, we will automatically close it. If you can reproduce this bug in a maintained Fedora version (7, 8, or rawhide), please change this bug to the respective version and change the status to ASSIGNED. (If you're unable to change the bug's version or status, add a comment to the bug and someone will change it for you.) Thanks for your help, and we apologize again that we haven't handled these issues to this point. The process we're following is outlined here: http://fedoraproject.org/wiki/BugZappers/F9CleanUp We will be following the process here: http://fedoraproject.org/wiki/BugZappers/HouseKeeping to ensure this doesn't happen again.
This bug has been in NEEDINFO for more than 30 days since feedback was first requested. As a result we are closing it. If you can reproduce this bug in the future against a maintained Fedora version please feel free to reopen it against that version. The process we're following is outlined here: http://fedoraproject.org/wiki/BugZappers/F9CleanUp