Description of problem:
This seems to be an issue with the IRQ code and, judging from the timing, is triggered by some cron job. My first guess would be updatedb, since it hits the hard drive, and one of the modules attached to IRQ 16 is ahci. I ran 2.6.35.4-25.fc14.x86_64 for a while before this without issue, so I am going to try going back to it for now. I may also try 2.6.36 if I can get it working. Yes, I know the kernel is tainted by nvidia, but my computer is not useful without it.

Oct 11 04:07:35 proton kernel: irq 16: nobody cared (try booting with the "irqpoll" option)
Oct 11 04:07:35 proton kernel: Pid: 0, comm: swapper Tainted: P 2.6.35.5-29.fc14.x86_64 #1
Oct 11 04:07:35 proton kernel: Call Trace:
Oct 11 04:07:35 proton kernel: <IRQ> [<ffffffff810a64f4>] __report_bad_irq+0x3d/0x8c
Oct 11 04:07:35 proton kernel: [<ffffffff810a665b>] note_interrupt+0x118/0x17d
Oct 11 04:07:35 proton kernel: [<ffffffff810a6e35>] handle_fasteoi_irq+0xa8/0xce
Oct 11 04:07:35 proton kernel: [<ffffffff8100c28f>] handle_irq+0x88/0x91
Oct 11 04:07:35 proton kernel: [<ffffffff8146d3a4>] do_IRQ+0x5c/0xc3
Oct 11 04:07:35 proton kernel: [<ffffffff81467613>] ret_from_intr+0x0/0x11
Oct 11 04:07:35 proton kernel: <EOI> [<ffffffff8101172d>] ? mwait_idle+0x7a/0x87
Oct 11 04:07:35 proton kernel: [<ffffffff810116df>] ? mwait_idle+0x2c/0x87
Oct 11 04:07:35 proton kernel: [<ffffffff81008c1f>] cpu_idle+0xaa/0xe4
Oct 11 04:07:35 proton kernel: [<ffffffff8145fcf7>] start_secondary+0x253/0x294
Oct 11 04:07:35 proton kernel: handlers:
Oct 11 04:07:35 proton kernel: [<ffffffff81317190>] (ahci_interrupt+0x0/0x5f4)
Oct 11 04:07:35 proton kernel: [<ffffffff8133113c>] (usb_hcd_irq+0x0/0x7b)
Oct 11 04:07:35 proton kernel: [<ffffffffa05ee451>] (nv_kern_isr+0x0/0x5e [nvidia])
Oct 11 04:07:35 proton kernel: Disabling IRQ #16
Oct 11 04:07:42 proton kernel: NVRM: Xid (0001:00): 16, Head 00000000 Count 0004511b
Oct 11 04:07:42 proton kernel: NVRM: Xid (0001:00): 16, Head 00000001 Count 0002add3
Oct 11 04:07:50 proton kernel: NVRM: Xid (0001:00): 8, Channel 0000007f
Oct 11 04:07:56 proton kernel: connection1:0: ping timeout of 5 secs expired, recv timeout 5, last rx 4695341271, last ping 4695346272, now 4695351280
Oct 11 04:07:56 proton kernel: connection1:0: detected conn error (1011)
Oct 11 04:07:57 proton iscsid: Kernel reported iSCSI connection 1:0 error (1011) state (3)
Oct 11 04:08:35 proton iscsid: connect to 192.168.254.2:3260 failed (No route to host)
Oct 11 04:08:41 proton iscsid: connect to 192.168.254.2:3260 failed (No route to host)
Oct 11 04:08:47 proton iscsid: connect to 192.168.254.2:3260 failed (No route to host)
Oct 11 04:08:50 proton kernel: BUG: soft lockup - CPU#3 stuck for 61s! [ksoftirqd/3:13]
Oct 11 04:08:50 proton kernel: Modules linked in: vmnet ppdev parport_pc parport vmblock vsock vmci vmmon coretemp hwmon_vid fuse capi capifs kernelcapi be2iscsi bnx2i cnic uio cxgb3i iw_cxgb3 cxgb3 mdio ib_iser
Oct 11 08:43:16 proton kernel: imklog 4.4.2, log source = /proc/kmsg started.

Version-Release number of selected component (if applicable):
2.6.35.5-29.fc14.x86_64

Additional info:
I had another similar issue with 2.6.35.6-40.fc14.x86_64. This time there were no logs and the machine was only pingable. It happened overnight, and the logs ended right around 4am.

Oct 16 04:36:09 proton named[1996]: lame server resolving '34.86.231.24.in-addr.arpa' (in '86.231.24.in-addr.arpa'?) : 216.104.96.10#53
Oct 16 14:06:15 proton kernel: imklog 4.4.2, log source = /proc/kmsg started.
I have a cluster of 120 identical machines, 64GB RAM and 64 cores. 9 have reported this error on boot (as opposed to during a cron job). The error occurred 5 times on each machine, always twice for CPU0, if that helps. The systems eventually boot and seem to be running OK. It would be very helpful to know whether this is a serious problem or whether it is of less concern once the machine has booted. -mark
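For anyone tallying these across many machines, here is a minimal sketch of counting soft lockup reports per CPU from saved log text. It assumes the message has the same shape as the "BUG: soft lockup - CPU#N stuck for Ns! [task:pid]" lines quoted in this report; the function names and the script itself are hypothetical helpers, not part of any Fedora or kernel tooling.

```python
import re
import sys
from collections import Counter

# Matches kernel messages of the form seen in this report, e.g.
#   BUG: soft lockup - CPU#3 stuck for 61s! [ksoftirqd/3:13]
# Assumption: other kernel versions may word this message differently.
SOFT_LOCKUP = re.compile(
    r"BUG: soft lockup - CPU#(?P<cpu>\d+) stuck for (?P<secs>\d+)s! "
    r"\[(?P<task>[^\]]+?):(?P<pid>\d+)\]"
)


def find_soft_lockups(log_text):
    """Return a list of (cpu, seconds, task, pid) tuples found in log_text."""
    return [
        (int(m.group("cpu")), int(m.group("secs")), m.group("task"), int(m.group("pid")))
        for m in SOFT_LOCKUP.finditer(log_text)
    ]


def lockups_per_cpu(log_text):
    """Count soft lockup reports per CPU number."""
    return Counter(cpu for cpu, _secs, _task, _pid in find_soft_lockups(log_text))


if __name__ == "__main__":
    # Usage (hypothetical script name): python lockups.py /var/log/messages
    for path in sys.argv[1:]:
        with open(path, errors="replace") as fh:
            counts = lockups_per_cpu(fh.read())
        for cpu, n in sorted(counts.items()):
            print(f"{path}: CPU#{cpu}: {n} soft lockup report(s)")
```

This only counts log lines; it says nothing about whether a given lockup is harmless.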
Just thought I'd add a "me too". HP Proliant DL165 G7 64GB RAM and 24 cores. Relevant dmesg follows. Only happens under load and system keeps running afterwards. -John

[65297.462043] BUG: soft lockup - CPU#2 stuck for 61s! [kswapd0:228]
[65297.462043] Modules linked in: ext2 usb_storage ipv6 igb dca i2c_piix4 amd64_edac_mod edac_core i2c_core k10temp edac_mce_amd serio_raw microcode pata_acpi hpsa ata_generic pata_atiixp cciss megaraid_sas [last unloaded: scsi_wait_scan]
[65297.462043] CPU 2
[65297.462043] Modules linked in: ext2 usb_storage ipv6 igb dca i2c_piix4 amd64_edac_mod edac_core i2c_core k10temp edac_mce_amd serio_raw microcode pata_acpi hpsa ata_generic pata_atiixp cciss megaraid_sas [last unloaded: scsi_wait_scan]
[65297.462043]
[65297.462043] Pid: 228, comm: kswapd0 Not tainted 2.6.35.6-45.fc14.x86_64 #1 /ProLiant DL165 G7
[65297.462043] RIP: 0010:[<ffffffff810e5cde>] [<ffffffff810e5cde>] zone_nr_free_pages+0x6a/0x98
[65297.462043] RSP: 0018:ffff8805291e7d00 EFLAGS: 00000287
[65297.462043] RAX: 000000000000000e RBX: ffff8805291e7d20 RCX: ffff880b51c80000
[65297.462043] RDX: 0000000000000000 RSI: 0000000000000100 RDI: 0000000000000100
[65297.462043] RBP: ffffffff8100a68e R08: 0000000000000000 R09: ffffffff81b81f60
[65297.462043] R10: 0000000000000000 R11: ffffffff81b81f60 R12: ffff8805291e7cf0
[65297.462043] R13: ffffffff8100a68e R14: ffff8805291e7cb0 R15: 0000000000000320
[65297.462043] FS: 00007f3bb39c67e0(0000) GS:ffff880002080000(0000) knlGS:00000000de484b70
[65297.462043] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[65297.462043] CR2: 00000000ee28d000 CR3: 0000000dcc190000 CR4: 00000000000006e0
[65297.462043] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[65297.462043] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[65297.462043] Process kswapd0 (pid: 228, threadinfo ffff8805291e6000, task ffff880529275d00)
[65297.462043] Stack:
[65297.462043] 0000000000000000 ffff880100000e00 0000000000000000 0000000000000000
[65297.462043] <0> ffff8805291e7d60 ffffffff810d7049 0000000000000000 ffff880500000000
[65297.462043] <0> ffff880100000000 000000000000000c 0000000000000e00 0000000000000002
[65297.462043] Call Trace:
[65297.462043] [<ffffffff810d7049>] ? zone_watermark_ok+0x29/0xba
[65297.462043] [<ffffffff810dff5c>] ? balance_pgdat+0x16a/0x4c8
[65297.462043] [<ffffffff8100a68e>] ? apic_timer_interrupt+0xe/0x20
[65297.462043] [<ffffffff810e045e>] ? kswapd+0x1a4/0x1ba
[65297.462043] [<ffffffff810663c3>] ? autoremove_wake_function+0x0/0x39
[65297.462043] [<ffffffff810e02ba>] ? kswapd+0x0/0x1ba
[65297.462043] [<ffffffff81065f29>] ? kthread+0x7f/0x87
[65297.462043] [<ffffffff8100aae4>] ? kernel_thread_helper+0x4/0x10
[65297.462043] [<ffffffff81065eaa>] ? kthread+0x0/0x87
[65297.462043] [<ffffffff8100aae0>] ? kernel_thread_helper+0x0/0x10
[65297.462043] Code: 00 75 4e 4c 8b a7 30 05 00 00 83 c8 ff 4c 8b 2d f1 5b 52 00 eb 18 48 63 c8 48 8b 53 58 48 8b 0c cd 50 04 b8 81 48 0f be 54 0a 42 <49> 01 d4 ff c0 be 00 01 00 00 4c 89 ef 48 63 d0 e8 a9 27 13 00
[65297.462043] Call Trace:
[65297.462043] [<ffffffff810d7049>] ? zone_watermark_ok+0x29/0xba
[65297.462043] [<ffffffff810dff5c>] ? balance_pgdat+0x16a/0x4c8
[65297.462043] [<ffffffff8100a68e>] ? apic_timer_interrupt+0xe/0x20
[65297.462043] [<ffffffff810e045e>] ? kswapd+0x1a4/0x1ba
[65297.462043] [<ffffffff810663c3>] ? autoremove_wake_function+0x0/0x39
[65297.462043] [<ffffffff810e02ba>] ? kswapd+0x0/0x1ba
[65297.462043] [<ffffffff81065f29>] ? kthread+0x7f/0x87
[65297.462043] [<ffffffff8100aae4>] ? kernel_thread_helper+0x4/0x10
[65297.462043] [<ffffffff81065eaa>] ? kthread+0x0/0x87
[65297.462043] [<ffffffff8100aae0>] ? kernel_thread_helper+0x0/0x10
I am also seeing this in F15. System does not recover, VT switch fails, must reboot.

Apr 2 08:45:30 ace kernel: [ 5950.898533] BUG: soft lockup - CPU#3 stuck for 67s! [kswapd0:49]
Apr 2 08:45:30 ace kernel: [ 5950.898535] Modules linked in: fuse coretemp sunrpc cpufreq_ondemand acpi_cpufreq freq_table mperf rfcomm sco bnep l2cap ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_filter ip6_tables snd_hda_codec_hdmi snd_hda_codec_conexant snd_hda_intel arc4 snd_hda_codec snd_hwdep snd_seq iwlagn iwlcore snd_seq_device mac80211 snd_pcm btusb snd_timer uvcvideo microcode snd e1000e cfg80211 bluetooth iTCO_wdt i2c_i801 joydev soundcore videodev iTCO_vendor_support snd_page_alloc v4l2_compat_ioctl32 rfkill wmi uinput ipv6 firewire_ohci sdhci_pci sdhci firewire_core mmc_core crc_itu_t i915 drm_kms_helper drm i2c_algo_bit i2c_core video [last unloaded: scsi_wait_scan]
Apr 2 08:45:30 ace kernel: [ 5950.898565] CPU 3
Apr 2 08:45:30 ace kernel: [ 5950.898573] Modules linked in: fuse coretemp sunrpc cpufreq_ondemand acpi_cpufreq freq_table mperf rfcomm sco bnep l2cap ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_filter ip6_tables snd_hda_codec_hdmi snd_hda_codec_conexant snd_hda_intel arc4 snd_hda_codec snd_hwdep snd_seq iwlagn iwlcore snd_seq_device mac80211 snd_pcm btusb snd_timer uvcvideo microcode snd e1000e cfg80211 bluetooth iTCO_wdt i2c_i801 joydev soundcore videodev iTCO_vendor_support snd_page_alloc v4l2_compat_ioctl32 rfkill wmi uinput ipv6 firewire_ohci sdhci_pci sdhci firewire_core mmc_core crc_itu_t i915 drm_kms_helper drm i2c_algo_bit i2c_core video [last unloaded: scsi_wait_scan]
Apr 2 08:45:30 ace kernel: [ 5950.898594]
Apr 2 08:45:30 ace kernel: [ 5950.898596] Pid: 49, comm: kswapd0 Not tainted 2.6.38.2-10.fc15.x86_64 #1 LENOVO 4239CTO/4239CTO
Apr 2 08:45:30 ace kernel: [ 5950.898599] RIP: 0010:[<ffffffffa007d097>] [<ffffffffa007d097>] i915_gem_inactive_shrink+0x6c/0x194 [i915]
Apr 2 08:45:30 ace kernel: [ 5950.898611] RSP: 0018:ffff88006ca6fd50 EFLAGS: 00000206
Apr 2 08:45:30 ace kernel: [ 5950.898612] RAX: ffff880041d1c200 RBX: 00000000000000c0 RCX: 0000000000000000
Apr 2 08:45:30 ace kernel: [ 5950.898613] RDX: ffff8800235a44b0 RSI: 0000000000000000 RDI: ffff880037a91820
Apr 2 08:45:30 ace kernel: [ 5950.898615] RBP: ffff88006ca6fd90 R08: 0000000000000004 R09: 0000000000000009
Apr 2 08:45:30 ace kernel: [ 5950.898616] R10: 0000000000000002 R11: ffffffff81a44e40 R12: ffffffff8100a58e
Apr 2 08:45:30 ace kernel: [ 5950.898617] R13: ffff88006ca6fcf0 R14: ffff88006ca6fcf8 R15: ffffffff810dfda7
Apr 2 08:45:30 ace kernel: [ 5950.898619] FS: 0000000000000000(0000) GS:ffff8800786c0000(0000) knlGS:0000000000000000
Apr 2 08:45:30 ace kernel: [ 5950.898621] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
Apr 2 08:45:30 ace kernel: [ 5950.898622] CR2: 00000036f26ac524 CR3: 000000005d3a3000 CR4: 00000000000406e0
Apr 2 08:45:30 ace kernel: [ 5950.898623] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Apr 2 08:45:30 ace kernel: [ 5950.898625] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Apr 2 08:45:30 ace kernel: [ 5950.898626] Process kswapd0 (pid: 49, threadinfo ffff88006ca6e000, task ffff88006ca64560)
Apr 2 08:45:30 ace kernel: [ 5950.898628] Stack:
Apr 2 08:45:30 ace kernel: [ 5950.898629] ffff88006ca6fd90 ffff88003790b5c8 ffff88006ca6fd60 ffff88003790b580
Apr 2 08:45:30 ace kernel: [ 5950.898631] 0000000000000000 0000000000000000 00000000000000d0 0000000000035cdd
Apr 2 08:45:30 ace kernel: [ 5950.898634] ffff88006ca6fde0 ffffffff810e44ed 0000000000000001 0000000000000080
Apr 2 08:45:30 ace kernel: [ 5950.898636] Call Trace:
Apr 2 08:45:30 ace kernel: [ 5950.898640] [<ffffffff810e44ed>] shrink_slab+0x6d/0x166
Apr 2 08:45:30 ace kernel: [ 5950.898643] [<ffffffff810e7116>] kswapd+0x517/0x77c
Apr 2 08:45:30 ace kernel: [ 5950.898645] [<ffffffff810e6bff>] ? kswapd+0x0/0x77c
Apr 2 08:45:30 ace kernel: [ 5950.898647] [<ffffffff8106ea73>] kthread+0x84/0x8c
Apr 2 08:45:30 ace kernel: [ 5950.898650] [<ffffffff8100a9e4>] kernel_thread_helper+0x4/0x10
Apr 2 08:45:30 ace kernel: [ 5950.898651] [<ffffffff8106e9ef>] ? kthread+0x0/0x8c
Apr 2 08:45:30 ace kernel: [ 5950.898653] [<ffffffff8100a9e0>] ? kernel_thread_helper+0x0/0x10
Apr 2 08:45:30 ace kernel: [ 5950.898654] Code: e4 48 89 45 c8 75 37 48 8b 43 48 45 31 ed 48 83 c3 48 48 2d b0 00 00 00 eb 0a 48 8d 82 50 ff ff ff 41 ff c5 48 8b 90 b0 00 00 00
Apr 2 08:45:30 ace kernel: [ 5950.898673] Call Trace:
Apr 2 08:45:30 ace kernel: [ 5950.898675] [<ffffffff810e44ed>] shrink_slab+0x6d/0x166
Apr 2 08:45:30 ace kernel: [ 5950.898676] [<ffffffff810e7116>] kswapd+0x517/0x77c
Apr 2 08:45:30 ace kernel: [ 5950.898678] [<ffffffff810e6bff>] ? kswapd+0x0/0x77c
Apr 2 08:45:30 ace kernel: [ 5950.898680] [<ffffffff8106ea73>] kthread+0x84/0x8c
Apr 2 08:45:30 ace kernel: [ 5950.898682] [<ffffffff8100a9e4>] kernel_thread_helper+0x4/0x10
Apr 2 08:45:30 ace kernel: [ 5950.898683] [<ffffffff8106e9ef>] ? kthread+0x0/0x8c
Apr 2 08:45:30 ace kernel: [ 5950.898685] [<ffffffff8100a9e0>] ? kernel_thread_helper+0x0/0x10
Apr 2 08:45:31 ace abrt-dump-oops: Found oopses: 1
This message is a notice that Fedora 14 is now at end of life. Fedora has stopped maintaining and issuing updates for Fedora 14. It is Fedora's policy to close all bug reports from releases that are no longer maintained. At this time, all open bugs with a Fedora 'version' of '14' have been closed as WONTFIX.

(Please note: Our normal process is to give advanced warning of this occurring, but we forgot to do that. A thousand apologies.)

Package Maintainer: If you wish for this bug to remain open because you plan to fix it in a currently maintained version, feel free to reopen this bug and simply change the 'version' to a later Fedora version.

Bug Reporter: Thank you for reporting this issue and we are sorry that we were unable to fix it before Fedora 14 reached end of life. If you would still like to see this bug fixed and are able to reproduce it against a later version of Fedora, you are encouraged to click on "Clone This Bug" (top right of this page) and open it against that version of Fedora.

Although we aim to fix as many bugs as possible during every release's lifetime, sometimes those efforts are overtaken by events. Often a more recent Fedora release includes newer upstream software that fixes bugs or makes them obsolete.

The process we are following is described here: http://fedoraproject.org/wiki/BugZappers/HouseKeeping