Bug 501137 - BUG: soft lockup - CPU#0 stuck for 61s! [work_on_cpu/0:36]
Summary: BUG: soft lockup - CPU#0 stuck for 61s! [work_on_cpu/0:36]
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: Fedora
Classification: Fedora
Component: kernel
Version: 11
Hardware: All
OS: Linux
Priority: low
Severity: medium
Target Milestone: ---
Assignee: Kernel Maintainer List
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks: 578894
 
Reported: 2009-05-16 22:45 UTC by Jason Tibbitts
Modified: 2014-03-04 11:12 UTC

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2010-06-28 12:34:34 UTC
Type: ---
Embargoed:


Description Jason Tibbitts 2009-05-16 22:45:53 UTC
An 8-core Xeon (two sockets) on which I just installed this morning's rawhide (2.6.29.3-140.fc11.x86_64) got stuck for several minutes at bootup.  The following message would be displayed at intervals:

May 16 17:08:07 util2 kernel: BUG: soft lockup - CPU#0 stuck for 61s! [work_on_cpu/0:36]
May 16 17:08:07 util2 kernel: Modules linked in: i5k_amb iTCO_wdt hwmon usb_storage iTCO_vendor_support e1000(+) i5000_edac(+) edac_core e1000e(+) serio_raw joydev shpchp pcspkr i2c_i801 raid10 radeon drm i2c_algo_bit i2c_core [last unloaded: scsi_wait_scan]
May 16 17:08:07 util2 kernel: CPU 0:
May 16 17:08:07 util2 kernel: Modules linked in: i5k_amb iTCO_wdt hwmon usb_storage iTCO_vendor_support e1000(+) i5000_edac(+) edac_core e1000e(+) serio_raw joydev shpchp pcspkr i2c_i801 raid10 radeon drm i2c_algo_bit i2c_core [last unloaded: scsi_wait_scan]
May 16 17:08:07 util2 kernel: Pid: 36, comm: work_on_cpu/0 Not tainted 2.6.29.3-140.fc11.x86_64 #1 X7DBP
May 16 17:08:07 util2 kernel: RIP: 0010:[<ffffffffa01253ff>]  [<ffffffffa01253ff>] e1000_write_phy_reg_ex+0x54/0xa9 [e1000]
May 16 17:08:07 util2 kernel: RSP: 0018:ffff88042c9bbbf0  EFLAGS: 00000216
May 16 17:08:07 util2 kernel: RAX: 00000000ffffffff RBX: ffff88042c9bbc10 RCX: 0000000010d6578a
May 16 17:08:07 util2 kernel: RDX: 0000000000003762 RSI: 0000000000000000 RDI: 0000000000003756
May 16 17:08:07 util2 kernel: RBP: ffffffff8101211e R08: ffff88042c9ba000 R09: ffffffff811b908e
May 16 17:08:07 util2 kernel: R10: 0000000000000286 R11: ffff880427955a90 R12: ffff88042c9bbb80
May 16 17:08:07 util2 kernel: R13: ffffffff81052621 R14: ffff88042c9bbb80 R15: ffff88042c9bbbe0
May 16 17:08:07 util2 kernel: FS:  0000000000000000(0000) GS:ffffffff817b7000(0000) knlGS:0000000000000000
May 16 17:08:07 util2 kernel: CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
May 16 17:08:07 util2 kernel: CR2: 00007fffbf64f0cc CR3: 0000000000201000 CR4: 00000000000006e0
May 16 17:08:07 util2 kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
May 16 17:08:07 util2 kernel: DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
May 16 17:08:07 util2 kernel: Call Trace:
May 16 17:08:07 util2 kernel: [<ffffffffa01254d5>] ? e1000_write_phy_reg+0x81/0x11b [e1000]
May 16 17:08:07 util2 kernel: [<ffffffffa0126895>] ? e1000_phy_init_script+0x147/0x209 [e1000]
May 16 17:08:07 util2 kernel: [<ffffffffa01276c8>] ? e1000_phy_reset+0x90/0x99 [e1000]
May 16 17:08:07 util2 kernel: [<ffffffffa0128e84>] ? e1000_setup_copper_link+0x3c0/0x1048 [e1000]
May 16 17:08:07 util2 kernel: [<ffffffff81011f67>] ? restore_args+0x0/0x30
May 16 17:08:07 util2 kernel: [<ffffffffa0129c00>] ? e1000_setup_link+0xf4/0x461 [e1000]
May 16 17:08:07 util2 kernel: [<ffffffffa012a4d8>] ? e1000_init_hw+0x56b/0xa33 [e1000]
May 16 17:08:07 util2 kernel: [<ffffffffa0122c9e>] ? e1000_reset+0x1a0/0x2b9 [e1000]
May 16 17:08:07 util2 kernel: [<ffffffffa012e613>] ? e1000_probe+0xaf1/0xc57 [e1000]
May 16 17:08:07 util2 kernel: [<ffffffff8104070f>] ? default_wake_function+0x12/0x14
May 16 17:08:07 util2 kernel: [<ffffffff81058c80>] ? do_work_for_cpu+0x0/0x20
May 16 17:08:07 util2 kernel: [<ffffffff811c5749>] ? local_pci_probe+0x17/0x1b
May 16 17:08:07 util2 kernel: [<ffffffff81058c98>] ? do_work_for_cpu+0x18/0x20
May 16 17:08:07 util2 kernel: [<ffffffff81058e0a>] ? run_workqueue+0xa7/0x14a
May 16 17:08:07 util2 kernel: [<ffffffff81058f99>] ? worker_thread+0xec/0xfd
May 16 17:08:07 util2 kernel: [<ffffffff8105ca4b>] ? autoremove_wake_function+0x0/0x39
May 16 17:08:07 util2 kernel: [<ffffffff81058ead>] ? worker_thread+0x0/0xfd
May 16 17:08:07 util2 kernel: [<ffffffff81058ead>] ? worker_thread+0x0/0xfd
May 16 17:08:07 util2 kernel: [<ffffffff8105c6b5>] ? kthread+0x4d/0x78
May 16 17:08:07 util2 kernel: [<ffffffff8101264a>] ? child_rip+0xa/0x20
May 16 17:08:07 util2 kernel: [<ffffffff81011f67>] ? restore_args+0x0/0x30
May 16 17:08:07 util2 kernel: [<ffffffff8105c668>] ? kthread+0x0/0x78
May 16 17:08:07 util2 kernel: [<ffffffff81012640>] ? child_rip+0x0/0x20

This repeated at intervals exactly four times with no change at all in the output, after which the system continued booting.  I was not able to reproduce it in at least ten additional reboots.
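
Because the report repeated with identical output, one way to confirm that two occurrences really are the same is to compare only the symbol names in the call trace and ignore timestamps and addresses. The following is a minimal Python sketch, not part of the original report. It assumes the `[<address>] ? symbol+0xoff/0xlen [module]` line format seen in the trace above, and the helper name `trace_symbols` is hypothetical.

```python
import re

# Pattern inferred from the trace lines quoted in this report (an assumption,
# not an official kernel log grammar). The "? " marker and the trailing
# "[module]" are both optional.
TRACE_RE = re.compile(
    r"\[<[0-9a-f]+>\]\s+(?:\?\s+)?(?P<sym>[\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+"
    r"(?:\s+\[(?P<mod>\w+)\])?"
)

def trace_symbols(lines):
    """Return (symbol, module) pairs for every line that looks like a trace frame.

    module is None for frames that carry no [module] suffix.
    """
    frames = []
    for line in lines:
        m = TRACE_RE.search(line)
        if m:
            frames.append((m["sym"], m["mod"]))
    return frames

sample = [
    "May 16 17:08:07 util2 kernel: Call Trace:",
    "May 16 17:08:07 util2 kernel: [<ffffffffa01254d5>] ? e1000_write_phy_reg+0x81/0x11b [e1000]",
    "May 16 17:08:07 util2 kernel: [<ffffffff81011f67>] ? restore_args+0x0/0x30",
]

symbols = trace_symbols(sample)
for sym, mod in symbols:
    print(sym, mod or "(core kernel)")
```

Two occurrences can then be compared with a plain equality check on the returned lists, which is unaffected by differing timestamps or load addresses.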

Comment 1 Bug Zapper 2009-06-09 15:57:43 UTC
This bug appears to have been reported against 'rawhide' during the Fedora 11 development cycle.
Changing version to '11'.

More information and reason for this action is here:
http://fedoraproject.org/wiki/BugZappers/HouseKeeping

Comment 2 Ben Webb 2009-08-31 19:17:06 UTC
I'm seeing something similar with the latest F11 kernel (2.6.29.6-217.2.16.fc11.x86_64). May be related?

This is on a machine that is currently running an rsync backup of a large volume, which at least explains why it's hitting the swap. I got the same message reported once a minute for 10 minutes or so, after which I couldn't ssh to the machine any more and had to power cycle it.

Aug 31 10:52:27 organ kernel: BUG: soft lockup - CPU#3 stuck for 61s! [kswapd1:48]
Aug 31 10:52:27 organ kernel: Modules linked in: nfs lockd nfs_acl auth_rpcgss sunrpc ipv6 ts_kmp nf_conntrack_amanda nf_conntrack_ftp ipt_LOG xt_limit xt_multiport cpufreq_ondemand powernow_k8 freq_table ata_generic pata_acpi tg3 hpwdt k8temp i2c_amd756 floppy i2c_core hwmon pcspkr pata_amd amd_rng shpchp cciss [last unloaded: scsi_wait_scan]
Aug 31 10:52:27 organ kernel: CPU 3:
Aug 31 10:52:27 organ kernel: Modules linked in: nfs lockd nfs_acl auth_rpcgss sunrpc ipv6 ts_kmp nf_conntrack_amanda nf_conntrack_ftp ipt_LOG xt_limit xt_multiport cpufreq_ondemand powernow_k8 freq_table ata_generic pata_acpi tg3 hpwdt k8temp i2c_amd756 floppy i2c_core hwmon pcspkr pata_amd amd_rng shpchp cciss [last unloaded: scsi_wait_scan]
Aug 31 10:52:27 organ kernel: Pid: 48, comm: kswapd1 Not tainted 2.6.29.6-217.2.16.fc11.x86_64 #1 ProLiant DL385 G1
Aug 31 10:52:27 organ kernel: RIP: 0010:[<ffffffff8109e135>]  [<ffffffff8109e135>] page_cache_get_speculative+0x32/0x34
Aug 31 10:52:27 organ kernel: RSP: 0018:ffff88007dc43bc0  EFLAGS: 00000246
Aug 31 10:52:27 organ kernel: RAX: 0000000000000000 RBX: ffff88007dc43bc0 RCX: 0000000000000040
Aug 31 10:52:27 organ kernel: RDX: 0000000000000000 RSI: ffff88007dc43c90 RDI: ffffe200017c9898
Aug 31 10:52:27 organ kernel: RBP: ffffffff8101211e R08: ffff88007dc43b90 R09: 000000000000000e
Aug 31 10:52:27 organ kernel: R10: ffff88002eeabd28 R11: ffff88002eeabb28 R12: ffff88007dc42000
Aug 31 10:52:27 organ kernel: R13: ffffffff81781890 R14: ffff880081e1f200 R15: 0000000000000012
Aug 31 10:52:27 organ kernel: FS:  00007f56306096f0(0000) GS:ffff8800f4856480(0000) knlGS:00000000f7eed6c0
Aug 31 10:52:27 organ kernel: CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
Aug 31 10:52:27 organ kernel: CR2: 00007f2722fa8000 CR3: 000000002f9a7000 CR4: 00000000000006e0
Aug 31 10:52:27 organ kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Aug 31 10:52:27 organ kernel: DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Aug 31 10:52:27 organ kernel: Call Trace:
Aug 31 10:52:27 organ kernel: [<ffffffff8109ea45>] ? find_get_pages+0x6d/0xb6
Aug 31 10:52:27 organ kernel: [<ffffffff810a71c4>] ? pagevec_lookup+0x22/0x2b
Aug 31 10:52:27 organ kernel: [<ffffffff810a81d3>] ? __invalidate_mapping_pages+0x155/0x188
Aug 31 10:52:27 organ kernel: [<ffffffff810a5e91>] ? determine_dirtyable_memory+0x1a/0x2d
Aug 31 10:52:27 organ kernel: [<ffffffff810a5f1d>] ? get_dirty_limits+0x27/0x25f
Aug 31 10:52:27 organ kernel: [<ffffffffa017447c>] ? nfs_destroy_inode+0x1c/0x1e [nfs]
Aug 31 10:52:27 organ kernel: [<ffffffff810a8216>] ? invalidate_mapping_pages+0x10/0x12
Aug 31 10:52:27 organ kernel: [<ffffffff810e6ac0>] ? shrink_icache_memory+0xee/0x207
Aug 31 10:52:27 organ kernel: [<ffffffff810aa90e>] ? shrink_slab+0xea/0x160
Aug 31 10:52:27 organ kernel: [<ffffffff810ab093>] ? kswapd+0x49f/0x632
Aug 31 10:52:27 organ kernel: [<ffffffff810a9093>] ? isolate_pages_global+0x0/0x203
Aug 31 10:52:27 organ kernel: [<ffffffff8105c91b>] ? autoremove_wake_function+0x0/0x39
Aug 31 10:52:27 organ kernel: [<ffffffff810aabf4>] ? kswapd+0x0/0x632
Aug 31 10:52:27 organ kernel: [<ffffffff8105c585>] ? kthread+0x4d/0x78
Aug 31 10:52:27 organ kernel: [<ffffffff8101264a>] ? child_rip+0xa/0x20
Aug 31 10:52:27 organ kernel: [<ffffffff81011f67>] ? restore_args+0x0/0x30
Aug 31 10:52:27 organ kernel: [<ffffffff8105c538>] ? kthread+0x0/0x78
Aug 31 10:52:27 organ kernel: [<ffffffff81012640>] ? child_rip+0x0/0x20

Comment 3 Need Real Name 2010-04-01 16:55:58 UTC
I am getting this frequently on FC 12, kernel

Linux vader 2.6.32.9-67.fc12.i686.PAE #1 SMP Sat Feb 27 09:42:55 UTC 2010 i686 i686 i386 GNU/Linux

resulting in complete lockups of the GUI and system

Apr  1 09:29:37 localhost kernel: BUG: soft lockup - CPU#0 stuck for 61s! [kacpi_notify:23]
Apr  1 09:29:37 localhost kernel: Modules linked in: aes_i586 aes_generic fuse rfcomm sco bridge stp llc bnep l2cap cpufreq_ondemand acpi_cpufreq ip6t_REJECT nf_conntrack_ipv6 ip6table_filter ip6_tables ipv6 uinput btusb bluetooth snd_hda_codec_conexant snd_hda_intel snd_hda_codec arc4 ecb snd_hwdep snd_seq_dummy mmc_block snd_seq_oss snd_seq_midi_event snd_seq snd_seq_device snd_pcm_oss iwlagn snd_mixer_oss snd_pcm uvcvideo iwlcore sdhci_pci videodev snd_timer mac80211 sdhci v4l1_compat i2c_i801 joydev thinkpad_acpi wmi mmc_core ricoh_mmc snd iTCO_wdt cfg80211 iTCO_vendor_support soundcore snd_page_alloc e1000e rfkill dm_multipath firewire_ohci firewire_core crc_itu_t yenta_socket rsrc_nonstatic ata_generic pata_acpi i915 radeon ttm drm_kms_helper drm i2c_algo_bit i2c_core video output [last unloaded: microcode]
Apr  1 09:29:37 localhost kernel:
Apr  1 09:29:37 localhost kernel: Pid: 23, comm: kacpi_notify Not tainted (2.6.32.9-67.fc12.i686.PAE #1) 2081CTO
Apr  1 09:29:37 localhost kernel: EIP: 0060:[<f865207e>] EFLAGS: 00000286 CPU: 0
Apr  1 09:29:37 localhost kernel: EIP is at intel_opregion_video_event+0x1e/0x25 [i915]
Apr  1 09:29:37 localhost kernel: EAX: fbbb8313 EBX: f865faa0 ECX: f7115ee8 EDX: f6a231a0
Apr  1 09:29:37 localhost kernel: ESI: 00000000 EDI: 00000000 EBP: f7115e8c ESP: f7115e8c
Apr  1 09:29:37 localhost kernel: DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
Apr  1 09:29:37 localhost kernel: CR0: 8005003b CR2: b7651000 CR3: 00969000 CR4: 000406f0
Apr  1 09:29:37 localhost kernel: DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
Apr  1 09:29:37 localhost kernel: DR6: ffff0ff0 DR7: 00000400
Apr  1 09:29:37 localhost kernel: Call Trace:
Apr  1 09:29:37 localhost kernel: [<c07a7532>] notifier_call_chain+0x2b/0x4d
Apr  1 09:29:37 localhost kernel: [<c045f182>] __blocking_notifier_call_chain+0x3c/0x51
Apr  1 09:29:37 localhost kernel: [<c045f1a8>] blocking_notifier_call_chain+0x11/0x13
Apr  1 09:29:37 localhost kernel: [<c0605704>] acpi_notifier_call_chain+0x50/0x7c
Apr  1 09:29:37 localhost kernel: [<c061daee>] acpi_ac_notify+0x54/0x6d
Apr  1 09:29:37 localhost kernel: [<c05ff305>] acpi_device_notify+0x17/0x1a
Apr  1 09:29:37 localhost kernel: [<c060aa95>] acpi_ev_notify_dispatch+0x54/0x5f
Apr  1 09:29:37 localhost kernel: [<c05fcdbb>] acpi_os_execute_deferred+0x22/0x2d
Apr  1 09:29:37 localhost kernel: [<c0457a49>] worker_thread+0x140/0x1b9
Apr  1 09:29:37 localhost kernel: [<c05fcd99>] ? acpi_os_execute_deferred+0x0/0x2d
Apr  1 09:29:37 localhost kernel: [<c045b4d5>] ? autoremove_wake_function+0x0/0x34
Apr  1 09:29:37 localhost kernel: [<c0457909>] ? worker_thread+0x0/0x1b9
Apr  1 09:29:37 localhost kernel: [<c045b29d>] kthread+0x64/0x69
Apr  1 09:29:37 localhost kernel: [<c045b239>] ? kthread+0x0/0x69
Apr  1 09:29:37 localhost kernel: [<c0409ca7>] kernel_thread_helper+0x7/0x10
Apr  1 09:29:55 localhost abrt: Kerneloops: Reported 1 kernel oopses to Abrt
Apr  1 09:29:55 localhost abrtd: Directory 'kerneloops-1270139395-1' creation detected
Apr  1 09:29:55 localhost abrtd: Getting local universal unique identification
Apr  1 09:29:55 localhost abrtd: Crash is in database already (dup of /var/cache/abrt/kerneloops-1270139275-1)
Apr  1 09:29:55 localhost abrtd: Deleting crash kerneloops-1270139395-1 (dup of kerneloops-1270139275-1), sending dbus signal


I'm running on a Lenovo ThinkPad T500 (CPU model name: Intel(R) Core(TM)2 Duo CPU P8700 @ 2.53GHz).

Comment 4 Need Real Name 2010-04-01 16:58:32 UTC
(In reply to comment #3)
> I am getting this frequently on FC 12, kernel
> 
> Linux vader 2.6.32.9-67.fc12.i686.PAE #1 SMP Sat Feb 27 09:42:55 UTC 2010 i686
> i686 i386 GNU/Linux
> 
> resulting in complete lockups of the GUI and system
> 
> Apr  1 09:29:37 localhost kernel: BUG: soft lockup - CPU#0 stuck for 61s!
> [kacpi_notify:23]
.
.

Lockup occurs a few minutes after every 3rd or 4th resume from hibernate...

Comment 5 cdprince 2010-04-07 22:18:28 UTC
I am getting this on FC10, from dmesg:

BUG: soft lockup - CPU#0 stuck for 129s! [clock-applet:2918]
Modules linked in: sit tunnel4 fuse sco bridge stp bnep l2cap bluetooth sunrpc ip6t_REJECT nf_conntrack_ipv6 ip6table_filter ip6_tables ipv6 dm_multipath uinput snd_ens1371 gameport snd_rawmidi snd_ac97_codec ac97_bus snd_seq_dummy snd_seq_oss snd_seq_midi_event snd_seq snd_seq_device snd_pcm_oss snd_mixer_oss snd_pcm snd_timer pcnet32 snd ppdev soundcore snd_page_alloc parport_pc mptspi pcspkr mptscsih parport mptbase scsi_transport_spi i2c_piix4 mii i2c_core floppy ata_generic pata_acpi [last unloaded: microcode]

Pid: 2918, comm: clock-applet Not tainted (2.6.27.41-170.2.117.fc10.i686 #1) VMware Virtual Platform
EIP: 0073:[<0025520a>] EFLAGS: 00000282 CPU: 0
EIP is at 0x25520a
EAX: 0951bc20 EBX: 00277a94 ECX: 00000001 EDX: 09527f00
ESI: 09644800 EDI: 09509110 EBP: bf825d88 ESP: bf825d6c
 DS: 007b ES: 007b FS: 0000 GS: 0033 SS: 007b
CR0: 80050033 CR2: aed40000 CR3: 151f1000 CR4: 000006d0
DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
DR6: ffff0ff0 DR7: 00000400


cpuinfo:
[root@srl1 svn]# cat /proc/cpuinfo 
processor	: 0
vendor_id	: GenuineIntel
cpu family	: 6
model		: 23
model name	: Intel(R) Core(TM)2 Quad CPU    Q9550  @ 2.83GHz
stepping	: 10
cpu MHz		: 2833.001
cache size	: 6144 KB
fdiv_bug	: no
hlt_bug		: no
f00f_bug	: no
coma_bug	: no
fpu		: yes
fpu_exception	: yes
cpuid level	: 13
wp		: yes
flags		: fpu vme de pse tsc msr pae mce cx8 apic mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss nx constant_tsc up arch_perfmon pebs bts pni ssse3 sse4_1
bogomips	: 5666.00
clflush size	: 64
power management:

I am running this VM under VMware Workstation on a Vista host, and in the host's Event Viewer I am seeing:
The device, \Device\Ide\iaStor0, did not respond within the timeout period.
I'm not sure whether this is related. It doesn't occur with the same frequency as the lockups, but the two are not too dissimilar. The host events were logged at:
4/7/2010 9:48:27 AM
4/5/2010 8:18:37 AM
3/31/2010 10:39:27 AM
3/16/2010 5:17:25 PM
As is often the case, the clocks of the VM and the VM host are out of sync.

I have seen lockups lasting more than 800 seconds:
[root@srl1 svn]# grep -i 'BUG: soft lockup' /var/log/messages*
/var/log/messages-20100505:Mar 18 14:05:05 srl1 kernel: BUG: soft lockup - CPU#0 stuck for 298s! [gconfd-2:2895]
/var/log/messages-20100505:Mar 18 16:47:39 srl1 kernel: BUG: soft lockup - CPU#0 stuck for 216s! [setroubleshootd:2176]
/var/log/messages-20100505:Mar 19 04:04:10 srl1 kernel: BUG: soft lockup - CPU#0 stuck for 889s! [nautilus:2883]
/var/log/messages-20100505:May  5 19:46:07 srl1 kernel: BUG: soft lockup - CPU#0 stuck for 129s! [clock-applet:2918]
... this shows the frequency
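
The grep output above can also be summarized programmatically. The following is a minimal Python sketch, not part of the original comment. The regular expression and the names `Lockup` and `parse_lockups` are assumptions based only on the message format shown in this bug. It extracts the CPU number, stuck duration, and process name from each line.

```python
import re
from typing import NamedTuple

# Pattern inferred from the messages quoted in this report (an assumption,
# not an official kernel log grammar).
LOCKUP_RE = re.compile(
    r"BUG: soft lockup - CPU#(?P<cpu>\d+) stuck for (?P<secs>\d+)s! "
    r"\[(?P<comm>[^\]]+):(?P<pid>\d+)\]"
)

class Lockup(NamedTuple):
    cpu: int
    seconds: int
    comm: str
    pid: int

def parse_lockups(lines):
    """Return a Lockup for every line that contains a soft lockup message."""
    found = []
    for line in lines:
        m = LOCKUP_RE.search(line)
        if m:
            found.append(
                Lockup(int(m["cpu"]), int(m["secs"]), m["comm"], int(m["pid"]))
            )
    return found

# Lines copied from the grep output in this comment, without the file name prefix.
sample = [
    "Mar 18 14:05:05 srl1 kernel: BUG: soft lockup - CPU#0 stuck for 298s! [gconfd-2:2895]",
    "Mar 18 16:47:39 srl1 kernel: BUG: soft lockup - CPU#0 stuck for 216s! [setroubleshootd:2176]",
    "Mar 19 04:04:10 srl1 kernel: BUG: soft lockup - CPU#0 stuck for 889s! [nautilus:2883]",
    "May  5 19:46:07 srl1 kernel: BUG: soft lockup - CPU#0 stuck for 129s! [clock-applet:2918]",
]

lockups = parse_lockups(sample)
longest = max(lockups, key=lambda entry: entry.seconds)
print(f"{len(lockups)} lockups, longest {longest.seconds}s in {longest.comm}")
# prints "4 lockups, longest 889s in nautilus"
```

Since `re.search` is used, the same function works on raw `/var/log/messages` lines as well as on grep output that carries a file name prefix.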

Comment 6 Bug Zapper 2010-04-27 14:22:03 UTC
This message is a reminder that Fedora 11 is nearing its end of life.
Approximately 30 (thirty) days from now Fedora will stop maintaining
and issuing updates for Fedora 11.  It is Fedora's policy to close all
bug reports from releases that are no longer maintained.  At that time
this bug will be closed as WONTFIX if it remains open with a Fedora 
'version' of '11'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version prior to Fedora 11's end of life.

Bug Reporter: Thank you for reporting this issue and we are sorry that 
we may not be able to fix it before Fedora 11 is end of life.  If you 
would still like to see this bug fixed and are able to reproduce it 
against a later version of Fedora please change the 'version' of this 
bug to the applicable version.  If you are unable to change the version, 
please add a comment here and someone will do it for you.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events.  Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

The process we are following is described here: 
http://fedoraproject.org/wiki/BugZappers/HouseKeeping

Comment 7 Bug Zapper 2010-06-28 12:34:34 UTC
Fedora 11 changed to end-of-life (EOL) status on 2010-06-25. Fedora 11 is 
no longer maintained, which means that it will not receive any further 
security or bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of 
Fedora please feel free to reopen this bug against that version.

Thank you for reporting this bug and we are sorry it could not be fixed.

Comment 8 Anand TS 2014-03-04 11:12:21 UTC
I got this while installing a CentOS 6.5 cloud image into OpenStack.

bug soft lockup - cpu#0 stuck for 61s

I installed OpenStack in a CentOS 6.5 VM that runs on an ESXi 5.1 server.

Because of this I can't log in to the CentOS machine; it stays stuck at the error.

