After resuming my laptop from hibernation, I got the following trace very frequently:

BUG: soft lockup - CPU#0 stuck for 61s! [swapper:0]
CPU 0:
Modules linked in: b43 nls_utf8 vfat fat mmc_block tifm_ms memstick tifm_sd cpufreq_stats aes_x86_64 aes_generic rfkill_input radeon drm fuse sunrpc ipv6 nf_conntrack_ipv4 xt_state nf_conntrack xt_tcpudp ipt_REJECT iptable_filter ip_tables x_tables cpufreq_ondemand powernow_k8 freq_table loop dm_multipath arc4 ecb crypto_blkcipher rfkill mac80211 cfg80211 input_polldev joydev snd_atiixp_modem pcspkr k8temp snd_atiixp snd_seq_dummy serio_raw snd_ac97_codec ac97_bus hwmon video snd_seq_oss output snd_seq_midi_event snd_seq snd_seq_device battery snd_pcm_oss 8139too firewire_ohci firewire_core ac sdhci 8139cp snd_mixer_oss snd_pcm tifm_7xx1 crc_itu_t ssb mii mmc_core button wmi tifm_core snd_timer i2c_piix4 snd shpchp i2c_core soundcore snd_page_alloc sg sr_mod cdrom dm_snapshot dm_zero dm_mirror dm_mod ata_generic pata_acpi pata_atiixp libata sd_mod scsi_mod ext3 jbd mbcache uhci_hcd ohci_hcd ehci_hcd [last unloaded: b43]
Pid: 0, comm: swapper Not tainted 2.6.25-1.fc9.x86_64 #1
RIP: 0010:[_spin_unlock_irqrestore+8/10]  [_spin_unlock_irqrestore+8/10] _spin_unlock_irqrestore+0x8/0xa
RSP: 0018:ffffffff81455db8  EFLAGS: 00000293
RAX: 0000000000000000 RBX: ffffffff81455db8 RCX: ffffffff81455db8
RDX: 00002cb3d104ee9e RSI: 0000000000000293 RDI: ffffffff81504220
RBP: ffffffff81455d48 R08: ffff8100010045b0 R09: 00000000005ad868
R10: ffff81000100bf80 R11: ffffffff81455eb8 R12: ffffffff8104ab83
R13: ffffffff81455d38 R14: ffff8100010045b0 R15: 00002cb29dc90dc0
FS:  00007f409a8f07a0(0000) GS:ffffffff813f2000(0000) knlGS:000000000846e830
CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 00007f4e03da9000 CR3: 0000000028d4b000 CR4: 00000000000006a0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400

Call Trace:
 [tick_broadcast_oneshot_control+230/239] ? tick_broadcast_oneshot_control+0xe6/0xef
 [tick_notify+482/821] ? tick_notify+0x1e2/0x335
 [notifier_call_chain+51/91] ? notifier_call_chain+0x33/0x5b
 [raw_notifier_call_chain+15/17] ? raw_notifier_call_chain+0xf/0x11
 [clockevents_notify+43/92] ? clockevents_notify+0x2b/0x5c
 [acpi_state_timer_broadcast+65/67] ? acpi_state_timer_broadcast+0x41/0x43
 [acpi_idle_enter_simple+478/568] ? acpi_idle_enter_simple+0x1de/0x238
 [cpuidle_idle_call+134/186] ? cpuidle_idle_call+0x86/0xba
 [cpuidle_idle_call+0/186] ? cpuidle_idle_call+0x0/0xba
 [default_idle+0/95] ? default_idle+0x0/0x5f
 [cpu_idle+120/192] ? cpu_idle+0x78/0xc0
 [rest_init+90/92] ? rest_init+0x5a/0x5c

Additionally, time moves really slowly: in about six hours, the clock had advanced only about one hour.
Does playing with processor.max_cstate make any difference?
Similar trace in F8 bug 444282; reporter says booting with processor.max_cstate=1 seems to fix the problem.
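When testing the processor.max_cstate=1 workaround, it helps to confirm that the running kernel was actually booted with it. The sketch below assumes only that /proc/cmdline holds the boot command line; the helper name max_cstate_from_cmdline is hypothetical, and it simply returns the value of the first processor.max_cstate=N token it finds.

```python
from typing import Optional


def max_cstate_from_cmdline(cmdline: str) -> Optional[int]:
    """Return N from the first 'processor.max_cstate=N' token, or None if absent."""
    for token in cmdline.split():
        # Split "key=value"; tokens without "=" leave sep empty and are skipped.
        key, sep, value = token.partition("=")
        if key == "processor.max_cstate" and sep:
            return int(value)
    return None


if __name__ == "__main__":
    try:
        # /proc/cmdline holds the command line the running kernel was booted with.
        with open("/proc/cmdline") as f:
            print("processor.max_cstate =", max_cstate_from_cmdline(f.read()))
    except OSError:
        print("/proc/cmdline is not available on this system")
```

A result of None means the option was not passed, so the default C-state limit is in effect.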
It's hard to say, as it typically behaves for ages and then, for some reason, gets into this state. My earlier claim that it only happens after hibernation turned out to be wrong: it happened again when the machine was left completely idle overnight. I'll try limiting the C-states, though previous kernels worked fine with all three C-states. This is a dreaded ATI chipset that has had wonky timer handling in the past, but F8 ran pretty solidly on it.
Which kernel version?
It's in the trace above: Pid: 0, comm: swapper Not tainted 2.6.25-1.fc9.x86_64 #1
-ENOTENOUGHCOFFEE :) Which CPU / chipset is involved? I vaguely remember a similar report about a CPU stuck for 60 seconds somewhere. I'll try to dig it up.
chipset: ATI RS480 cpu: AMD Turion
I'm enthused :) Can you please provide the output of /proc/timer_list and /sys/devices/system/clocksource/clocksource0/current_clocksource before and after resume? Thanks, tglx
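Capturing the requested data by hand is easy to get inconsistent between runs, so a small helper that writes everything into one timestamped file per run may help. This is a minimal sketch: the script name (say, snapshot_timers.py), the helpers read_or_note and snapshot, and the output file naming are all hypothetical. It reads the two paths named above plus the sibling available_clocksource file, and records a note instead of failing if a file is missing or unreadable (run it as root if that happens).

```python
import sys
import time
from pathlib import Path

# Paths from the request above; clocksource0 is assumed to be the relevant device.
TIMER_LIST = Path("/proc/timer_list")
CLOCKSOURCE_DIR = Path("/sys/devices/system/clocksource/clocksource0")


def read_or_note(path: Path) -> str:
    """Return the file's contents, or a one-line note if it cannot be read."""
    try:
        return path.read_text()
    except OSError as exc:  # missing file or insufficient permissions
        return f"<could not read {path}: {exc}>\n"


def snapshot(label: str, out_dir: Path = Path(".")) -> Path:
    """Write the clocksource info plus /proc/timer_list to a labelled, timestamped file."""
    stamp = time.strftime("%Y%m%d-%H%M%S")
    out = out_dir / f"timers-{label}-{stamp}.txt"
    parts = [
        f"# {label} at {stamp}\n",
        "# current_clocksource: " + read_or_note(CLOCKSOURCE_DIR / "current_clocksource"),
        "# available_clocksource: " + read_or_note(CLOCKSOURCE_DIR / "available_clocksource"),
        read_or_note(TIMER_LIST),
    ]
    out.write_text("".join(parts))
    return out


if __name__ == "__main__":
    if len(sys.argv) != 2:
        print(f"usage: {sys.argv[0]} LABEL   (e.g. pre-suspend or post-resume)")
    else:
        print("wrote", snapshot(sys.argv[1]))
```

Usage: run it once with the label pre-suspend before suspending and once with post-resume after resuming, then attach both files; because each file carries its own label and timestamp, the two snapshots can be diffed directly.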
I can't seem to trigger it on demand. The clocksource is acpi_pm; /proc/timer_list output from before and after suspending is attached.
Created attachment 304044: timers pre suspending
Created attachment 304045: timers post suspending
*** Bug 444544 has been marked as a duplicate of this bug. ***
I should note that the system generally became unusable for me when the bug struck. I'm not sure whether that was the case for you; your bug report didn't mention it specifically.
Dave, any idea which kernel version was the last one that did not show the problem?
Somewhere between 2.6.24 and 2.6.25, I'm guessing. Bisecting this will be a real nightmare, though, because the bug doesn't reproduce on demand; sometimes it takes hours to show up.
I think this is the same bug I am hitting. Under heavy load the vmstat program will segfault with a divide-by-zero error, then I get a "TSC unstable" message. (This is on a uniprocessor ATI RS480 that doesn't support cpufreq.) After that all hell breaks loose: programs won't make any progress unless I move the mouse around and vmstat will consistently segfault with an (FP) divide-by-zero error.
I have a hard time connecting a clockevents/nohz bug with a vmstat divide-by-zero error. There seems to be some more subtle wreckage involved.
Changing version to '9' as part of upcoming Fedora 9 GA. More information and reason for this action is here: http://fedoraproject.org/wiki/BugZappers/HouseKeeping
Something changed in 2.6.26-rc that fixes this: I've been running -rc3 on that laptop for the last 12 days with no problems. So we'll get this fixed "for free" when we rebase to 2.6.26, but in the meantime it's going to be a pain to track down which changeset fixed it so we can backport it.
Sigh, now the AMD problems have magically disappeared and the soft lockup has moved to Intel-based machines: http://www.kerneloops.org/guilty.php?guilty=__do_softirq&version=2.6.26-rc&start=1703936&end=1736703&class=oops Chuck, is the problem still there on your AMD box?
(In reply to comment #20)
> Sigh, now the AMD problems magicaly disappeared and the softlockup moved to
> Intel based machines
>
> http://www.kerneloops.org/guilty.php?guilty=__do_softirq&version=2.6.26-rc&start=1703936&end=1736703&class=oops
>
> Chuck, is the problem still there on your AMD box ?

It's really hard to trigger on my system, and it's still running F9; I need to put a copy of Rawhide on there or at least try the live CD.
It's most definitely still there, and I can trigger it reliably with a failed attempt to establish an IPsec tunnel. See bug #442920 for all the details. BTW, this is an Intel-based machine.
(In reply to comment #22)
> Most definitely still there and I can trigger it reliably by a failed attempt
> to establish an IPSec tunnel. See bug #442920 for all details. BTW, this is an
> Intel based machine.

After you get into a deadlock, all kinds of crazy things can happen.
For what it's worth, the problem seemed to have disappeared in the 2.6.26.3-29.fc9.x86_64 kernel, but I've started seeing similar symptoms (clock losing time, flaky mouse response, PC beeper "sticking on", etc.) with the 2.6.26.5-45.fc9.x86_64 kernel. Hardware: Acer Ferrari 4000 laptop, x86_64, ATI 200M chipset; the kernel is not tainted. If this happens again today, I'll downgrade to the 2.6.26.3-29 kernel and complain here some more.
This message is a reminder that Fedora 9 is nearing its end of life. Approximately 30 (thirty) days from now Fedora will stop maintaining and issuing updates for Fedora 9. It is Fedora's policy to close all bug reports from releases that are no longer maintained. At that time this bug will be closed as WONTFIX if it remains open with a Fedora 'version' of '9'. Package Maintainer: If you wish for this bug to remain open because you plan to fix it in a currently maintained version, simply change the 'version' to a later Fedora version prior to Fedora 9's end of life. Bug Reporter: Thank you for reporting this issue and we are sorry that we may not be able to fix it before Fedora 9 is end of life. If you would still like to see this bug fixed and are able to reproduce it against a later version of Fedora please change the 'version' of this bug to the applicable version. If you are unable to change the version, please add a comment here and someone will do it for you. Although we aim to fix as many bugs as possible during every release's lifetime, sometimes those efforts are overtaken by events. Often a more recent Fedora release includes newer upstream software that fixes bugs or makes them obsolete. The process we are following is described here: http://fedoraproject.org/wiki/BugZappers/HouseKeeping
Fedora 9 changed to end-of-life (EOL) status on 2009-07-10. Fedora 9 is no longer maintained, which means that it will not receive any further security or bug fix updates. As a result we are closing this bug. If you can reproduce this bug against a currently maintained version of Fedora please feel free to reopen this bug against that version. Thank you for reporting this bug and we are sorry it could not be fixed.