After booting up my hibernated laptop, I got the following trace very frequently:
BUG: soft lockup - CPU#0 stuck for 61s! [swapper:0]
Modules linked in: b43 nls_utf8 vfat fat mmc_block tifm_ms memstick tifm_sd
cpufreq_stats aes_x86_64 aes_generic rfkill_input radeon drm fuse sunrpc ipv6
nf_conntrack_ipv4 xt_state nf_conntrack xt_tcpudp ipt_REJECT iptable_filter
ip_tables x_tables cpufreq_ondemand powernow_k8 freq_table loop dm_multipath
arc4 ecb crypto_blkcipher rfkill mac80211 cfg80211 input_polldev joydev
snd_atiixp_modem pcspkr k8temp snd_atiixp snd_seq_dummy serio_raw
snd_ac97_codec ac97_bus hwmon video snd_seq_oss output snd_seq_midi_event
snd_seq snd_seq_device battery snd_pcm_oss 8139too firewire_ohci firewire_core
ac sdhci 8139cp snd_mixer_oss snd_pcm tifm_7xx1 crc_itu_t ssb mii mmc_core
button wmi tifm_core snd_timer i2c_piix4 snd shpchp i2c_core soundcore
snd_page_alloc sg sr_mod cdrom dm_snapshot dm_zero dm_mirror dm_mod
ata_generic pata_acpi pata_atiixp libata sd_mod scsi_mod ext3 jbd mbcache
uhci_hcd ohci_hcd ehci_hcd [last unloaded: b43]
Pid: 0, comm: swapper Not tainted 2.6.25-1.fc9.x86_64 #1
RIP: 0010:[_spin_unlock_irqrestore+8/10] [_spin_unlock_irqrestore+8/10]
RSP: 0018:ffffffff81455db8 EFLAGS: 00000293
RAX: 0000000000000000 RBX: ffffffff81455db8 RCX: ffffffff81455db8
RDX: 00002cb3d104ee9e RSI: 0000000000000293 RDI: ffffffff81504220
RBP: ffffffff81455d48 R08: ffff8100010045b0 R09: 00000000005ad868
R10: ffff81000100bf80 R11: ffffffff81455eb8 R12: ffffffff8104ab83
R13: ffffffff81455d38 R14: ffff8100010045b0 R15: 00002cb29dc90dc0
FS: 00007f409a8f07a0(0000) GS:ffffffff813f2000(0000) knlGS:000000000846e830
CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 00007f4e03da9000 CR3: 0000000028d4b000 CR4: 00000000000006a0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[tick_broadcast_oneshot_control+230/239] ? tick_broadcast_oneshot_control+0xe6/0xef
[tick_notify+482/821] ? tick_notify+0x1e2/0x335
[notifier_call_chain+51/91] ? notifier_call_chain+0x33/0x5b
[raw_notifier_call_chain+15/17] ? raw_notifier_call_chain+0xf/0x11
[clockevents_notify+43/92] ? clockevents_notify+0x2b/0x5c
[acpi_state_timer_broadcast+65/67] ? acpi_state_timer_broadcast+0x41/0x43
[acpi_idle_enter_simple+478/568] ? acpi_idle_enter_simple+0x1de/0x238
[cpuidle_idle_call+134/186] ? cpuidle_idle_call+0x86/0xba
[cpuidle_idle_call+0/186] ? cpuidle_idle_call+0x0/0xba
[default_idle+0/95] ? default_idle+0x0/0x5f
[cpu_idle+120/192] ? cpu_idle+0x78/0xc0
[rest_init+90/92] ? rest_init+0x5a/0x5c
Additionally, time moves really slowly. In ~six hours, the clock had only
Does playing with processor.max_cstate make any difference?
Similar trace in F8 bug 444282; reporter says booting with
processor.max_cstate=1 seems to fix the problem.
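For anyone who wants to try the same workaround, here is a hypothetical sketch of how the parameter would be applied on an F9-era GRUB legacy setup (the kernel path and root= value below are placeholders, not taken from this report):

```shell
# Append processor.max_cstate=1 to the kernel line in /boot/grub/menu.lst
# (GRUB legacy, as shipped with Fedora 9), e.g.:
#
#   kernel /vmlinuz-2.6.25-1.fc9.x86_64 ro root=/dev/VolGroup00/LogVol00 processor.max_cstate=1
#
# After rebooting, confirm the parameter is active on the running kernel:
grep -o 'processor\.max_cstate=[0-9]*' /proc/cmdline || echo "parameter not set"
```

This caps the deepest ACPI C-state the idle code will request, which sidesteps the broadcast-timer path seen in the trace above.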
It's hard to say, as it typically behaves for ages and then for some reason
gets into this state. My earlier note about it happening only after hibernate
turned out not to be true: it did it again when left completely idle overnight too.
I'll try limiting the C states, though previous kernels worked fine with all
three C states.
This is a dreaded ATI chipset that has had wonky timer handling in the past, but
F8 ran pretty solidly on it.
Which kernel version?
It's in the trace above:
Pid: 0, comm: swapper Not tainted 2.6.25-1.fc9.x86_64 #1
Which CPU / chipset is involved? I vaguely remember a similar report somewhere
about a CPU stuck for 60 seconds; I'll try to dig it up.
chipset: ATI RS480
cpu: AMD Turion
I'm enthused :)
Can you please provide the output of /proc/timer_list
before and after resume?
I can't seem to trigger it on demand.
The clocksource is acpi_pm.
timer_list output from before/after attached.
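For reference, the attachments below could be captured with something along these lines (a sketch; paths are as in 2.6.25-era kernels, and reading /proc/timer_list may require root):

```shell
# Show which clocksource the kernel is currently using (acpi_pm in this report):
cs=/sys/devices/system/clocksource/clocksource0/current_clocksource
[ -r "$cs" ] && cat "$cs" || echo "clocksource sysfs not readable"

# Capture the per-CPU timer/clockevent state before suspending...
cat /proc/timer_list > timers-pre-suspend.txt 2>/dev/null || true

# ...suspend and resume the machine, then capture again and compare:
cat /proc/timer_list > timers-post-suspend.txt 2>/dev/null || true
diff -u timers-pre-suspend.txt timers-post-suspend.txt | head -n 40 || true
```

The diff makes it easy to spot a clockevent device or next-expiry field that failed to come back sane after resume.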
Created attachment 304044 [details]
timers pre suspending
Created attachment 304045 [details]
timers post suspending
*** Bug 444544 has been marked as a duplicate of this bug. ***
I should note that the system generally became unusable when the bug struck for
me; I'm not sure whether that was the case for you, as your bug didn't mention it.
Dave, any idea which kernel version was the last one which did not show the problem?
Somewhere in between 2.6.24 and 2.6.25, I'm guessing.
Bisecting this will be a real nightmare though because the bug won't repeat on
demand, sometimes it takes hours for it to show up.
I think this is the same bug I am hitting. Under heavy load the vmstat program
segfaults with a divide-by-zero error, and then I get a "TSC unstable" message.
(This is on a uniprocessor ATI RS480 box that doesn't support cpufreq.) After that,
all hell breaks loose: programs won't make any progress unless I move the mouse
around, and vmstat consistently segfaults with an (FP) divide-by-zero error.
I have a hard time connecting a clockevents/nohz bug with a vmstat divide-by-zero
error. There seems to be some more subtle wreckage involved.
Changing version to '9' as part of the upcoming Fedora 9 GA.
More information and the reason for this action are here:
'Something' changed in 2.6.26-rc which fixes this. I've been running -rc3 on that
laptop for the last 12 days with no problems.
So we'll get this fixed 'for free' when we rebase to 2.6.26, but tracking down
which cset is responsible for fixing it is going to be a pain in the meantime.
Sigh, now the AMD problems magically disappeared and the softlockup moved to
Intel-based machines.
Chuck, is the problem still there on your AMD box ?
(In reply to comment #20)
> Sigh, now the AMD problems magically disappeared and the softlockup moved to
> Intel-based machines.
> Chuck, is the problem still there on your AMD box ?
It's really hard to trigger on my system, and it's still running F9; I need to
put a copy of rawhide on there, or at least try the live CD.
Most definitely still there and I can trigger it reliably by a failed attempt to establish an IPSec tunnel. See bug #442920 for all details. BTW, this is an Intel based machine.
(In reply to comment #22)
> Most definitely still there and I can trigger it reliably by a failed attempt
> to establish an IPSec tunnel. See bug #442920 for all details. BTW, this is an
> Intel based machine.
After you get into a deadlock all kinds of crazy things can happen.
For what it's worth, the problem seemed to have disappeared in the 184.108.40.206-29.fc9.x86_64 kernel, but I've started seeing similar symptoms (clock losing time, flaky mouse response, the PC beeper "sticking on", etc.) on the 220.127.116.11-45.fc9.x86_64 kernel.
Acer Ferrari 4000 laptop, x86_64, ATI 200M chipset, non-tainted kernel.
If this happens again today, I'll downgrade to the 18.104.22.168-29 kernel and complain here some more.
This message is a reminder that Fedora 9 is nearing its end of life.
Approximately 30 (thirty) days from now Fedora will stop maintaining
and issuing updates for Fedora 9. It is Fedora's policy to close all
bug reports from releases that are no longer maintained. At that time
this bug will be closed as WONTFIX if it remains open with a Fedora
'version' of '9'.
Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version'
to a later Fedora version prior to Fedora 9's end of life.
Bug Reporter: Thank you for reporting this issue and we are sorry that
we may not be able to fix it before Fedora 9 is end of life. If you
would still like to see this bug fixed and are able to reproduce it
against a later version of Fedora please change the 'version' of this
bug to the applicable version. If you are unable to change the version,
please add a comment here and someone will do it for you.
Although we aim to fix as many bugs as possible during every release's
lifetime, sometimes those efforts are overtaken by events. Often a
more recent Fedora release includes newer upstream software that fixes
bugs or makes them obsolete.
The process we are following is described here:
Fedora 9 changed to end-of-life (EOL) status on 2009-07-10. Fedora 9 is
no longer maintained, which means that it will not receive any further
security or bug fix updates. As a result we are closing this bug.
If you can reproduce this bug against a currently maintained version of
Fedora please feel free to reopen this bug against that version.
Thank you for reporting this bug and we are sorry it could not be fixed.