Bug 444059

Summary: time moves very slowly.
Product: [Fedora] Fedora
Reporter: Dave Jones <davej>
Component: kernel
Assignee: Thomas Gleixner <tglx>
Status: CLOSED WONTFIX
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: low
Priority: low
Version: 9
CC: bojan, kernel-maint, pfrields, pgunn, pizza
Target Milestone: ---   
Target Release: ---   
Hardware: All   
OS: Linux   
Doc Type: Bug Fix
Last Closed: 2009-07-14 16:37:43 UTC
Attachments:
  timers pre suspending (flags: none)
  timers post suspending (flags: none)

Description Dave Jones 2008-04-24 20:00:10 UTC
After resuming my laptop from hibernation, I got the following trace very frequently:

BUG: soft lockup - CPU#0 stuck for 61s! [swapper:0]
CPU 0:
Modules linked in: b43 nls_utf8 vfat fat mmc_block tifm_ms memstick tifm_sd
cpufreq_stats aes_x86_64 aes_generic rfkill_input radeon drm fuse sunrpc ipv6
nf_conntrack_ipv4 xt_state nf_conntrack xt_tcpudp ipt_REJECT iptable_filter
ip_tables x_tables cpufreq_ondemand powernow_k8 freq_table loop dm_multipath
arc4 ecb crypto_blkcipher rfkill mac80211 cfg80211 input_polldev joydev
snd_atiixp_modem pcspkr k8temp snd_atiixp snd_seq_dummy serio_raw
snd_ac97_codec ac97_bus hwmon video snd_seq_oss output snd_seq_midi_event
snd_seq snd_seq_device battery snd_pcm_oss 8139too firewire_ohci firewire_core
ac sdhci 8139cp snd_mixer_oss snd_pcm tifm_7xx1 crc_itu_t ssb mii mmc_core
button wmi tifm_core snd_timer i2c_piix4 snd shpchp i2c_core soundcore
snd_page_alloc sg sr_mod cdrom dm_snapshot dm_zero dm_mirror dm_mod
ata_generic pata_acpi pata_atiixp libata sd_mod scsi_mod ext3 jbd mbcache
uhci_hcd ohci_hcd ehci_hcd [last unloaded: b43]
Pid: 0, comm: swapper Not tainted 2.6.25-1.fc9.x86_64 #1
RIP: 0010:[_spin_unlock_irqrestore+8/10]  [_spin_unlock_irqrestore+8/10] _spin_unlock_irqrestore+0x8/0xa
RSP: 0018:ffffffff81455db8  EFLAGS: 00000293
RAX: 0000000000000000 RBX: ffffffff81455db8 RCX: ffffffff81455db8
RDX: 00002cb3d104ee9e RSI: 0000000000000293 RDI: ffffffff81504220
RBP: ffffffff81455d48 R08: ffff8100010045b0 R09: 00000000005ad868
R10: ffff81000100bf80 R11: ffffffff81455eb8 R12: ffffffff8104ab83
R13: ffffffff81455d38 R14: ffff8100010045b0 R15: 00002cb29dc90dc0
FS:  00007f409a8f07a0(0000) GS:ffffffff813f2000(0000) knlGS:000000000846e830
CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 00007f4e03da9000 CR3: 0000000028d4b000 CR4: 00000000000006a0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400

Call Trace:
 [tick_broadcast_oneshot_control+230/239] ? tick_broadcast_oneshot_control+0xe6/0xef
 [tick_notify+482/821] ? tick_notify+0x1e2/0x335
 [notifier_call_chain+51/91] ? notifier_call_chain+0x33/0x5b
 [raw_notifier_call_chain+15/17] ? raw_notifier_call_chain+0xf/0x11
 [clockevents_notify+43/92] ? clockevents_notify+0x2b/0x5c
 [acpi_state_timer_broadcast+65/67] ? acpi_state_timer_broadcast+0x41/0x43
 [acpi_idle_enter_simple+478/568] ? acpi_idle_enter_simple+0x1de/0x238
 [cpuidle_idle_call+134/186] ? cpuidle_idle_call+0x86/0xba
 [cpuidle_idle_call+0/186] ? cpuidle_idle_call+0x0/0xba
 [default_idle+0/95] ? default_idle+0x0/0x5f
 [cpu_idle+120/192] ? cpu_idle+0x78/0xc0
 [rest_init+90/92] ? rest_init+0x5a/0x5c

Additionally, time moves really slowly.  In about six hours of real time, the
system clock advanced only about one hour.
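
A rough way to put a number on the slowdown is to sample the system clock and an independent reference twice and compare the elapsed intervals. The sketch below is illustrative only and not from this report: using the RTC as the reference, the sysfs path `/sys/class/rtc/rtc0/since_epoch`, and the helper names `read_clocks` and `clock_rate` are all assumptions.

```python
import time
from pathlib import Path

# Assumed location of the RTC's seconds-since-epoch value; adjust if the
# RTC is exposed elsewhere on your system.
RTC_PATH = Path("/sys/class/rtc/rtc0/since_epoch")


def read_clocks(rtc_path=RTC_PATH):
    """Return (system_seconds, rtc_seconds) sampled back to back."""
    return time.time(), float(Path(rtc_path).read_text().strip())


def clock_rate(start, end):
    """Ratio of system-clock time elapsed to reference time elapsed.

    start and end are (system_seconds, reference_seconds) pairs.
    1.0 means the system clock keeps pace with the reference; values
    well below 1.0 mean the system clock is running slow.
    """
    sys_elapsed = end[0] - start[0]
    ref_elapsed = end[1] - start[1]
    if ref_elapsed <= 0:
        raise ValueError("reference clock did not advance between samples")
    return sys_elapsed / ref_elapsed
```

Calling `read_clocks()` twice, some minutes apart, and passing both results to `clock_rate` gives the ratio. With the figures in this report (about one hour of system time over about six hours), it would come out near 0.17.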

Comment 1 Chuck Ebbert 2008-04-27 01:10:10 UTC
Does playing with processor.max_cstate make any difference?

Comment 2 Chuck Ebbert 2008-04-28 02:59:16 UTC
Similar trace in F8 bug 444282; reporter says booting with
processor.max_cstate=1 seems to fix the problem.
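
When comparing reports like these, it helps to confirm whether a machine was actually booted with the C-state limit. The following is a minimal sketch, assuming the parameter was passed on the kernel command line and is visible in `/proc/cmdline`; the function names are hypothetical.

```python
from pathlib import Path


def max_cstate_from_cmdline(cmdline):
    """Return the processor.max_cstate value from a kernel command line,
    or None if the parameter is absent or not an integer."""
    value = None
    for token in cmdline.split():
        key, sep, raw = token.partition("=")
        if key == "processor.max_cstate" and sep:
            try:
                value = int(raw)
            except ValueError:
                value = None
    # If the parameter appears more than once, this sketch keeps the last one.
    return value


def current_max_cstate(path="/proc/cmdline"):
    """Read the running kernel's command line and extract the limit."""
    return max_cstate_from_cmdline(Path(path).read_text())
```

`current_max_cstate()` returns `1` on a system booted with `processor.max_cstate=1`, and `None` when no limit was given.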

Comment 3 Dave Jones 2008-04-28 12:03:41 UTC
It's hard to say, as it typically behaves for ages, and then for some reason
gets into this state.  The statement above about it happening only after hibernate
turned out not to be true. It did it again when left completely idle overnight too.

I'll try limiting the C states, though previous kernels worked fine with all
three C states.

This is a dreaded ATI chipset that has had wonky timer handling in the past, but
F8 ran pretty solidly on it.

Comment 4 Thomas Gleixner 2008-04-28 12:40:29 UTC
Which kernel version?

Comment 5 Dave Jones 2008-04-28 13:04:20 UTC
It's in the trace above:

Pid: 0, comm: swapper Not tainted 2.6.25-1.fc9.x86_64 #1


Comment 6 Thomas Gleixner 2008-04-28 14:18:24 UTC
-ENOTENOUGHCOFFEE :)

Which CPU / chipset is involved? I vaguely remember a similar report somewhere
about a CPU stuck for 60 seconds. I'll try to dig it up.


Comment 7 Dave Jones 2008-04-28 18:48:22 UTC
chipset: ATI RS480
cpu: AMD Turion


Comment 8 Thomas Gleixner 2008-04-28 20:57:32 UTC
I'm enthused :)

Can you please provide the output of 
/proc/timer_list and
/sys/devices/system/clocksource/clocksource0/current_clocksource
before and after resume?

Thanks,
       tglx
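
Collecting the two files requested above can be scripted so that the "before" and "after" captures are taken the same way. This is a minimal sketch; the `snapshot` helper and the output file naming are assumptions, and only the two source paths come from the request itself.

```python
from pathlib import Path

# The two files requested above.
SOURCES = [
    "/proc/timer_list",
    "/sys/devices/system/clocksource/clocksource0/current_clocksource",
]


def snapshot(label, out_dir=".", sources=SOURCES):
    """Copy each source file to out_dir as <label>-<basename>.txt.

    Returns the list of files written. Reading /proc/timer_list may
    require root on some systems.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    for src in sources:
        dest = out / f"{label}-{Path(src).name}.txt"
        dest.write_text(Path(src).read_text())
        written.append(dest)
    return written
```

Run `snapshot("pre")` before suspending and `snapshot("post")` after resuming, then compare the resulting files with any diff tool.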


Comment 9 Dave Jones 2008-04-28 21:14:20 UTC
I can't seem to trigger it on demand.
The clocksource is acpi_pm.

timer_list output from before and after suspend is attached.

Comment 10 Dave Jones 2008-04-28 21:15:12 UTC
Created attachment 304044 [details]
timers pre suspending

Comment 11 Dave Jones 2008-04-28 21:15:26 UTC
Created attachment 304045 [details]
timers post suspending

Comment 12 Pat Gunn 2008-05-02 22:59:18 UTC
*** Bug 444544 has been marked as a duplicate of this bug. ***

Comment 13 Pat Gunn 2008-05-02 23:12:18 UTC
I should note that the system generally became unusable when the bug struck for
me. I'm not sure whether that was the case for you; your report didn't mention
it specifically.

Comment 14 Thomas Gleixner 2008-05-03 06:43:10 UTC
Dave, any idea which kernel version was the last one that did not show the
problem?

Comment 15 Dave Jones 2008-05-03 13:58:10 UTC
Somewhere between 2.6.24 and 2.6.25, I'm guessing.
Bisecting this will be a real nightmare, though, because the bug won't reproduce
on demand; sometimes it takes hours to show up.


Comment 16 Chuck Ebbert 2008-05-06 02:25:20 UTC
I think this is the same bug I am hitting. Under heavy load the vmstat program
will segfault with a divide-by-zero error, then I get a "TSC unstable" message.
(This is on a uniprocessor ATI RS480 that doesn't support cpufreq.) After that
all hell breaks loose: programs won't make any progress unless I move the mouse
around and vmstat will consistently segfault with an (FP) divide-by-zero error.
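
One hypothetical way a stalled tick could show up as a divide-by-zero in a vmstat-style tool: such tools commonly report CPU percentages as each counter's delta divided by the sum of all deltas between two samples of `/proc/stat`. If the tick counters do not advance between samples, that sum is zero. The sketch below only illustrates this arithmetic; it is not vmstat's actual code, and whether this is what happened here is an assumption.

```python
def cpu_percentages(prev, curr):
    """Percentage share of each CPU tick counter between two samples.

    prev and curr map counter names (e.g. "user", "system", "idle") to
    cumulative tick counts, as found on the "cpu" line of /proc/stat.
    """
    deltas = {name: curr[name] - prev[name] for name in curr}
    total = sum(deltas.values())
    if total == 0:
        # No ticks were accounted between the samples; an unguarded
        # division here is where a divide-by-zero would occur.
        return {name: 0.0 for name in deltas}
    return {name: 100.0 * d / total for name, d in deltas.items()}
```

Passing the same sample twice, as would happen if accounting had stopped, exercises the guarded branch and returns all zeros instead of raising.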

Comment 17 Thomas Gleixner 2008-05-06 20:22:17 UTC
I have a hard time connecting a clockevents/nohz bug with a vmstat
divide-by-zero error. There seems to be some more subtle wreckage involved.

Comment 18 Bug Zapper 2008-05-14 10:06:50 UTC
Changing version to '9' as part of upcoming Fedora 9 GA.
More information and reason for this action is here:
http://fedoraproject.org/wiki/BugZappers/HouseKeeping

Comment 19 Dave Jones 2008-06-04 15:45:08 UTC
'Something' changed in .26-rc that fixes this.  I've been running -rc3 for the
last 12 days on that laptop with no problems.

So we'll get this fixed 'for free' when we rebase to .26, but in the meantime
it's going to be a pain to track down which changeset fixed it so that it can
be backported.

Comment 20 Thomas Gleixner 2008-06-07 14:37:50 UTC
Sigh, now the AMD problems magically disappeared and the softlockup moved to
Intel-based machines

http://www.kerneloops.org/guilty.php?guilty=__do_softirq&version=2.6.26-rc&start=1703936&end=1736703&class=oops

Chuck, is the problem still there on your AMD box?


Comment 21 Chuck Ebbert 2008-06-10 03:22:19 UTC
(In reply to comment #20)
> Sigh, now the AMD problems magically disappeared and the softlockup moved to
> Intel-based machines
>
> http://www.kerneloops.org/guilty.php?guilty=__do_softirq&version=2.6.26-rc&start=1703936&end=1736703&class=oops
>
> Chuck, is the problem still there on your AMD box?

It's really hard to trigger on my system and it's still running F9 -- I need to
put a copy of rawhide on there or at least try the live CD.


Comment 22 Bojan Smojver 2008-09-25 01:20:21 UTC
Most definitely still there, and I can trigger it reliably with a failed attempt to establish an IPSec tunnel. See bug #442920 for all the details. BTW, this is an Intel-based machine.

Comment 23 Chuck Ebbert 2008-09-30 04:46:41 UTC
(In reply to comment #22)
> Most definitely still there and I can trigger it reliably by a failed attempt
> to establish an IPSec tunnel. See bug #442920 for all details. BTW, this is an
> Intel based machine.

After you get into a deadlock all kinds of crazy things can happen.

Comment 24 Solomon Peachy 2008-10-06 12:54:21 UTC
For what it's worth, the problem seemed to have disappeared in the 2.6.26.3-29.fc9.x86_64 kernel, but I've started seeing similar symptoms (clock losing time, flaky mouse response, PC beeper "sticking on", etc.) on the 2.6.26.5-45.fc9.x86_64 kernel.

Acer Ferrari 4000 laptop, x86_64, ATI 200M chipset.  Non-tainted kernel.

If this happens more today, I'll downgrade to the 2.6.26.3-29 kernel and complain here some more.

Comment 25 Bug Zapper 2009-06-10 00:25:17 UTC
This message is a reminder that Fedora 9 is nearing its end of life.
Approximately 30 (thirty) days from now Fedora will stop maintaining
and issuing updates for Fedora 9.  It is Fedora's policy to close all
bug reports from releases that are no longer maintained.  At that time
this bug will be closed as WONTFIX if it remains open with a Fedora 
'version' of '9'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version prior to Fedora 9's end of life.

Bug Reporter: Thank you for reporting this issue and we are sorry that 
we may not be able to fix it before Fedora 9 is end of life.  If you 
would still like to see this bug fixed and are able to reproduce it 
against a later version of Fedora please change the 'version' of this 
bug to the applicable version.  If you are unable to change the version, 
please add a comment here and someone will do it for you.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events.  Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

The process we are following is described here: 
http://fedoraproject.org/wiki/BugZappers/HouseKeeping

Comment 26 Bug Zapper 2009-07-14 16:37:43 UTC
Fedora 9 changed to end-of-life (EOL) status on 2009-07-10. Fedora 9 is 
no longer maintained, which means that it will not receive any further 
security or bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of 
Fedora please feel free to reopen this bug against that version.

Thank you for reporting this bug and we are sorry it could not be fixed.