Bug 231626

Summary: Server fails to keep time.
Product: [Fedora] Fedora Reporter: Joseph A. Farmer <jfarmer99>
Component: kernelAssignee: Kernel Maintainer List <kernel-maint>
Status: CLOSED CURRENTRELEASE QA Contact: Brian Brock <bbrock>
Severity: low Docs Contact:
Priority: medium    
Version: 6CC: johnstul, mlichvar
Target Milestone: ---   
Target Release: ---   
Hardware: i686   
OS: Linux   
Whiteboard:
Fixed In Version: 2.6.22.9-61.fc6 Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2007-10-17 17:01:41 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Attachments:
Description Flags
/var/log/dmesg
none
DMesg none

Description Joseph A. Farmer 2007-03-09 16:25:36 UTC
Description of problem:  I have a Dell Dimension that loses time.  Seriously. 
Last night, after about a week, it was 4 days behind.  The "time" applet was
added to the panel and set to use Fedora time servers.  For whatever reason it
just falls behind.  If I select "adjust time" it connects and updates the time.
 Then it will lose time until I do that again.  Messages log from last "manual
adjust time" update:

Mar  9 00:34:14 grant ntpd[31978]: ntpd exiting on signal 15
Mar  9 09:13:51 grant ntpdate[3138]: step time server 217.150.242.8 offset
31167.903554 sec
Mar  9 09:13:51 grant ntpd[3140]: ntpd 4.2.4 Mon Jan 29 13:58:04 UTC
2007 (1)
Mar  9 09:13:51 grant ntpd[3141]: precision = 1.000 usec
Mar  9 09:13:51 grant ntpd[3141]: Listening on interface #0 wildcard,
0.0.0.0#123 Disabled
Mar  9 09:13:51 grant ntpd[3141]: Listening on interface #1 wildcard, ::#123
Disabled
Mar  9 09:13:51 grant ntpd[3141]: Listening on interface #2 lo, ::1#123 Enabled
Mar  9 09:13:51 grant ntpd[3141]: Listening on interface #3 eth0,
fe80::213:72ff:fec7:d969#123 Enabled
Mar  9 09:13:51 grant ntpd[3141]: Listening on interface #4 lo, 127.0.0.1#123
Enabled
Mar  9 09:13:51 grant ntpd[3141]: Listening on interface #5 eth0,
192.168.0.2#123 Enabled
Mar  9 09:13:51 grant ntpd[3141]: kernel time sync status 0040
Mar  9 09:13:51 grant ntpd[3141]: frequency initialized 56.842 PPM from
/var/lib/ntp/drift


Version-Release number of selected component (if applicable):  Don't know.  I
run yum update from time to time so probably whatever is current.


How reproducible: It's perpetual.


Steps to Reproduce:
1. Set time server to use Fedora NTP servers.  Watch time drift immediately.
2. Uncheck NTP servers, time still drifts.
3.
  
Actual results:  We're traveling slowly though time.  If the rest of my life did
that I guess I'd be ok with it.


Expected results:  Computer should keep time.


Additional info:

Comment 1 Miroslav Lichvar 2007-03-12 09:54:39 UTC
Is there more output from ntpd in /var/log/messages, when the daemon is running
for few hours?

Also, output from /usr/sbin/ntpq -pn would be helpful.

Comment 2 Joseph A. Farmer 2007-03-12 21:57:14 UTC
There is no other information in /var/log/messages.

[root@grant ~]# /usr/sbin/ntpq -pn
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
 193.45.254.143  212.94.162.33    3 u   30   64  377   89.962  4137499 316160.
 202.234.64.222  240.195.23.255   2 u   30   64  377  131.588  4137501 316611.
 217.172.188.162 193.120.142.71   2 u   18   64  377   72.697  4138816 317799.

I don't know what those numbers mean.

Since I filed the but it lost 12 hours.  I just reset it.

Comment 3 Miroslav Lichvar 2007-03-13 08:39:10 UTC
This really looks like a hw or kernel problem, probably lost interrupts.
Reassigning to kernel component.

Comment 4 Chuck Ebbert 2007-03-13 15:33:29 UTC
Please attach the file /var/log/dmesg



Comment 5 Joseph A. Farmer 2007-03-13 16:37:51 UTC
Created attachment 149947 [details]
/var/log/dmesg

Comment 6 Joseph A. Farmer 2007-03-13 16:39:59 UTC
Further info:
The machine was fine with FC5.  When it was upgrades to FC6 it was fine for over
a month.  Autoupdate was turned off.  When I ran yum update is when it started
losing time.

Comment 7 Chuck Ebbert 2007-03-14 21:48:35 UTC
Please test kernel 2.6.20-1.2925.fc6

Comment 8 Joseph A. Farmer 2007-03-15 02:52:01 UTC
Kernel updated, no change:

[root@grant ~]# /usr/sbin/ntpq -pn
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
 193.218.127.251 131.188.3.221    2 u   15   64  377  125.217  711734. 316200.
 193.125.143.140 192.36.143.150   2 u   12   64  377   74.017  715045. 316877.
 192.36.143.151  .PPS.            1 u    7   64  377   64.790  720534. 318072.
[root@grant ~]# uname -a
Linux grant.happykilroy.com 2.6.20-1.2925.fc6 #1 SMP Sat Mar 10 19:15:16 EST
2007 i686 i686 i386 GNU/Linux
[root@grant ~]# 



Comment 9 Dave Jones 2007-03-19 19:43:07 UTC
just a hunch, but can you try booting with hpet=off ?


Comment 10 Joseph A. Farmer 2007-03-19 21:54:32 UTC
default=0
timeout=5
splashimage=(hd0,0)/boot/grub/splash.xpm.gz
hiddenmenu
title Fedora Core (2.6.20-1.2925.fc6)
        root (hd0,0)
        kernel /boot/vmlinuz-2.6.20-1.2925.fc6 ro root=LABEL=/ rhgb quiet hpet=off
        initrd /boot/initrd-2.6.20-1.2925.fc6.img
title Fedora Core (2.6.19-1.2911.fc6)
        root (hd0,0)
        kernel /boot/vmlinuz-2.6.19-1.2911.fc6 ro root=LABEL=/ rhgb quiet
        initrd /boot/initrd-2.6.19-1.2911.fc6.img

I assume that is where the hpet=off goes?

[root@grant ~]# uname -a
Linux grant.happykilroy.com 2.6.20-1.2925.fc6 #1 SMP Sat Mar 10 19:15:16 EST
2007 i686 i686 i386 GNU/Linux

So I'm booting the kernel with that option.

Still a reference to hpet in dmesg:


Enabling APIC mode:  Flat.  Using 1 I/O APICs
ACPI: HPET id: 0x8086a201 base: 0xfed00000
Using ACPI (MADT) for SMP configuration information
Allocating PCI resources starting at c4000000 (gap: c0000000:30000000)
Detected 5864.670 MHz processor.
Built 1 zonelists.  Total pages: 776352
Kernel command line: ro root=LABEL=/ rhgb quiet hpet=off
mapped APIC to ffffd000 (fee00000)
mapped IOAPIC to ffffc000 (fec00000)


[root@grant ~]# ntpq -pn
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
 216.165.129.244 .GPS.            1 u   22   64    7    7.626  142967. 112662.
[root@grant ~]# 


Computer still can't tell time.  I seem to think it's the applet.  If I select
"adjust time" manually it updates it correctly.  Without doing that it leaks
time like a sieve.

Perhaps I'm wrong, I often am.







Comment 11 john stultz 2007-03-23 22:45:22 UTC
Does booting w/ "clocksource=acpi_pm" work around the issue?

Also do you know what was the last working version of the kernel you used?

Comment 12 Joseph A. Farmer 2007-03-24 01:42:41 UTC
IBM huh?  In this case that would be an abbreviation for "I bring mitigation." 
Removing the HPET entry and adding the acpi_pm fixed it.

As to Kernel I really can't tell exactly which it was.  I know that .19 and .20
were both bad.  I have yum retaining the cache and these are on the machine:

kernel-2.6.18-1.2849.fc6.i586.rpm
kernel-2.6.18-1.2849.fc6.i686.rpm
kernel-2.6.18-1.2869.fc6.i586.rpm
kernel-2.6.18-1.2869.fc6.i686.rpm
kernel-2.6.19-1.2895.fc6.i586.rpm
kernel-2.6.19-1.2911.6.4.fc6.i686.rpm
kernel-2.6.19-1.2911.fc6.i586.rpm
kernel-2.6.20-1.2925.fc6.i686.rpm

The i586 are "cruft" from the FC6 install problem I'd think.

.2911 and .2925 are the only two installed atm.  So 2869 or 2895 was the last
working kernel.  I tend to think it was the 2869 as there is also a PAE kernel
(2849) and I recall switching off that and it was still working for some time. 
So that would be 2869 working and 2895 being the most likely culprit.  2911 and
2925 are both known bad.

Comment 13 john stultz 2007-03-24 01:59:03 UTC
Ok. So it looks like your system is normally selecting the TSC for its
clocksource, however your TSC seems to be either changing frequency or halting
in C3 idle (which should normally disqualify the TSC and force the ACPI PM to be
used by default).

Quick test: does removing the "clocksource=acpi_pm" and adding "idle=poll" avoid
the issue?

Regarding the last kernel that functioned properly, 2.6.18 first introduced the
new timekeeping code, so if the problem was not in 2.6.18 but is in 2.6.19, that
means its some subtle change that I'm not aware of. Hmm.

Comment 14 Joseph A. Farmer 2007-03-28 04:12:08 UTC
Sorry for the delay.  I removed clocksource=acpi_pm and added idle=poll to grub.
 The time problem returned and it was immediate.  Lost 1/2 hour in the hour I
let it run.

Regards the bug, it's still there but using acpi_pm solves my issue.  Close the
bug?  Leave it open to track down the problem?  I'm willing to help of course.

Your call.



Comment 15 john stultz 2007-04-03 00:21:04 UTC
From the dmesg above:
   Detected 5863.607 MHz processor.

I doubt you have a ~6Ghz cpu. Sounds like the cpu_khz calibration is going
wonky, possibly due to the PIT mis-behaving.


Comment 16 john stultz 2007-04-04 03:58:28 UTC
Does booting w/ noapic (and not using the clocksource=acpi_pm option) change the
behavior?

Comment 17 Joseph A. Farmer 2007-04-04 05:33:00 UTC
Removed acpi_pm and added noapic.  Started bleeding time immediately.  

Apr  4 00:11:27 grant kernel: ACPI: Skipping IOAPIC probe due to 'noapic' option.
Apr  4 00:11:27 grant kernel: Kernel command line: ro root=LABEL=/ rhgb quiet noapic

I can attach the full dmesg if required.

Comment 18 john stultz 2007-10-08 18:52:38 UTC
Joseph: Does this timekeeping issue still exist with the current fedora kernel?

Comment 19 Joseph A. Farmer 2007-10-16 23:00:31 UTC
Good question.  I don't know.  The latest fedora kernel would be for Fedora 7. 
This machine, a bog standard Dell Dimension 5150, will not boot the Fedora 7
DVD.  I'll see if there is a respin.  I tried to install Fedora 7 right after
release but the machine would just hang on booting the DVD.  I suspect that has
something to do with the replacement of the IDE with SATA.  This machine has an
IDE DVD but SATA hard drives.

Comment 20 Chuck Ebbert 2007-10-16 23:08:02 UTC
Fedora 6 and 7 have basically the same kernel, both are based on 2.6.22.9.


Comment 21 Joseph A. Farmer 2007-10-16 23:43:52 UTC
Created attachment 229311 [details]
DMesg

Comment 22 Joseph A. Farmer 2007-10-16 23:47:29 UTC
Dang it, adding the attachment ate my message...

Thank your for the kernel info.  I yummed it.

=====================

#boot=/dev/sda
default=0
timeout=5
splashimage=(hd0,0)/boot/grub/splash.xpm.gz
hiddenmenu
title Fedora Core (2.6.22.9-61.fc6)
        root (hd0,0)
        kernel /boot/vmlinuz-2.6.22.9-61.fc6 ro root=LABEL=/ rhgb quiet
        initrd /boot/initrd-2.6.22.9-61.fc6.img
title Fedora Core (2.6.20-1.2925.fc6)
        root (hd0,0)
        kernel /boot/vmlinuz-2.6.20-1.2925.fc6 ro root=LABEL=/ rhgb quiet
clocksource=acpi_pm
        initrd /boot/initrd-2.6.20-1.2925.fc6.img

===========================

[root@grant ~]# uname -a
Linux grant.happykilroy.com 2.6.22.9-61.fc6 #1 SMP Thu Sep 27 18:48:03 EDT 2007
i686 i686 i386 GNU/Linux

==========================

[root@grant ~]# uptime
 18:46:08 up 19 min,  2 users,  load average: 0.03, 0.10, 0.10

==========================

[root@grant ~]# uptime
 18:46:08 up 19 min,  2 users,  load average: 0.03, 0.10, 0.10
[root@grant ~]# /usr/sbin/ntpq -pn
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
*216.165.129.244 .GPS.            1 u   14   64  377   17.495  243.945  60.926

=========================

The jitter is growning over time buy slowly.  The games play fine.  This kernel
seems to fix the problem.

Thanks.