Bug 189688

Summary: Clock error which prevents ntpd from functioning.
Product: [Fedora] Fedora
Reporter: Michael Godfrey <godfrey>
Component: kernel
Assignee: Dave Jones <davej>
Status: CLOSED ERRATA
QA Contact: Brian Brock <bbrock>
Severity: high
Priority: medium
Version: 5
CC: ehud, pfrields, wilburn, wtogami
Target Milestone: ---   
Target Release: ---   
Hardware: i386   
OS: Linux   
Doc Type: Bug Fix
Last Closed: 2006-10-21 08:00:27 UTC
Attachments:
Output of ntpq, /proc/cpuinfo, and demsg (flags: none)

Description Michael Godfrey 2006-04-23 03:30:04 UTC
Description of problem:
Starting with kernel-2.6.16-1.2069_FC4 the clock on this system
gains about 4 seconds per 10 minutes. This also happens on 
kernel-2.6.16-1.2096_FC4.  Booting back to kernel-2.6.15-1.1833_FC4
corrects the problem.

Under kernel-2.6.16-1.2069_FC4 ntpd never syncs with the time server since
the system time diverges so quickly from the reset value when ntpd is
started at boot time. Adjusting the drift value and other configuration
parameters did not help.
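
For scale: a gain of 4 seconds per 10 minutes is roughly 6667 ppm, while ntpd
can only correct frequency errors of up to 500 ppm, which is consistent with it
never syncing. A minimal sketch of the arithmetic follows. The function name
drift_ppm is made up for illustration, and shell integer division truncates the
result.

```shell
#!/bin/sh
# drift_ppm GAINED_SECONDS ELAPSED_SECONDS
# Print the clock drift rate in parts per million (truncated to an integer).
drift_ppm() {
    echo $(( $1 * 1000000 / $2 ))
}

# 4 seconds gained over 10 minutes (600 seconds), as reported above.
drift_ppm 4 600
```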

Other systems, using other CPUs and motherboards, do not show this problem.

The attachment shows some details and the output of ntpq and dmesg.

Version-Release number of selected component (if applicable):
kernel-2.6.16-1.2069_FC4 and kernel-2.6.16-1.2096_FC4.

How reproducible:
Always

Steps to Reproduce:
1. Boot kernel 2096.
2. Run date.
3. Observe a time speedup of about 4 sec per 10 minutes and check
   /var/log/messages to see that ntpd never syncs with the time server.
  
Actual results:
Time diverges from the correct value.

Expected results:
ntpd maintains correct time.

Additional info: This may be related to Bug 18805, but the kernel versions
and symptoms are different.

Comment 1 Michael Godfrey 2006-04-23 03:30:04 UTC
Created attachment 128121 [details]
Output of ntpq, /proc/cpuinfo, and demsg

Comment 2 Michael Godfrey 2006-04-24 17:03:29 UTC
The motherboard for this system is an ASUS A7N8X-E.


Comment 3 wilburn 2006-04-24 20:50:18 UTC
I see this also for an ASUS motherboard with FC5. Currently I am running
2.6.16-1.2096_FC5, but have seen this with all FC5 kernels I have used. Did not
see it with FC4, but don't know the last kernel I ran with it. In addition,
jitter reported by ntpq -p seems much too large:

     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
 tokaimura.lysat 130.236.254.17   2 u  337 1024  375  257.160  -100775 8402.61
 lampa.logoslink 213.59.0.3       3 u  257 1024  377  298.147  -100811 4228.12
 ntp02.oal.ul.pt 194.117.9.129    2 u  263 1024  377  314.088  -100808 4244.46
*LOCAL(0)        LOCAL(0)        10 l   21   64  377    0.000    0.000   0.004

Compare this to another FC5 machine (HP motherboard) on the same net that
doesn't show this problem:

     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
*zirkon.biophys. 134.99.128.80    2 u  438 1024  377  236.270  -19.189   1.187
+av10.avenue.com 198.123.30.132   2 u  400 1024  337  304.917   19.851   2.254
+ns.pf.uni-c.dk  193.162.145.130  2 u  394 1024  377  260.848   13.655   0.214
 LOCAL(0)        LOCAL(0)        10 l    1   64  377    0.000    0.000   0.004
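
For anyone comparing machines the same way, here is a rough sketch that picks
the badly-off peers out of ntpq -p output. It assumes the ten-column layout
shown above, with the offset in milliseconds in column 9. The function name
large_offsets is hypothetical, and the threshold is whatever the caller
chooses.

```shell
#!/bin/sh
# large_offsets THRESHOLD_MS
# Read `ntpq -p` output on stdin and print "remote offset" for every peer
# whose absolute offset (column 9, in milliseconds) exceeds the threshold.
large_offsets() {
    awk -v limit="$1" '
        # Header and separator lines have no numeric 9th field, so they are skipped.
        NF >= 10 && $9 ~ /^-?[0-9.]+$/ {
            off = $9 + 0
            if (off < 0) off = -off
            if (off > limit) print $1, $9
        }'
}
```

Usage would be something like ntpq -p | large_offsets 128, where 128 ms is
ntpd's default step threshold.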

Comment 4 wilburn 2006-05-08 20:49:07 UTC
Booting with the kernel option 'noapic' appears to solve this problem for me.
Here is the output of ntpq -p now (compare to the first machine in the previous comment):

     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
 pd168.bielsko.s .STEP.          16 u    - 1024    0    0.000    0.000 4000.00
+tasc9229.cs.sfu 209.81.9.7       2 u    1   64  377  138.913  -23.258   2.192
*tweety.celuloza 80.96.120.249    2 u   51   64  377  273.604  -27.724   5.132
 LOCAL(0)        LOCAL(0)        10 l   62   64  377    0.000    0.000   0.004

I am willing to perform further diagnostics if anyone asks.
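
When switching between boots with and without 'noapic', it helps to confirm
which one is actually running. A small sketch, assuming the kernel command line
is available in /proc/cmdline. The helper name has_boot_option and its optional
second argument are made up for illustration.

```shell
#!/bin/sh
# has_boot_option OPTION [CMDLINE_FILE]
# Succeed if OPTION appears as a whole word on the kernel command line.
# CMDLINE_FILE defaults to /proc/cmdline; the override exists only for testing.
has_boot_option() {
    tr ' ' '\n' < "${2:-/proc/cmdline}" | grep -qxF -- "$1"
}
```

For example: has_boot_option noapic && echo "booted with noapic".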

Comment 5 Ehud Karni 2006-08-01 10:36:51 UTC
I think this is a bug in glibc (seen on FC5, not FC4, because of the glibc versions).
Can you please check and report the glibc version and date (do ls -al /lib/libc.so.6
and check the name and time of the linked libc-2.4*).
I'll post my thoughts and findings after seeing your data.

Comment 6 wilburn 2006-08-01 14:40:38 UTC
# rpm -q glibc
glibc-2.4-8

# ls -al /lib/libc.so.6
lrwxrwxrwx 1 root root 11 May 15 06:24 /lib/libc.so.6 -> libc-2.4.so

# ls -al /lib/libc-2.4.so
-rwxr-xr-x 1 root root 1532536 May 12 07:09 /lib/libc-2.4.so

However, glibc was updated May 15. I will have to reboot without the 'noapic'
option and watch for a few hours to see if the problem still exists. I will
report back when this is done.

Comment 7 Ehud Karni 2006-08-01 15:40:59 UTC
You may check, of course, but it will almost surely fail.
The problem is with the May 12 glibc (version 2.4-8).
To overcome the problem I extracted glibc-2.4-4.i686.rpm and
glibc-common-2.4-4.i386.rpm from the FC5 distribution (they are on the 1st disc)
and then ran
    rpm --install --force glibc-2.4-4.i686.rpm glibc-common-2.4-4.i386.rpm
This fixed the problem.

I had it on 4 machines that were installed from the "Fedora Core 5 Re-Spin
20060523" DVD; all suffered from this problem. I searched for bug reports and
found none. On one machine I reinstalled FC5 from the original CDs (2006-03-15)
and updated the various packages with yum. Several hours after updating to
glibc-2.4-8, this bug hit me again.

To correct this I forced the install of glibc-2.4-4 again. This is not enough for
the FC5 Re-spin, because there libc.so.6 is linked not to libc-2.4.so but to
libc-2.4.90.so. I ran the script below, and this solved it (none of the
affected machines has shown the bug in 4 days).


#! /bin/sh -ex

# rlnk LINK TARGET: keep the existing LINK as LINK-old, then point LINK at TARGET.
rlnk ()
{
   mv "$1" "${1}-old"
   ln -s "$2" "$1"
}

cd /lib

rlnk libBrokenLocale.so.1      libBrokenLocale-2.4.so
rlnk libanl.so.1               libanl-2.4.so
rlnk libcidn.so.1              libcidn-2.4.so
rlnk libcrypt.so.1             libcrypt-2.4.so
rlnk libdl.so.2                libdl-2.4.so
rlnk libext2fs.so.2            libext2fs.so.2.4
rlnk libm.so.6                 libm-2.4.so
rlnk libnsl.so.1               libnsl-2.4.so
rlnk libnss_compat.so.2        libnss_compat-2.4.so
rlnk libnss_dns.so.2           libnss_dns-2.4.so
rlnk libnss_files.so.2         libnss_files-2.4.so
rlnk libnss_hesiod.so.2        libnss_hesiod-2.4.so
rlnk libnss_nis.so.2           libnss_nis-2.4.so
rlnk libnss_nisplus.so.2       libnss_nisplus-2.4.so
rlnk libpthread.so.0           libpthread-2.4.so
rlnk libresolv.so.2            libresolv-2.4.so
rlnk librt.so.1                librt-2.4.so
rlnk libutil.so.1              libutil-2.4.so

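# libc.so.6 must never be missing, since mv and ln themselves load it:
# copy the old link aside, then overwrite it with ln -s -f.
cp -a libc.so.6 libc.so.6-old
ln -s -f libc-2.4.so libc.so.6

############################## lnk.sh ##############################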


Comment 8 Ehud Karni 2006-08-02 20:48:41 UTC
I spoke too soon. I was hit with the bug on the 2 FC5 Re-spin systems. My script
above missed ld-linux.so.2, which seems to be crucial. It must be relinked
the same way as libc.so.6, i.e.:

cp -a ld-linux.so.2 ld-linux.so.2-old
ln -s -f ld-2.4.so ld-linux.so.2
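
To double-check the result of this kind of relinking, something like the
following sketch could be used. The helper name check_link is hypothetical, and
it relies on the readlink utility from coreutils being installed.

```shell
#!/bin/sh
# check_link LINK EXPECTED_TARGET
# Report where LINK points and fail if it is not EXPECTED_TARGET.
check_link() {
    actual=$(readlink "$1") || { echo "$1: not a symlink"; return 1; }
    if [ "$actual" = "$2" ]; then
        echo "$1 -> $actual (ok)"
    else
        echo "$1 -> $actual (expected $2)"
        return 1
    fi
}
```

For example: check_link /lib/ld-linux.so.2 ld-2.4.so, and likewise
check_link /lib/libc.so.6 libc-2.4.so.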


Comment 9 Ehud Karni 2006-08-07 13:24:02 UTC
I was totally wrong. The original FC5 also has this bug, so I'm moving back to FC4
(I tried the noapic boot parameter but I had other problems).

The 4 systems that demonstrated the bug are Pentium 4 machines with this cpuinfo:
vendor_id       : GenuineIntel
cpu family      : 15
model           : 2
model name      : Intel(R) Pentium(R) 4 CPU 2.80GHz
stepping        : 9
cpu MHz         : 2793.460
cache size      : 512 KB
physical id     : 0
siblings        : 2
core id         : 0
cpu cores       : 1
fdiv_bug        : no
hlt_bug         : no
f00f_bug        : no
coma_bug        : no
fpu             : yes
fpu_exception   : yes
cpuid level     : 2
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic mtrr pge mca cmov pat
pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe cid xtpr
bogomips        : 3137.53


A 5th system which runs with FC5 without the bug:
vendor_id       : GenuineIntel
cpu family      : 15
model           : 4
model name      : Intel(R) Xeon(TM) CPU 3.00GHz
stepping        : 3
cpu MHz         : 2993.039
cache size      : 2048 KB
physical id     : 0
siblings        : 2
core id         : 0
cpu cores       : 1
fdiv_bug        : no
hlt_bug         : no
f00f_bug        : no
coma_bug        : no
fpu             : yes
fpu_exception   : yes
cpuid level     : 5
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov
pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe nx lm constant_tsc pni
monitor ds_cpl cid cx16 xtpr
bogomips        : 5991.41
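
To compare the two flags lines mechanically, a sketch like the one below lists
the flags that only the second CPU reports. The function name
flags_only_in_second is made up for illustration. A difference in flags does
not by itself identify the cause of the clock problem.

```shell
#!/bin/sh
# flags_only_in_second "FLAGS_A" "FLAGS_B"
# Print, one per line, each flag in FLAGS_B that does not appear in FLAGS_A.
# Both arguments are space-separated flag lists as found in /proc/cpuinfo.
flags_only_in_second() {
    for f in $2; do            # deliberately unquoted: split on whitespace
        case " $1 " in
            *" $f "*) ;;       # present in both lists, nothing to report
            *) printf '%s\n' "$f" ;;
        esac
    done
}
```

Called with the Pentium 4 flags first and the Xeon flags second, it prints the
flags that only the Xeon has.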


Comment 10 Dave Jones 2006-09-17 02:52:06 UTC
[This comment added as part of a mass-update to all open FC4 kernel bugs]

FC4 has now transitioned to the Fedora legacy project, which will continue to
release security related updates for the kernel.  As this bug is not security
related, it is unlikely to be fixed in an update for FC4, and has been migrated
to FC5.

Please retest with Fedora Core 5.

Thank you.


Comment 11 Dave Jones 2006-10-16 21:50:17 UTC
A new kernel update has been released (Version: 2.6.18-1.2200.fc5)
based upon a new upstream kernel release.

Please retest against this new kernel, as a large number of patches
go into each upstream release, possibly including changes that
may address this problem.

This bug has been placed in NEEDINFO state.
Due to the large volume of inactive bugs in bugzilla, if this bug is
still in this state in two weeks time, it will be closed.

Should this bug still be relevant after this period, the reporter
can reopen the bug at any time. Any other users on the Cc: list
of this bug can request that the bug be reopened by adding a
comment to the bug.

In the last few updates, some users upgrading from FC4->FC5
have reported that installing a kernel update has left their
systems unbootable. If you have been affected by this problem
please check you only have one version of device-mapper & lvm2
installed.  See bug 207474 for further details.

If this bug is a problem preventing you from installing the
release this version is filed against, please see bug 169613.

If this bug has been fixed, but you are now experiencing a different
problem, please file a separate bug for the new problem.

Thank you.

Comment 12 wilburn 2006-10-18 14:38:48 UTC
kernel-2.6.18-1.2200.fc5 fixes the problem for me. I can now boot without the
'noapic' option and the clock runs fine.