83153 – ntp won't sync to external servers after upgrade from 7.2 to 8.0

Summary: ntp won't sync to external servers after upgrade from 7.2 to 8.0

Keywords:
Status:	CLOSED CURRENTRELEASE
Alias:	None
Product:	Red Hat Linux
Classification:	Retired
Component:	ntp
Sub Component:
Version:	8.0
Hardware:	athlon
OS:	Linux
Priority:	medium
Severity:	high
Target Milestone:	---
Assignee:	Harald Hoyer
QA Contact:	Brian Brock
Docs Contact:
URL:	http://groups.google.com/groups?dq=&h...
Whiteboard:
Depends On:
Blocks:

Reported:	2003-01-30 19:58 UTC by Joseph Shraibman
Modified:	2007-04-18 16:50 UTC
CC List:	0 users
Fixed In Version:
Clone Of:
Environment:
Last Closed:	2003-11-05 07:54:28 UTC
Embargoed:

Description Joseph Shraibman 2003-01-30 19:58:08 UTC

From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.2.1) Gecko/20021130

Description of problem:
After upgrading a dual Athlon server from RH 7.2 to 8.0, ntp won't synchronize
with external servers because it detects high jitter, even with a server right
next to it on the network. It has been pointed out in comp.protocols.time.ntp
that the Red Hat 8.0 release notes contain:
<blockquote>
HZ=512 on i686 and Athlon means that the system clock ticks 5 times as fast as
on other x86 platforms (i386 and i586); HZ=100 has been the Linux default on x86
platforms for the entire history of the Linux kernel. This change provides
better interactive response, lower latency response from some programs, and
better response from the scheduler. We have adjusted the /proc file system to
report numbers as if using the default HZ=100.
</blockquote>
This may be what is causing ntp to detect this high jitter:

     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
*LOCAL(0)        LOCAL(0)        10 l    2   64  377    0.000    0.000   0.008
 otc2.psu.edu    ntp2.usno.navy.  2 u   60   64  377  156.826  10325.1 1415.94
 proxy.cc.vt.edu gps1.tns.its.ps  2 u    1   64  377   13.307  11522.6 1369.69
 p1.selectacast. otc2.psu.edu     3 u   62   64  377    0.189  10237.7 1453.45

The last one (p1.selectacast.) is right next to it on the network. A similar
dual Xeon machine that was upgraded from 7.2 to 8.0 does not show the same
problem. A really bad hardware clock may contribute to the problem, but ntp
worked fine under 7.2.
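To read the peer tables in this report, it helps to split each `ntpq> peers` row into its columns. The sketch below is a minimal parser, assuming rows shaped exactly like the ones quoted here (one tally character followed by ten whitespace-separated fields); the `Peer` class and the function names are hypothetical, not part of ntp. It relies on two standard ntpq conventions: a `*` in the first column marks the peer ntpd has selected, and `reach` is an 8-bit shift register printed in octal, with one bit per recent poll.

```python
from dataclasses import dataclass


@dataclass
class Peer:
    tally: str       # '*' = peer selected by ntpd, ' ' = not selected
    remote: str
    refid: str
    stratum: int
    kind: str        # 'l' = local clock driver, 'u' = unicast server
    when: int        # seconds since the last packet
    poll: int        # poll interval in seconds
    reach: int       # 8-bit reachability register (printed in octal by ntpq)
    delay_ms: float
    offset_ms: float
    jitter_ms: float


def parse_peer_row(line: str) -> Peer:
    """Parse one data row of `ntpq> peers` output shaped like the rows in this report."""
    tally, rest = line[0], line[1:]
    remote, refid, st, t, when, poll, reach, delay, offset, jitter = rest.split()
    return Peer(tally, remote, refid, int(st), t, int(when), int(poll),
                int(reach, 8), float(delay), float(offset), float(jitter))


def successful_polls(peer: Peer) -> int:
    """How many of the last 8 polls got a reply (set bits in reach)."""
    return bin(peer.reach).count("1")


row = "*LOCAL(0)        LOCAL(0)        10 l    2   64  377    0.000    0.000   0.008"
local = parse_peer_row(row)
print(local.tally, local.remote, successful_polls(local))
```

Decoded this way, `377` means all of the last eight polls got a reply, while the `1`, `3` and `7` values in the later snapshots correspond to the first one, two and three polls after ntpd was restarted.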


Version-Release number of selected component (if applicable):
ntp-4.1.1a-9

How reproducible:
Always

Steps to Reproduce:
1. Run ntpd on a dual Athlon machine.

Actual Results:  ntpd chose to sync with its own local clock (LOCAL(0)) instead
of any of the external machines.

Expected Results:  ntp should have synced with the external machines.

Additional info:

This is a Tyan motherboard.
<pre>
# cat /proc/cpuinfo
processor       : 0
vendor_id       : AuthenticAMD
cpu family      : 6
model           : 6
model name      : AMD Athlon(tm) MP 1800+
stepping        : 2
cpu MHz         : 1526.422
cache size      : 256 KB
Physical processor ID   : 0
Number of siblings      : 1
fdiv_bug        : no
hlt_bug         : no
f00f_bug        : no
coma_bug        : no
fpu             : yes
fpu_exception   : yes
cpuid level     : 1
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 mmx fxsr sse syscall mmxext 3dnowext 3dnow
bogomips        : 3038.00
 
processor       : 1
vendor_id       : AuthenticAMD
cpu family      : 6
model           : 6
model name      : AMD Athlon(tm) Processor
stepping        : 2
cpu MHz         : 1526.422
cache size      : 256 KB
Physical processor ID   : 0
Number of siblings      : 1
fdiv_bug        : no
hlt_bug         : no
f00f_bug        : no
coma_bug        : no
fpu             : yes
fpu_exception   : yes
cpuid level     : 1
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 mmx fxsr sse syscall mmxext 3dnowext 3dnow
bogomips        : 3046.10
</pre>
I'm marking this as high because the bad clock is causing me serious problems.

Comment 1 Harald Hoyer 2003-01-31 09:51:57 UTC

Do you use -x? What do /etc/ntp.conf and /etc/sysconfig/ntpd look like?
Does it help to recompile the ntp from 7.2 and use it on 8.0?

Comment 2 Joseph Shraibman 2003-02-04 20:11:19 UTC

No, I'm not using -x. I'm using the standard startup script installed by the
rpm. I don't see how -x would change anything; it selects slewing vs.
stepping, and I can't get ntpd to sync in the first place. I haven't tried
building old versions yet; when I do, I'll get back to you.

Comment 3 Harald Hoyer 2003-02-05 08:58:08 UTC

ok, because -x makes things worse in 8.0 :-(

Comment 4 Joseph Shraibman 2003-02-07 00:22:43 UTC

I tried the versions of ntp from 7.3 and 7.2. The version from 7.3 was about
the same, and the version from 7.2 was worse:

ntpq> peers
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
*LOCAL(0)        LOCAL(0)        10 l   54   64   37    0.000    0.000   0.008
 otc2.psu.edu    gps1.tns.its.ps  2 u    1   64   77   18.226  1634.44 3445.05
 proxy.cc.vt.edu navobs1.gatech.  2 u   60   64   37   10.697  1677.09 2743.05
 p1.selectacast. otc2.psu.edu     3 u   65   64   37    0.173  4380.14 2664.43

Comment 5 Harald Hoyer 2003-02-07 09:57:42 UTC

err, as you can see, ntpd chose your LOCAL clock to trust.
Why don't you put a reliable server IP in /etc/ntp/step-tickers and restart ntpd?
# service ntpd restart
Also, p1.selectacast. seems to be way out of sync... remove it...

Comment 6 Harald Hoyer 2003-02-07 09:59:10 UTC

Also note that ntpd selects the preferred server only after about 3 minutes, so
after a restart you have to wait a while before anything happens.

Comment 7 Joseph Shraibman 2003-02-07 14:45:58 UTC

>err, as you can see, ntpd chose your LOCAL clock to trust.

Exactly, that's what the bug is. otc2.psu.edu and proxy.cc.vt.edu are reliable
NTP servers.

And p1 is not out of sync. It just shows up that way in that snapshot. It might
be a symptom of the problem that makes ntpd think the jitter is so high.

Here is a more recent series of snapshots. p1, which has a very low delay, has
the same jitter and offset as the stratum 2 servers:

ntpq> peers
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
 LOCAL(0)        LOCAL(0)        10 l   41   64    1    0.000    0.000   0.008
 otc2.psu.edu    ntp2.usno.navy.  2 u   51   64    1   18.401  272.071   0.008
 proxy.cc.vt.edu gps1.tns.its.ps  2 u   51   64    1   12.214  285.203   0.008
 p1.selectacast. otc2.psu.edu     3 u   52   64    1    0.161  250.357   0.008
ntpq> peers
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
 LOCAL(0)        LOCAL(0)        10 l   24   64    3    0.000    0.000   0.008
 otc2.psu.edu    ntp2.usno.navy.  2 u   37   64    3   17.980  1633.11 1361.04
 proxy.cc.vt.edu gps1.tns.its.ps  2 u   36   64    3   12.063  1668.35 1383.15
 p1.selectacast. otc2.psu.edu     3 u   36   64    3    0.153  1656.76 1406.40
ntpq> peers
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
*LOCAL(0)        LOCAL(0)        10 l   10   64   77    0.000    0.000   0.008
 otc2.psu.edu    ntp2.usno.navy.  2 u   22   64   77   18.875  7194.44 1402.61
 proxy.cc.vt.edu gps1.tns.its.ps  2 u   18   64   77   12.240  7290.64 1355.58
 p1.selectacast. otc2.psu.edu     3 u   20   64   77    0.214  7239.38 1422.10

Comment 8 Harald Hoyer 2003-02-07 15:05:07 UTC

hmm... delay to the stratum 2 servers is very high... what happens if you remove
LOCAL?

Comment 9 Joseph Shraibman 2003-02-07 17:32:56 UTC

18 ms is high?

Here is the output after I restarted ntpd without LOCAL. Notice that at reach 1
the jitter is low, but it builds over time.

ntpq> peers
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
 otc2.psu.edu    otc1.psu.edu     2 u    4   64    1   18.346  343.673   0.008
 proxy.cc.vt.edu tick.usno.navy.  2 u    1   64    1   12.266  421.555   0.008
 p1.selectacast. otc2.psu.edu     3 u    9   64    1    0.167  237.928   0.008
ntpq> peers
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
 otc2.psu.edu    otc1.psu.edu     2 u   31   64    3   18.812  1719.21 1375.54
 proxy.cc.vt.edu tick.usno.navy.  2 u   25   64    3   10.202  1860.50 1438.95
 p1.selectacast. otc2.psu.edu     3 u   33   64    3    0.200  1675.74 1437.81
ntpq> peers
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
 otc2.psu.edu    otc1.psu.edu     2 u   54   64    7   18.221  3070.49 1351.28
 proxy.cc.vt.edu tick.usno.navy.  2 u   45   64    7   11.533  3279.85 1419.34
 p1.selectacast. otc2.psu.edu     3 u   55   64    7    0.191  3050.96 1375.21
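The three snapshots in Comment 9 can be turned into a rough drift estimate. The sketch below assumes consecutive samples are one 64 s poll apart, as the `poll` column suggests; the function names are hypothetical. The offsets keep growing in the positive direction, meaning the local clock falls further behind the servers with each poll. In these particular snapshots the `jitter` column also matches the difference between the two most recent offsets (for otc2.psu.edu, 1719.21 - 343.673 is about 1375.54, and 3070.49 - 1719.21 = 1351.28), which suggests a steady drift is showing up as jitter; this is an observation about these numbers, not a description of ntpd's jitter algorithm.

```python
POLL_S = 64  # assumption: consecutive samples are one poll interval apart


def successive_differences(offsets_ms):
    """Change in measured offset from one poll to the next, in ms."""
    return [b - a for a, b in zip(offsets_ms, offsets_ms[1:])]


def drift_ppm(offsets_ms, poll_s=POLL_S):
    """Average drift implied by the first and last offsets, in parts per million."""
    elapsed_s = poll_s * (len(offsets_ms) - 1)
    ms_per_s = (offsets_ms[-1] - offsets_ms[0]) / elapsed_s
    return ms_per_s * 1000.0  # 1 ms per second = 1000 PPM


# offset column for otc2.psu.edu at reach 1, 3 and 7 (Comment 9)
otc2 = [343.673, 1719.21, 3070.49]
print([round(d, 2) for d in successive_differences(otc2)])
print(round(drift_ppm(otc2)))
```

For otc2.psu.edu this comes to roughly 21,300 PPM; the proxy.cc.vt.edu and p1.selectacast. rows give similar figures, which points at the local clock rather than any one server.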

Comment 10 Harald Hoyer 2003-03-10 15:18:39 UTC

wow, this looks really bad... can you retry with the latest Rawhide version?

Comment 11 Joseph Shraibman 2003-03-11 21:21:15 UTC

I tried ntp-4.1.2-0.rc1.2.i386.rpm from Rawhide; same problem. If the problem
is in the kernel, then changing ntp won't help much. I have a cron script that
sets the time via ntp every 5 minutes, and it sets the time forward 6.2 to 6.4
seconds every time, so that's how bad the hardware clock is. The question is:
why does ntp think the jitter is so high?
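The cron figure above can be converted to a frequency error. A minimal sketch, assuming the 6.2 to 6.4 s step accumulates evenly over each 5-minute interval; `NTPD_MAX_FREQ_PPM` is a name introduced here for ntpd's 500 PPM frequency tolerance, the largest frequency error its clock discipline is designed to correct.

```python
NTPD_MAX_FREQ_PPM = 500  # hypothetical name for ntpd's 500 PPM frequency tolerance


def ppm(step_s: float, interval_s: float) -> float:
    """Frequency error implied by a step of step_s seconds every interval_s seconds."""
    return step_s / interval_s * 1e6


low = ppm(6.2, 5 * 60)
high = ppm(6.4, 5 * 60)
print(f"{low:.0f} to {high:.0f} PPM, about {low / NTPD_MAX_FREQ_PPM:.0f}x the tolerance")
```

That is roughly 20,700 to 21,300 PPM, about 40 times what ntpd can absorb through frequency adjustment alone.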

Comment 12 Harald Hoyer 2003-03-12 11:02:45 UTC

because of your bad hw clock?

Comment 13 Joseph Shraibman 2003-03-12 20:25:56 UTC

That's what ntp is for: compensating for bad hardware clocks.  I had the same
hardware clock in 7.2 but ntp still worked.

Comment 14 Joseph Shraibman 2003-04-04 00:04:04 UTC

I visited the machine in the datacenter, and there were a lot of messages like
this on the screen:
set_rtc_mmss: can't update from 8 to 56
The messages didn't appear in the logs anywhere.  After rebooting the machine,
ntp worked for 8 1/2 hours and then:
ntpd[740]: synchronisation lost

The peers command shows the same jitter again.
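The `set_rtc_mmss` message above comes from the kernel routine that, while the clock is marked as NTP-synchronized, periodically copies the system time's minutes and seconds into the hardware clock (RTC). The sketch below approximately mirrors the minute check in that routine as found in the i386 kernel source of that period; it is a reconstruction, not code from this report, and the function name is hypothetical. Because only minutes and seconds are written, the routine refuses to touch the RTC when the two minute values are too far apart.

```python
def rtc_minute_check(nowtime: int, cmos_minutes: int):
    """Return (would_update, real_minutes) for the RTC minute check.

    Approximate reconstruction of the kernel's set_rtc_mmss() logic;
    nowtime is the system time in seconds, cmos_minutes the RTC minute field.
    """
    real_minutes = nowtime // 60
    # correct for a half-hour time zone offset
    if ((abs(real_minutes - cmos_minutes) + 15) // 30) & 1:
        real_minutes += 30
    real_minutes %= 60
    # only minutes/seconds are written, so the RTC must already be close
    return abs(real_minutes - cmos_minutes) < 30, real_minutes


# system clock at hh:56, RTC minute field at 08, as in the message above
ok, real = rtc_minute_check(56 * 60, 8)
if not ok:
    print(f"set_rtc_mmss: can't update from 8 to {real}")
```

With the RTC minute field at 08 and the system clock at :56, the minute values differ by 48, which fails the check and reproduces the message the reporter saw.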

Comment 15 Howard Holm 2003-04-26 02:00:34 UTC

I've seen this kind of problem on 7.3, 8.0 and 9 systems where a gnome-battery
applet was running in the panel.  Is this machine a laptop by chance?

Comment 16 Joseph Shraibman 2003-04-27 04:40:57 UTC

I don't own any dual athlon laptops, do you?

Comment 17 Harald Hoyer 2003-10-08 12:01:39 UTC

please retry with the latest rawhide version

Comment 18 Joseph Shraibman 2003-11-04 19:30:42 UTC

I've since upgraded this machine to Red Hat 9, and it doesn't have the
problem anymore. Do you still want me to try the Rawhide version?

Comment 19 Harald Hoyer 2003-11-05 07:54:28 UTC

No. If it works in 9, that proves my patches are good and the version in 9
is all right. Thank you!
