Bug 216351

Summary: ntpd starts "too early"
Product: [Fedora] Fedora
Reporter: Michal Jaegermann <michal>
Component: ntp
Assignee: Miroslav Lichvar <mlichvar>
Status: CLOSED ERRATA
QA Contact: Brian Brock <bbrock>
Severity: medium
Priority: medium
Version: 6
CC: jbarnes
Target Milestone: ---   
Target Release: ---   
Hardware: All   
OS: Linux   
Doc Type: Bug Fix
Last Closed: 2007-02-06 16:34:23 UTC
Attachments:
modifies startup file for ntpd (flags: none)

Description Michal Jaegermann 2006-11-19 20:10:35 UTC
Description of problem:

There is a problem when network connections are handled by
NetworkManager.  Although the machine really is networked,
there may be no connection yet when /etc/init.d/ntpd runs,
so we get a long delay trying to do something which cannot
succeed.  Even worse, the startup sequence always incurs a delay,
at times quite a substantial one, waiting for clock-tickers
and a response from time servers.

This is an even bigger issue when the only connections which can
be active in the given situation are wireless. With NetworkManager
that means that somebody has to log in before an interface to
the "outside world" comes up.

Here is a proposed solution.  Add in /etc/sysconfig/ntpd
the following:

# Wait this many minutes for a network interface to show up.
NDELAY=15
# If NDELAY is not given or is "" then start immediately and synchronously;
# otherwise start in the background, with success and failure messages
# suppressed.
# If NDELAY is 0 then we wait indefinitely.

# How often to try, in seconds, if NDELAY is not "".
NGAP=60

# A list of interfaces through which an ntp server can be reached.
# Used only when NDELAY is non-empty.
# If not given then all non-loopback interfaces will be tried.
# NLIST="eth0 wlan0 wifi0"

and modify /etc/init.d/ntpd as in the attached version.  With
NDELAY="" this works as it does now, for those who really need ntpd
up (not guaranteed in any case) before proceeding with the rest of
the startup sequence.  Otherwise the start happens in the background
once some "candidate" network interfaces are detected in an UP state.

A list of interfaces to check is important for machines with
multiple connections when an ntp server can be reached through
only some of them.
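
The waiting logic described above could look roughly like the
following sketch.  It is not the attached script: wait_for_network is
a hypothetical name, the connection_up probe here is a simplified
stand-in, and the loop only illustrates how NDELAY (minutes, with 0
meaning wait indefinitely) and NGAP (seconds between checks) might
interact.

```shell
#!/bin/sh
# Hypothetical sketch of the proposed delayed start; NOT the attached script.

# Simplified probe (assumption): succeeds once any non-loopback
# interface has an IPv4 or IPv6 address assigned.
connection_up () {
    ip -o addr 2>/dev/null | grep -wv lo | grep -qw 'inet6*'
}

# wait_for_network NDELAY NGAP
# Returns 0 as soon as connection_up succeeds.
# Returns 1 once NDELAY minutes have passed without success;
# NDELAY=0 means there is no time limit.
wait_for_network () {
    ndelay=$1
    ngap=$2
    waited=0
    limit=$((ndelay * 60))
    while ! connection_up ; do
        if [ "$ndelay" -ne 0 ] && [ "$waited" -ge "$limit" ] ; then
            return 1
        fi
        sleep "$ngap"
        waited=$((waited + ngap))
    done
    return 0
}
```

When NDELAY is non-empty, an init script could run something like
`( wait_for_network "$NDELAY" "$NGAP" && start ) &`, where `start`
stands for whatever the script already does to launch ntpd.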

The same issue affects ntp startup also in FC5 and RHEL.

Comment 1 Michal Jaegermann 2006-11-19 20:10:35 UTC
Created attachment 141594
modifies startup file for ntpd

Comment 2 Miroslav Lichvar 2006-11-20 14:17:44 UTC
This is a known problem, reported as bug #146884 and bug #206127. It will be
fixed when the new version of ntp (ntp-4.2.4) is released, as it will be able
to handle dynamic interfaces.

Comment 3 Miroslav Lichvar 2006-11-28 12:34:39 UTC
*** Bug 217423 has been marked as a duplicate of this bug. ***

Comment 4 Michal Jaegermann 2006-12-07 23:55:47 UTC
It now looks like ntp-4.2.2p4-2.fc6 **really** broke ntpd if
your network connection may show up later (because, for example, it
is handled by NetworkManager).

True, in a startup sequence one sees "OK" from ntpd immediately even
if the network is down.  This gives the .LOCL. clock only, which does
not buy very much so far.  After the network becomes active the other
servers are reported by 'ntpq -pn' and it appears that we
are on our way, but this is really an illusion.

When starting with a network present those other servers are
stratum, say, 2 or 3 and 127.127.1.0 is 10, so we are really synchronizing.
If the network shows up later then the local clock is still stratum 10 but
all other servers are stratum 16 and they stay that way.  "reach",
"delay", "offset" and "jitter" are all fixed at 0, so these other servers
are really as good as dead.  I waited for over 35 minutes, which
is ridiculously long, and nothing changed.

It also appears that the initial 'ntpdate' sync is lost, so if your clock
is outside drift limits it will never get synchronized.

For now a possible workaround seems to be to leave some process which
will check if we are really getting some servers with a stratum lower
than 10 and repeatedly restart ntpd if this is not the case.  Sigh!
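
Such a watchdog might be sketched as below.  This is only an
illustration: the function names and the 300-second default interval
are assumptions, and the check relies on the stratum being the third
column ("st") of the 'ntpq -pn' listing, after two header lines.

```shell
#!/bin/sh
# Hypothetical watchdog sketch; names and interval are assumptions.

# has_real_server: reads `ntpq -pn` output on stdin and succeeds if
# at least one peer line reports a stratum below 10.
has_real_server () {
    awk 'NR > 2 && $3 ~ /^[0-9]+$/ && $3 < 10 { found = 1 }
         END { exit !found }'
}

# watch_ntpd [INTERVAL]
# Every INTERVAL seconds (default 300), restart ntpd unless some
# server with a stratum below 10 is listed.  Runs until killed.
watch_ntpd () {
    while : ; do
        sleep "${1:-300}"
        ntpq -pn 2>/dev/null | has_real_server \
            || /sbin/service ntpd restart
    done
}
```

The check deliberately looks at the stratum only, as described above;
a stricter version could also require a non-zero "reach" value.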

Restarting ntpd with a network active immediately shows outside servers
which are really consulted for time.

Comment 5 Miroslav Lichvar 2006-12-08 09:40:23 UTC
It was always like this.

A better workaround would be restarting the ntp daemon from a script executed
by NetworkManagerDispatcher.
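
A dispatcher hook along those lines might look like the sketch below.
It assumes that the dispatcher calls each script with the interface
name as the first argument and the action ("up" or "down") as the
second; on_event and RESTART_CMD are hypothetical names used only for
illustration.

```shell
#!/bin/sh
# Hypothetical dispatcher hook sketch.
# Assumption: invoked as  <script> <interface> <action>.

# on_event IFACE ACTION
# Restart ntpd when a non-loopback interface comes up; do nothing
# for any other event.  RESTART_CMD may override the restart command.
on_event () {
    iface=$1
    action=$2
    if [ "$action" != "up" ] ; then
        return 0
    fi
    if [ "$iface" = "lo" ] ; then
        return 0
    fi
    # Left unquoted on purpose so the command splits into words.
    ${RESTART_CMD:-/sbin/service ntpd restart}
}

# With no arguments this is a no-op, because the action is empty.
on_event "${1:-}" "${2:-}"
```

Restarting only on "up" events keeps the hook from bouncing ntpd when
an interface goes away.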

Comment 6 Michal Jaegermann 2006-12-08 18:01:08 UTC
I definitely agree with a "better workaround".  Right now it is
better not to start ntpd at all.  In the situation in question ntpd
is really only chewing cycles without contributing anything useful.
The problem is that there are other issues of that sort.  See
bug 218237 for further examples.

Am I missing some obstacle preventing an earlier startup of
NetworkManager (and Dispatcher)?

Comment 7 Michal Jaegermann 2006-12-10 18:12:03 UTC
What I proposed in the attachment to comment #1 still works fine
with ntp-4.2.2p4-2.fc6.  However, a closer look reveals that a network
interface can be marked as UP with no address assigned.
NetworkManager at work? Hence the 'connection_up' function in that
attachment should be modified as follows:

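# Succeed if a candidate interface has an IPv4 or IPv6 address.
# Arguments: interfaces to check; with none, any non-loopback one counts.
connection_up () {
    if [ -z "$1" ] ; then
        ip -o addr | grep -wv lo | grep -qw 'inet6*'
    else
        # $@ stays unquoted so a list passed as one string is still split.
        for iface in $@ ; do
            ip -o addr show dev "$iface" 2>/dev/null \
                | grep -qw 'inet6*' && return 0
        done
        return 1
    fi
}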

This, obviously, does not guarantee that at least some time
servers will be reachable, but it is still much better than the
current situation.

Comment 8 Miroslav Lichvar 2007-01-08 13:05:33 UTC
ntp-4.2.4 is finally in rawhide. Please give it a try and check whether
everything is OK with NetworkManager. I will make an update for FC6 in a week
or so.

Comment 9 Michal Jaegermann 2007-02-06 20:35:59 UTC
Hm, I still have some doubts about "fixed" in ntp-4.2.4-3.fc6 (the
current updates).

If external step-tickers are not reachable when /etc/init.d/ntpd
runs, which is a normal situation when NetworkManager is in use,
then synchronization happens on the .LOCL. clock and ntpd starts
without blocking there for a long time; so far so good.  Also, when
external servers become reachable after some time, I see that
at some moment they are indeed promoted from stratum 16 and ntpd
synchronizes to them.  Even better.  Thanks!

The problem is that if the difference between network time and the local
clock is big enough, which is not that unusual, then after an initial
synchronization on the local clock ntpd will give up, as designed, and
we will be left with the wrong time and with manual intervention as the
only remaining option.  Am I missing something?

Comment 10 Miroslav Lichvar 2007-02-07 10:24:23 UTC
Try removing everything from the step-tickers file, so that ntpd is started
with the -g option, and removing the local clock from ntp.conf.
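
For illustration, the ntp.conf part of that change could be scripted
roughly as below.  This is a sketch under assumptions: the local clock
is taken to be configured through lines mentioning 127.127.1.0
(typically "server" and "fudge" entries), and disable_local_clock is a
hypothetical helper name.  It comments the lines out rather than
deleting them.

```shell
#!/bin/sh
# Hypothetical helper; assumes the local clock lines mention 127.127.1.0.

# disable_local_clock FILE
# Prefix every not-yet-commented line that refers to 127.127.1.0
# with '#', leaving all other lines untouched.
disable_local_clock () {
    conf=$1
    sed '/^[^#]*127\.127\.1\.0/s/^/#/' "$conf" > "$conf.new" \
        && mv "$conf.new" "$conf"
}
```

It would be called as `disable_local_clock /etc/ntp.conf`, with the
step-tickers file emptied separately (for example with
`: > /etc/ntp/step-tickers`), followed by a restart of ntpd.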

Comment 11 Michal Jaegermann 2007-02-07 19:06:03 UTC
Yes, I see.  Dropping the /etc/ntp/step-tickers file and the
"Undisciplined Local Clock" driver in /etc/ntp indeed should work
by forcing the use of '-g', and currently all this gets into the
desired state relatively quickly once ntp servers can be reached.
It would not have been clear to me that I had to change my old
configuration that way without what you wrote in comment #10.
Thanks again!