1162588 – [RFE] configure NTP on hosts

Keywords:
Status:	CLOSED DEFERRED
Alias:	None
Product:	ovirt-engine
Classification:	oVirt
Component:	RFEs
Sub Component:
Version:	3.5.0
Hardware:	Unspecified
OS:	Unspecified
Priority:	unspecified
Severity:	medium
Target Milestone:	---
Target Release:	---
Assignee:	Scott Herold
QA Contact:	Gil Klein
Docs Contact:
URL:
Whiteboard:
Duplicates (2):	1171474 1208772
Depends On:
Blocks:	970711

Reported:	2014-11-11 11:13 UTC by David Jaša
Modified:	2019-04-28 13:57 UTC
CC List:	11 users
Fixed In Version:
Clone Of:
Environment:
Last Closed:	2017-06-07 17:51:58 UTC
oVirt Team:	Infra
Embargoed:
Dependent Products:
Flags:	ylavi: ovirt-future? rule-engine: planning_ack? rule-engine: devel_ack? rule-engine: testing_ack?

Description David Jaša 2014-11-11 11:13:35 UTC

Description of problem:
out-of-sync host clocks can cause lots of issues. It would be nice if RHEV could configure clocks on hosts. The design could follow these points:

1. all hosts within Cluster or Data Center would be configured as peers. This alone should ensure that the clocks are mutually synced

2. host install script and VDSM should check server availability: in some environments, only internal NTP servers are allowed while OSs only try to sync time from the default servers (*.pool.ntp.org), leaving such hosts with no NTP source

3. once some hosts in Cluster or Data Center are Up, RHEV GUI/API should allow listing of configured NTP servers for the cluster, addition of new servers (with validation of reachability) and removal of configured servers

4. on any host addition/deletion within Cluster or Data Center, the peers list should be updated

5. when a host doesn't have any usable NTP servers configured, the admin portal should warn about the condition in Host Overview and Alerts.
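
The reachability validation mentioned in points 2 and 3 could be prototyped with a plain SNTP query. The following is only a sketch, not existing VDSM or engine code: all function names are hypothetical, it speaks IPv4 only, and it assumes a server counts as usable when it returns a server-mode reply with a stratum between 1 and 15.

```python
import socket
import struct

NTP_PORT = 123
NTP_EPOCH_OFFSET = 2208988800  # seconds between 1900-01-01 and 1970-01-01


def build_sntp_request() -> bytes:
    """Return a 48-byte client request: LI=0, version=3, mode=3 (client)."""
    return bytes([0x1B]) + bytes(47)


def parse_sntp_response(data: bytes) -> dict:
    """Extract mode, stratum and transmit time (Unix seconds) from a reply."""
    if len(data) < 48:
        raise ValueError("short NTP packet")
    mode = data[0] & 0x07
    stratum = data[1]
    # The transmit timestamp occupies bytes 40..47: seconds, then fraction.
    seconds, fraction = struct.unpack("!II", data[40:48])
    unix_time = seconds - NTP_EPOCH_OFFSET + fraction / 2**32
    return {"mode": mode, "stratum": stratum, "transmit_time": unix_time}


def is_usable_ntp_server(host: str, timeout: float = 2.0) -> bool:
    """True if host answers with a server-mode reply from a synced clock."""
    try:
        with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
            sock.settimeout(timeout)
            sock.sendto(build_sntp_request(), (host, NTP_PORT))
            data, _ = sock.recvfrom(512)
        reply = parse_sntp_response(data)
    except (OSError, ValueError):
        # Covers DNS failures, timeouts, unreachable hosts and short replies.
        return False
    # Mode 4 is "server"; stratum 0 and 16 mean the server is not synced.
    return reply["mode"] == 4 and 1 <= reply["stratum"] <= 15
```

A host for which `is_usable_ntp_server()` returns False for every configured server would be a candidate for the warning described in point 5.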

If this proposal is deemed useful to get implemented, there are some more design decisions to make:

* the peer group can be a Data Center or a Cluster. This would depend on:
- whether there are storage implications from out-of-sync hosts (favouring DC)
- the maximum practical size of a peer group (favouring clusters or even smaller groups)

* an additional NTP server could be set up on RHEV-M. IMO it wouldn't be terribly useful because firewalls only need to allow engine --> hosts connections (blocking NTP, which would go in the opposite direction) and the engine is likely to run in a VM itself, which harms NTP usefulness. ntpd on hosted engine seems outright bad because the physical hosts hosting it are always a better NTP source

* hosts could be allowed to serve NTP by default to any client. This would need verification of ntpd settings to make sure that NTP can't be used to DoS the host

The implementation shouldn't be too hard because the host-side bits already exist:

* anaconda has some NTP module to extract servers from DHCP and validate their availability

* ntpd configuration allows inclusion of files so the server list, peer permissions and peers list can be kept in separate files and on settings update, vdsm would just write the new contents of these files and restart ntpd
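
As a rough illustration of the include-file approach, the sketch below renders the three fragments and returns the `includefile` lines that would be added once to the main ntp.conf. The file names, the target directory, the function names and the `restrict` flags chosen for peers are assumptions made for this example, not existing vdsm behaviour; restarting ntpd after the write is left out.

```python
from pathlib import Path


def render_servers(servers):
    """One 'server' line per upstream NTP server."""
    return "".join(f"server {s} iburst\n" for s in servers)


def render_peers(cluster_hosts, this_host):
    """One 'peer' line per other host in the peer group."""
    return "".join(
        f"peer {h}\n" for h in sorted(cluster_hosts) if h != this_host
    )


def render_peer_permissions(cluster_hosts, this_host):
    """Let peers exchange time but not reconfigure this ntpd (assumed policy)."""
    return "".join(
        f"restrict {h} nomodify notrap\n"
        for h in sorted(cluster_hosts)
        if h != this_host
    )


def write_fragments(directory, servers, cluster_hosts, this_host):
    """Write the three fragments and return the matching includefile lines."""
    directory = Path(directory)
    directory.mkdir(parents=True, exist_ok=True)
    fragments = {
        "servers.conf": render_servers(servers),
        "peer-permissions.conf": render_peer_permissions(cluster_hosts, this_host),
        "peers.conf": render_peers(cluster_hosts, this_host),
    }
    for name, content in fragments.items():
        (directory / name).write_text(content)
    return [f"includefile {directory / name}" for name in fragments]
```

Because each host is excluded from its own peer list, the same cluster membership produces a different `peers.conf` per host, so the files would have to be regenerated on every host whenever a host is added or removed.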

Version-Release number of selected component (if applicable):
RHEV up to 3.5

Additional info:
* the only related request seems to be one to set up an NTP server on RHEV-M: bug 482072

Q: isn't status quo good enough?
A: it isn't. Hosts are sometimes out-of-sync when hosts don't auto-add NTP servers from DHCP (or the network is configured statically) and network doesn't allow connection to default *.pool.ntp.org servers. Therefore the peering part of this RFE should get implemented

Q: what is the reason to include centralized configuration of NTP servers?
A: once peer auto-configuration and detection of upstream NTP server unavailability are implemented, all the requirements for upstream server configuration are in place, so adding the functionality should be a rather small amount of work.

Comment 2 Doron Fediuck 2014-12-08 07:14:09 UTC

*** Bug 1171474 has been marked as a duplicate of this bug. ***

Comment 3 Doron Fediuck 2014-12-08 07:21:43 UTC

See also Bug 619360 Comment 18.

One more thing to consider is hosted engine and NTP on the engine
machine. From past experience changing time on the engine machine
backwards or forward proved to be destructive for the quartz mechanism
engine is using.

As I see it this should be a prerequisite just like DNS and not something
we should be handling.

Comment 4 David Jaša 2014-12-08 12:32:44 UTC

> Duplicate of this bug: 1171474

Not really. That bug requests NTP configuration of ovirt-engine, this bug requests configuration of hosts.

(In reply to Doron Fediuck from comment #3)
> See also Bug 619360 Comment 18.
> 

I already commented there back at the time. IMO the problem is still here so it probably needs _some_ attention.

> One more thing to consider is hosted engine and NTP on the engine
> machine. From past experience changing time on the engine machine
> backwards or forward proved to be destructive for the quartz mechanism
> engine is using.
> 

If possible, the right solution (TM) should be to:
  * peer the hosts in the cluster so that their clocks are consistent
  * use such host clock in hosted engine VM that will be correct even in
    migrate-to-file case
  * disable NTP in the VM
Last time I had to deal with a paused VM (RHEL 6.5 VM @ RHEV 3.5), the second point didn't work out, so probably something in qemu or the VM configuration would deserve tuning in order to achieve it.

> As I see it this should be a prerequisite just like DNS and not something
> we should be handling.

You _could_ probably at least do checks without any service modification:

Easy check: check whether the NTP servers are reachable and the hosts are synced against them.

Sophisticated check: allow port 123 (from other hosts or globally) and do some queries to determine the mutual clock offsets of the hosts. A difference larger than a couple of seconds should result in a warning; a difference larger than a minute should be an error.
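
The thresholds of the sophisticated check could be applied roughly as in the sketch below. It assumes the per-host offsets (in seconds, relative to some common reference) have already been measured; the function names are hypothetical and 2 seconds is only one reading of "a couple of seconds".

```python
from itertools import combinations

WARNING_THRESHOLD = 2.0  # "a couple of seconds" (assumed value)
ERROR_THRESHOLD = 60.0   # "a minute"


def classify_offset(difference):
    """Map a clock difference in seconds to 'ok', 'warning' or 'error'."""
    difference = abs(difference)
    if difference > ERROR_THRESHOLD:
        return "error"
    if difference > WARNING_THRESHOLD:
        return "warning"
    return "ok"


def check_cluster(offsets):
    """Classify every pair of hosts by their mutual clock difference.

    offsets maps host name -> clock offset in seconds against a common
    reference; the result maps (host_a, host_b) -> severity.
    """
    report = {}
    for (a, off_a), (b, off_b) in combinations(sorted(offsets.items()), 2):
        report[(a, b)] = classify_offset(off_a - off_b)
    return report
```

For example, `check_cluster({"host-a": 0.1, "host-b": -0.3, "host-c": 75.0})` reports the pair of host-a and host-b as "ok" and both pairs involving host-c as "error".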

...

The root cause of the NTP configuration requests resurfacing again and again is a combination of:
1. pool.ntp.org servers aren't available everywhere
2. other servers are not configured for some reason, e.g.:
    * yet another regression of ntp-servers-from-dhcp-are-ignored bug
    * ntp forgotten by administrator on networks with static IP configuration

These two points do occur together at times so at least _some_ measure would be nice.

Comment 5 Alon Bar-Lev 2015-06-23 14:56:32 UTC

*** Bug 1208772 has been marked as a duplicate of this bug. ***
