Bug 1208772

Summary:	better synchronize time between hosts
Product:	[oVirt] otopi	Reporter:	Michal Skrivanek <michal.skrivanek>
Component:	Plugins.system	Assignee:	Alon Bar-Lev <alonbl>
Status:	CLOSED DUPLICATE	QA Contact:	Pavel Stehlik <pstehlik>
Severity:	unspecified	Docs Contact:
Priority:	high
Version:	---	CC:	bazulay, dougsland, ecohen, gklein, iheim, lsurette, michal.skrivanek, rbalakri, Rhev-m-bugs, shavivi, sherold, yeylon
Target Milestone:	---	Keywords:	FutureFeature
Target Release:	---	Flags:	michal.skrivanek: devel_ack?
Hardware:	Unspecified
OS:	Unspecified
Whiteboard:	infra
Fixed In Version:		Doc Type:	Enhancement
Doc Text:		Story Points:	---
Clone Of:		Environment:
Last Closed:	2015-06-23 14:56:32 UTC	Type:	Bug
Regression:	---	Mount Type:	---
Documentation:	---	CRM:
Verified Versions:		Category:	---
oVirt Team:	Infra	RHEL 7.3 requirements from Atomic Host:
Cloudforms Team:	---	Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks:	970711

Description Michal Skrivanek 2015-04-03 07:05:14 UTC

In RFE bug 970711 we are to report the migration downtime. libvirt reports it to vdsm, but the value does not take into account clock differences between the src and dst hosts.
It is computed in a simple way: a src clock value is taken, and the dst host subtracts it from its local time to get how long it took (this is one part of the total reported number).
We therefore depend on precise time synchronization between hosts.

For most migrations the downtime values are on the order of tens of milliseconds, so we need to make sure that the src and dst host clocks are synchronized with at least that precision. Otherwise the reported value is biased beyond being meaningful.
Customers do use the parameter for realtime workloads where they have a hard requirement on the allowed downtime, e.g. 100ms, which we are supposed to set as a migration convergence criterion.
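
To illustrate how clock offset biases the number, here is a minimal Python sketch of the subtraction described above. It is not vdsm code: the function names, the millisecond units and the sample values are assumptions made only for this example.

```python
def naive_downtime_ms(src_stamp_ms: float, dst_now_ms: float) -> float:
    """Downtime as the dst host would compute it: local time minus the src stamp."""
    return dst_now_ms - src_stamp_ms


def reported_downtime_ms(true_downtime_ms: float,
                         dst_minus_src_offset_ms: float) -> float:
    """Simulate the reported value when the dst clock runs ahead of the src
    clock by dst_minus_src_offset_ms (negative means dst is behind)."""
    src_stamp_ms = 1_000_000.0  # arbitrary src clock reading (hypothetical)
    # The dst clock reads the same instant shifted by the offset, plus the
    # time that really elapsed while the VM was down.
    dst_now_ms = src_stamp_ms + dst_minus_src_offset_ms + true_downtime_ms
    return naive_downtime_ms(src_stamp_ms, dst_now_ms)


if __name__ == "__main__":
    for offset in (0.0, 25.0, -60.0):
        print(offset, reported_downtime_ms(40.0, offset))
```

With a real downtime of 40ms, an offset of only 25ms already inflates the reported value by more than half, and a dst clock that is 60ms behind produces a negative downtime.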

There might be other ways to make sure the time is in sync, but:
- in <3.2 we used to enable ntpd (depending on admin's proper setup)
- the current code monitors host time differences, but only in a naive way, with one-second precision and a default alert threshold of 300s
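
As a rough illustration of why a seconds-precision check with a 300s threshold cannot catch the skews that matter here, consider the Python sketch below. It is not the actual monitoring code: `skew_alert`, its parameters and the truncation to whole seconds are assumptions made for this example. Only the 300s default comes from the text above.

```python
DEFAULT_ALERT_THRESHOLD_S = 300  # default alert threshold quoted above


def skew_alert(engine_time_s: float, host_time_s: float,
               threshold_s: float = DEFAULT_ALERT_THRESHOLD_S,
               whole_seconds: bool = True) -> bool:
    """Return True when the host clock differs from the engine clock by more
    than threshold_s. With whole_seconds=True both readings are truncated to
    whole seconds first, which models a seconds-precision comparison."""
    if whole_seconds:
        engine_time_s = int(engine_time_s)
        host_time_s = int(host_time_s)
    return abs(host_time_s - engine_time_s) > threshold_s


if __name__ == "__main__":
    # A 2s skew is far too large for tens-of-ms downtimes, yet raises no alert.
    print(skew_alert(1000.0, 1002.0))
    # A 400ms skew vanishes entirely once the readings are truncated.
    print(skew_alert(1000.1, 1000.5, threshold_s=0.010))
    # The same skew is caught with sub-second readings and a 10ms threshold.
    print(skew_alert(1000.1, 1000.5, threshold_s=0.010, whole_seconds=False))
```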

We need better time sync reporting, and we should either help with setting it up correctly or manage it ourselves. E.g. if DHCP provides NTP server addresses, we should start ntpd.

Comment 2 Sandro Bonazzola 2015-04-03 07:12:58 UTC

Just to be sure, since this bug has been assigned to me but the whiteboard says infra: is the request here to add the ntp service as an ovirt-engine-setup dependency and have engine-setup configure and start ntpd?

Looking at the description, this doesn't seem to be an installer issue, since the ntpd daemon should be running on the hosts, so ovirt-host-deploy looks like a better component for this issue.

Comment 3 Alon Bar-Lev 2015-04-03 10:22:09 UTC

as we discussed in irc, host-deploy cannot just enable ntpd service and assume that ntpd is functioning.

yes, there is a chance that a valid ntpd configuration is available via dhcp, but this is an assumption for the sysadmin to make, not us. if we require it as a feature of vdsm we must be sure that it is functioning.

we should also not assume which ntp service is to be used by sysadmin to sync clocks, there are multiple choices out there.

these are all minor but important technical points.

the more important issue is that host-deploy is just automation of vdsm setup, nothing more.

if you added a feature of *vdsm* that *requires* clock synchronization, then vdsm should take care of ntpd management, such as it takes care of iscsi or any other dependency.

this will make sure that even if the sysadmin stopped ntpd after host-deploy or removed it from the start-at-boot list, starting vdsm will trigger an ntpd start.

I would also suggest that every timestamp that is sent and requires clock synchronization also carries a boolean indicating whether the clock is indeed synchronized, so the manager can consider only those that are actually synchronized.
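
A minimal Python sketch of this suggestion, assuming a hypothetical `StampedTime` record and a hypothetical manager-side helper (neither exists in vdsm or the engine, as far as this report says):

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class StampedTime:
    """A timestamp plus the sender's claim about its clock (hypothetical)."""
    epoch_ms: float
    clock_synchronized: bool


def usable_downtime_ms(src: StampedTime, dst: StampedTime) -> Optional[float]:
    """Return dst - src only when both hosts report a synchronized clock,
    otherwise None so the manager can discard the sample."""
    if not (src.clock_synchronized and dst.clock_synchronized):
        return None
    return dst.epoch_ms - src.epoch_ms


if __name__ == "__main__":
    print(usable_downtime_ms(StampedTime(1000.0, True), StampedTime(1040.0, True)))
    print(usable_downtime_ms(StampedTime(1000.0, True), StampedTime(1040.0, False)))
```

How each host would determine the value of the flag is left open here; that depends on which time service the sysadmin runs.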

I myself would have tried very hard to implement this feature without any need for clock synchronization, by calculating downtime differences rather than absolute times.

I am not sure what data is available to you, but each host should know the time when its process starts and the time when its process ends, so it could calculate its own delta, and the delta between itself and the remote host, even if clocks are not synchronized.
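
One possible way to obtain "the delta between itself and the remote host" is an NTP-style four-timestamp exchange, sketched below in Python. This technique is swapped in for illustration and is not something the comment specifies. It assumes the network delay is the same in both directions, and all names and sample values are hypothetical.

```python
def estimate_offset_ms(t0_src: float, t1_dst: float,
                       t2_dst: float, t3_src: float) -> float:
    """Estimate (dst clock - src clock) from one request/reply exchange.

    t0_src: src clock when the request is sent
    t1_dst: dst clock when the request is received
    t2_dst: dst clock when the reply is sent
    t3_src: src clock when the reply is received
    Exact only if the network delay is equal in both directions.
    """
    return ((t1_dst - t0_src) + (t2_dst - t3_src)) / 2.0


def corrected_downtime_ms(src_pause_ms: float, dst_resume_ms: float,
                          offset_ms: float) -> float:
    """Translate the dst reading into src clock terms, then subtract."""
    return (dst_resume_ms - offset_ms) - src_pause_ms


if __name__ == "__main__":
    # Simulated hosts: the dst clock is 500ms ahead, one-way delay is 5ms,
    # and the dst host takes 1ms to reply.
    offset = estimate_offset_ms(t0_src=0.0, t1_dst=505.0,
                                t2_dst=506.0, t3_src=11.0)
    print(offset)
    # VM paused at 100 on the src clock and resumed at 640 on the dst clock.
    print(corrected_downtime_ms(100.0, 640.0, offset))
```

Asymmetric network delay would show up directly as error in the estimate, so this only helps if that error stays well below the tens of milliseconds being measured.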

Comment 4 Michal Skrivanek 2015-04-08 09:05:02 UTC

note this can be solved by implementing bug 1162588

Comment 5 Alon Bar-Lev 2015-06-23 14:56:32 UTC


*** This bug has been marked as a duplicate of bug 1162588 ***