Bug 1208772 - better synchronize time between hosts
Summary: better synchronize time between hosts
Keywords:
Status: CLOSED DUPLICATE of bug 1162588
Alias: None
Product: otopi
Classification: oVirt
Component: Plugins.system
Version: ---
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: unspecified
Target Milestone: ---
: ---
Assignee: Alon Bar-Lev
QA Contact: Pavel Stehlik
URL:
Whiteboard: infra
Depends On:
Blocks: 970711
Reported: 2015-04-03 07:05 UTC by Michal Skrivanek
Modified: 2016-02-10 19:10 UTC

Fixed In Version:
Clone Of:
Environment:
Last Closed: 2015-06-23 14:56:32 UTC
oVirt Team: Infra
Embargoed:
michal.skrivanek: devel_ack?



Description Michal Skrivanek 2015-04-03 07:05:14 UTC
In RFE bug 970711 we are to report the migration downtime, which libvirt reports to vdsm, but it does not take into account clock differences between the src and dst hosts.
It is computed in a simple way: the src host sends its clock value, the dst host subtracts that value from its local time, and the result is how long it took (this is part of the total reported number).
We therefore depend on precise time synchronization between hosts.
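
To illustrate why the offset matters, here is a minimal sketch of the subtraction described above. The function name and all numbers are hypothetical and are not taken from vdsm or libvirt; the point is only that any clock offset between the hosts is added directly to the reported value.

```python
def reported_downtime_ms(src_stamp_ms, dst_now_ms):
    """Downtime as seen on dst: dst local time minus the timestamp taken on src."""
    return dst_now_ms - src_stamp_ms


# Hypothetical numbers: the real downtime is 40 ms, and the dst clock
# runs 25 ms ahead of the src clock.
true_downtime_ms = 40
dst_offset_ms = 25

src_stamp_ms = 1_000_000  # read on the src clock
dst_now_ms = src_stamp_ms + true_downtime_ms + dst_offset_ms  # read on the dst clock

measured_ms = reported_downtime_ms(src_stamp_ms, dst_now_ms)
error_ms = measured_ms - true_downtime_ms  # equals the clock offset
print(measured_ms, error_ms)
```

With downtimes of a few tens of milliseconds, an offset of the same size makes the reported number mostly noise.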

For most migrations the downtime values are on the order of tens of milliseconds, so we need to make sure that the src and dst host clocks are synchronized to at least that precision. Otherwise the reported value is biased beyond being meaningful.
Customers do use the parameter for realtime workloads where they have a hard requirement on the allowed downtime, e.g. 100ms, which we are supposed to set as a migration convergence criterion.

There might be other ways to make sure the time is in sync, but:
- in <3.2 we used to enable ntpd (depending on the admin's proper setup)
- current code monitors for time differences between hosts, but only in a naive way, with seconds precision, and a default of 300s to alert
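
As a rough sketch of the gap between the current alerting and what is needed here: only the 300s default comes from the description above; the function name and the 10 ms threshold are hypothetical and do not reflect the actual engine code.

```python
CURRENT_ALERT_THRESHOLD_MS = 300 * 1000  # 300s default mentioned above
DESIRED_ALERT_THRESHOLD_MS = 10          # hypothetical, millisecond-level


def drift_alert(offset_ms, threshold_ms):
    """Return True when the absolute clock offset exceeds the threshold."""
    return abs(offset_ms) > threshold_ms


# A 50 ms offset is larger than a typical downtime value, yet it is far
# below the 300s threshold and would not be reported.
offset_ms = 50
print(drift_alert(offset_ms, CURRENT_ALERT_THRESHOLD_MS))
print(drift_alert(offset_ms, DESIRED_ALERT_THRESHOLD_MS))
```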

We need better time sync reporting, and we should either help with setting it up correctly or manage it ourselves. E.g. if DHCP provides NTP server addresses we should start ntpd.

Comment 2 Sandro Bonazzola 2015-04-03 07:12:58 UTC
Just to be sure, since this bug has been assigned to me but the whiteboard says infra: is the request here to add the ntp service as an ovirt-engine-setup dependency and have engine-setup configure and start ntpd?

Because, looking at the description, this doesn't seem to be an installer issue: the ntpd daemon should be running on the hosts, so ovirt-host-deploy looks like a better component for this issue.

Comment 3 Alon Bar-Lev 2015-04-03 10:22:09 UTC
as we discussed on irc, host-deploy cannot just enable the ntpd service and assume that ntpd is functioning.

yes, there is a chance that a valid ntpd configuration is available via dhcp, but this is an assumption to be made by the sysadmin and not by us. if we require it as a feature of vdsm we must be sure that it is functioning.

we should also not assume which ntp service the sysadmin uses to sync clocks, there are multiple choices out there.

these are all minor but important technicalities.

the more important issue is that host-deploy is just automation of vdsm setup, nothing more.

if you added a feature of *vdsm* that *requires* clock synchronization, then vdsm should take care of ntpd management, just as it takes care of iscsi or any other dependency.

this will make sure that even if the sysadmin stopped ntpd after host-deploy or removed it from the start-at-boot list, starting vdsm will trigger ntpd start.

I would also suggest that every timestamp that is sent and requires clock synchronization also carries a boolean saying whether the clock is indeed synchronized, so the manager can consider only those that are actually synchronized.
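
A minimal sketch of that suggestion, assuming a hypothetical record type; the names `HostTimestamp`, `clock_synchronized` and `usable_for_downtime` are illustrative and are not part of the vdsm API.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class HostTimestamp:
    """A timestamp reported by a host, tagged with its clock sync state."""
    host: str
    epoch_ms: int
    clock_synchronized: bool  # hypothetical flag set by the reporting host


def usable_for_downtime(stamps: List[HostTimestamp]) -> List[HostTimestamp]:
    """Keep only timestamps whose host reported a synchronized clock."""
    return [s for s in stamps if s.clock_synchronized]


stamps = [
    HostTimestamp("src", 1_000_000, True),
    HostTimestamp("dst", 1_000_040, False),
]
trusted = usable_for_downtime(stamps)
# A cross-host difference is meaningful only if both ends are trusted.
can_compare = len(trusted) == len(stamps)
print(can_compare)
```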

I myself would have tried very hard to implement this feature without any need for clock synchronization, by calculating the downtime from differences and not from absolute times.

I am not sure what data is available to you, but each host should know the time when its process starts and the time when its process ends, so it could calculate its own delta, and the delta between itself and the remote host, even if clocks are not synchronized.
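
A small sketch of the property this idea relies on: an interval measured entirely on one host's clock does not change when that clock is offset from the other host's. All names and numbers are hypothetical, and the sketch does not address how the hand-over gap between the two hosts would be measured.

```python
def local_delta_ms(start_ms, end_ms):
    """Interval measured on a single host; any constant clock offset cancels out."""
    return end_ms - start_ms


def on_skewed_clock(true_ms, offset_ms):
    """What a host with the given clock offset would read at true time true_ms."""
    return true_ms + offset_ms


# Hypothetical phase: starts at true time 1000 ms and ends at 1030 ms.
true_start_ms, true_end_ms = 1000, 1030

# The same phase measured on a correct clock and on one that is 5 s ahead.
delta_exact = local_delta_ms(true_start_ms, true_end_ms)
delta_skewed = local_delta_ms(
    on_skewed_clock(true_start_ms, 5000),
    on_skewed_clock(true_end_ms, 5000),
)
print(delta_exact, delta_skewed)
```

On a real host, a monotonic clock such as Python's `time.monotonic()` would be a natural source for these readings, since it is also unaffected by wall-clock adjustments.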

Comment 4 Michal Skrivanek 2015-04-08 09:05:02 UTC
note this can be solved by implementing bug 1162588

Comment 5 Alon Bar-Lev 2015-06-23 14:56:32 UTC

*** This bug has been marked as a duplicate of bug 1162588 ***

