Description of problem:

If a machine's time shifts (not the timezone, but the UTC time setting), it affects timers in the HA agent and broker. Shifting the clock backwards lengthens any timeouts in effect, and shifting it forwards causes those timeouts to expire prematurely. This could, for example, cause the engine VM to stay down for several hours or more, as the broker checks engine health based on a timer and may not perform the check if the time is moved backwards.

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:
1. Start agent, broker
2. Adjust time backwards on the host where the engine VM is running
3. Shut down the engine VM

Actual results:
The VM stays down for the duration of the time shift

Expected results:
The HA system should restart the VM

Additional info:
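The fragile pattern behind this can be shown with a minimal sketch (hypothetical names, not the actual agent code): a timeout deadline computed from the wall clock. The clock source is injectable here so the effect of a backwards shift can be demonstrated without touching the system time.

```python
import time

def remaining(deadline, now=time.time):
    """Seconds left until `deadline` according to the clock `now`."""
    return deadline - now()

# Pretend the deadline was armed at wall-clock time t0, due in 30 s.
t0 = 100000.0
deadline = t0 + 30

# Normal clock: about 30 seconds remain.
print(remaining(deadline, now=lambda: t0))         # 30.0

# Clock set 2 hours backwards: the 30 s timeout balloons to ~2 hours,
# which is how the health check can end up delayed for hours.
print(remaining(deadline, now=lambda: t0 - 7200))  # 7230.0
```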
moving to 3.3.2 since 3.3.1 was built and moved to QE.
Pushing to 3.4, as it's too big for z-stream.
I have two ideas how to fix this:

1. Just detect the clock shift and show a warning, because changing the system time back on a production machine is very unwise and could lead to many problems.

2. Use a monotonic timer to avoid the dependency on the system time.
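Idea (1) could be sketched roughly as follows (hypothetical names, not the actual agent code; note that `time.monotonic()` is Python 3.3+, so older stacks would need an equivalent monotonic source): compare elapsed wall-clock time against elapsed monotonic time, and warn when they disagree by more than a tolerance.

```python
import logging
import time

def detect_clock_shift(prev_wall, prev_mono, tolerance=5.0):
    """Warn if the wall clock moved differently from the monotonic clock."""
    wall, mono = time.time(), time.monotonic()
    # Under normal operation both clocks advance at the same rate,
    # so this difference stays near zero; a `date --set` shows up here.
    drift = (wall - prev_wall) - (mono - prev_mono)
    if abs(drift) > tolerance:
        logging.warning("system clock shifted by about %+.0f seconds", drift)
    return wall, mono

# Called periodically from the agent's main loop:
wall, mono = time.time(), time.monotonic()
# ... one loop iteration later ...
wall, mono = detect_clock_shift(wall, mono)
```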
(In reply to Jiri Moskovcak from comment #5)
> I have two ideas how to fix this:
>
> 1. just detect the clock shift and show a warning, because changing the
> system time back on the production machine is very unwise and could lead to
> many problems
>
> 2. use a monotonic timer to avoid dependency on the system time

The decision is to use the monotonic time.
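The chosen approach can be illustrated with a small sketch (hypothetical class name, not the actual ovirt-hosted-engine-ha code): deadlines are armed on `time.monotonic()`, which `date --set` cannot move, while `time.time()` stays available for human-readable log timestamps only.

```python
import time

class MonotonicTimeout:
    """A timeout immune to wall-clock changes (settimeofday, date --set)."""

    def __init__(self, seconds):
        # Armed against the monotonic clock, not the wall clock.
        self._deadline = time.monotonic() + seconds

    def expired(self):
        return time.monotonic() >= self._deadline

    def remaining(self):
        return max(0.0, self._deadline - time.monotonic())

# Usage: arm a 30 s engine-health deadline; shifting the system clock
# backwards or forwards has no effect on when it expires.
t = MonotonicTimeout(30)
print(t.expired())   # False right after arming
```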
There are too many changes too close to the deadline -> moving to 3.5
Checked on ovirt-hosted-engine-ha-1.2.1-0.2.master.20140724142825.el6.noarch

1) # hwclock --show
   Sun 27 Jul 2014 10:39:38 AM IDT  -0.782173 seconds

2) # date --set="27 JUL 2014 10:00:00"
   Sun Jul 27 10:00:00 IDT 2014
   # hwclock -w
   # hwclock --show --utc
   Sun 27 Jul 2014 10:02:31 AM IDT  -0.734569 seconds

3) # hosted-engine --vm-poweroff

4) # vdsClient -s 0 list table
   returns nothing

5) But:
   # hosted-engine --vm-status

   --== Host 1 status ==--

   Status up-to-date          : True
   Hostname                   : 10.35.97.36
   Host ID                    : 1
   Engine status              : {"health": "good", "vm": "up", "detail": "up"}
   Score                      : 2400
   Local maintenance          : False
   Host timestamp             : 172236
   Extra metadata (valid at timestamp):
       metadata_parse_version=1
       metadata_feature_version=1
       timestamp=172236 (Sun Jul 27 10:10:31 2014)
       host-id=1
       score=2400
       maintenance=False
       state=EngineUp

So I see that the status is not updated, and also that the agent tries to start the VM.
Created attachment 921369 [details] logs
> timestamp=172236 (Sun Jul 27 10:10:31 2014)

This means the status was updated in the agent, as you went from 10:40 to about 10:00. Try that multiple times before and after the clock is set; the timestamp should only increment, and the human-readable version should correspond to the local time.

> and also agent to try to start vm

I do not see that in the log, but if you stopped the VM then the agent will try to start it again. That is the correct behaviour.

There is indeed something wrong with the broker: the monitoring threads froze after the time shift, and that caused the agent to get stale data.
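A broker-style monitoring thread paced on the monotonic clock would not stall this way. A rough sketch, with hypothetical names (not the actual broker code): the interval is paced by an `Event.wait` bounded by a monotonic deadline, so a wall-clock jump can neither freeze the loop nor make it spin.

```python
import threading
import time

def monitor_loop(stop, interval, check):
    """Run `check` every `interval` seconds until `stop` is set."""
    next_run = time.monotonic()
    while not stop.is_set():
        check()
        next_run += interval
        # Sleep only for the monotonic remainder of the interval.
        stop.wait(max(0.0, next_run - time.monotonic()))

# Example: a check that stops the loop after three submonitor reads.
stop = threading.Event()
calls = []
def check():
    calls.append(1)
    if len(calls) >= 3:
        stop.set()

monitor_loop(stop, 0.001, check)
print(len(calls))   # 3
```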
Verified on ovirt-hosted-engine-ha-1.2.2-2.el6ev
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://rhn.redhat.com/errata/RHBA-2015-0194.html