Description of problem:

If a machine's time shifts (not the timezone, but the UTC time setting), it affects timers in the HA agent and broker. Shifting the clock backwards lengthens any timeouts in effect, and shifting it forwards causes those timeouts to expire prematurely. This could, for example, cause the engine VM to stay down for several hours or more, as the broker checks engine health based on a timer and may not perform the check if the time is moved backwards.

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:
1. Start agent, broker
2. Adjust time backwards on the host where the engine VM is running
3. Shut down the engine VM

Actual results:
The VM stays down for the duration of the time shift

Expected results:
The HA system should restart the VM

Additional info:
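The fragile pattern behind this can be shown with a minimal sketch (hypothetical names, not the actual agent code): a timeout deadline computed from the wall clock. The clock source is injectable here so the effect of a backwards shift can be demonstrated without touching the system time.

```python
import time

def remaining(deadline, now=time.time):
    """Seconds left until `deadline` according to the clock `now`."""
    return deadline - now()

# Pretend the deadline was armed at wall-clock time t0, due in 30 s.
t0 = 100000.0
deadline = t0 + 30

# Normal clock: about 30 seconds remain.
print(remaining(deadline, now=lambda: t0))         # 30.0

# Clock set 2 hours backwards: the 30 s timeout balloons to ~2 hours,
# which is how the health check can end up delayed for hours.
print(remaining(deadline, now=lambda: t0 - 7200))  # 7230.0
```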
moving to 3.3.2 since 3.3.1 was built and moved to QE.
Pushing to 3.4, as it's too big for z-stream.
I have two ideas how to fix this:

1. Just detect the clock shift and show a warning, because changing the system time back on a production machine is very unwise and could lead to many problems.

2. Use a monotonic timer to avoid the dependency on the system time.
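Idea (1) could be sketched roughly as follows (hypothetical names, not the actual agent code; note that `time.monotonic()` is Python 3.3+, so older stacks would need an equivalent monotonic source): compare elapsed wall-clock time against elapsed monotonic time, and warn when they disagree by more than a tolerance.

```python
import logging
import time

def detect_clock_shift(prev_wall, prev_mono, tolerance=5.0):
    """Warn if the wall clock moved differently from the monotonic clock."""
    wall, mono = time.time(), time.monotonic()
    # Under normal operation both clocks advance at the same rate,
    # so this difference stays near zero; a `date --set` shows up here.
    drift = (wall - prev_wall) - (mono - prev_mono)
    if abs(drift) > tolerance:
        logging.warning("system clock shifted by about %+.0f seconds", drift)
    return wall, mono

# Called periodically from the agent's main loop:
wall, mono = time.time(), time.monotonic()
# ... one loop iteration later ...
wall, mono = detect_clock_shift(wall, mono)
```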
(In reply to Jiri Moskovcak from comment #5)
> I have two ideas how to fix this:
>
> 1. just detect the clock shift and show a warning, because changing the
> system time back on the production machine is very unwise and could lead to
> many problems
>
> 2. use a monotonic timer to avoid dependency on the system time

The decision is to use the monotonic time.
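The chosen approach can be illustrated with a small sketch (hypothetical class name, not the actual ovirt-hosted-engine-ha code): deadlines are armed on `time.monotonic()`, which `date --set` cannot move, while `time.time()` stays available for human-readable log timestamps only.

```python
import time

class MonotonicTimeout:
    """A timeout immune to wall-clock changes (settimeofday, date --set)."""

    def __init__(self, seconds):
        # Armed against the monotonic clock, not the wall clock.
        self._deadline = time.monotonic() + seconds

    def expired(self):
        return time.monotonic() >= self._deadline

    def remaining(self):
        return max(0.0, self._deadline - time.monotonic())

# Usage: arm a 30 s engine-health deadline; shifting the system clock
# backwards or forwards has no effect on when it expires.
t = MonotonicTimeout(30)
print(t.expired())   # False right after arming
```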
There are too many changes too close to the deadline -> moving to 3.5
Checked on ovirt-hosted-engine-ha-1.2.1-0.2.master.20140724142825.el6.noarch

1) # hwclock --show
   Sun 27 Jul 2014 10:39:38 AM IDT  -0.782173 seconds

2) # date --set="27 JUL 2014 10:00:00"
   Sun Jul 27 10:00:00 IDT 2014
   # hwclock -w
   # hwclock --show --utc
   Sun 27 Jul 2014 10:02:31 AM IDT  -0.734569 seconds

3) # hosted-engine --vm-poweroff

4) # vdsClient -s 0 list table
   returns nothing

5) But:
   # hosted-engine --vm-status

   --== Host 1 status ==--

   Status up-to-date          : True
   Hostname                   : 10.35.97.36
   Host ID                    : 1
   Engine status              : {"health": "good", "vm": "up", "detail": "up"}
   Score                      : 2400
   Local maintenance          : False
   Host timestamp             : 172236
   Extra metadata (valid at timestamp):
       metadata_parse_version=1
       metadata_feature_version=1
       timestamp=172236 (Sun Jul 27 10:10:31 2014)
       host-id=1
       score=2400
       maintenance=False
       state=EngineUp

So I see that the status is not updated, and also that the agent tries to start the VM.
Created attachment 921369 [details] logs
> timestamp=172236 (Sun Jul 27 10:10:31 2014)

This means the status was updated in the agent, as you went from 10:40 to about 10:00. Try that multiple times before and after the clock is set; the timestamp should only increment, and the human-readable version should correspond to the local time.

> and also agent to try to start vm

I do not see that in the log, but if you stopped the VM then the agent will try to start it again. That is the correct behaviour.

There is indeed something wrong with the broker: the monitoring threads froze after the time shift, and that caused the agent to get stale data.
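A broker-style monitoring thread paced on the monotonic clock would not stall this way. A rough sketch, with hypothetical names (not the actual broker code): the interval is paced by an `Event.wait` bounded by a monotonic deadline, so a wall-clock jump can neither freeze the loop nor make it spin.

```python
import threading
import time

def monitor_loop(stop, interval, check):
    """Run `check` every `interval` seconds until `stop` is set."""
    next_run = time.monotonic()
    while not stop.is_set():
        check()
        next_run += interval
        # Sleep only for the monotonic remainder of the interval.
        stop.wait(max(0.0, next_run - time.monotonic()))

# Example: a check that stops the loop after three submonitor reads.
stop = threading.Event()
calls = []
def check():
    calls.append(1)
    if len(calls) >= 3:
        stop.set()

monitor_loop(stop, 0.001, check)
print(len(calls))   # 3
```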
Verified on ovirt-hosted-engine-ha-1.2.2-2.el6ev
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://rhn.redhat.com/errata/RHBA-2015-0194.html