Created attachment 944564 [details]
logs

Description of problem:
I have a hosted-engine environment with two hosts. I put the environment into global maintenance, killed the engine VM, and then disabled global maintenance. The VM started on one of the hosts, but the second host went to state=EngineUnexpectedlyDown.

Version-Release number of selected component (if applicable):
ovirt-hosted-engine-ha-1.2.2-2.el6ev.noarch

How reproducible:
Always

Steps to Reproduce:
1. Set up a hosted-engine environment with two hosts
2. Put the environment into global maintenance and kill the engine VM
3. Disable global maintenance
(rough commands are sketched under Additional info below)

Actual results:

--== Host 1 status ==--

Status up-to-date                  : True
Hostname                           : 10.35.97.36
Host ID                            : 1
Engine status                      : {"health": "good", "vm": "up", "detail": "up"}
Score                              : 2400
Local maintenance                  : False
Host timestamp                     : 103047
Extra metadata (valid at timestamp):
	metadata_parse_version=1
	metadata_feature_version=1
	timestamp=103047 (Tue Oct 7 15:10:18 2014)
	host-id=1
	score=2400
	maintenance=False
	state=EngineUp

--== Host 2 status ==--

Status up-to-date                  : True
Hostname                           : 10.35.64.85
Host ID                            : 2
Engine status                      : {"reason": "vm not running on this host", "health": "bad", "vm": "down", "detail": "unknown"}
Score                              : 0
Local maintenance                  : False
Host timestamp                     : 102939
Extra metadata (valid at timestamp):
	metadata_parse_version=1
	metadata_feature_version=1
	timestamp=102939 (Tue Oct 7 15:12:25 2014)
	host-id=2
	score=0
	maintenance=False
	state=EngineUnexpectedlyDown
	timeout=Fri Jan 2 06:42:26 1970

The VM started, but the second host reports state=EngineUnexpectedlyDown.

Expected results:
The VM starts and both hosts have state=EngineUp.

Additional info:
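For reference, the reproduction steps correspond roughly to the following commands on one of the HA hosts (a sketch; the exact way of killing the engine VM is an assumption, any hard stop such as killing the qemu process should behave the same):

  hosted-engine --set-maintenance --mode=global    # enter global maintenance
  hosted-engine --vm-poweroff                      # kill the engine VM
  hosted-engine --set-maintenance --mode=none      # leave global maintenance
  hosted-engine --vm-status                        # observe the states reported by both hosts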
This is expected behavior. The agents should sync after some time. Also please note that only the agent on the host that is running the engine VM will have state=EngineUp; the other one will have state=EngineDown.
We should at least improve the log message so it does not confuse users.
3.5.1 is already full of bugs (over 80), and since none of these bugs were marked as urgent for the 3.5.1 release in the tracker bug, moving to 3.5.2.
Checked on ovirt-hosted-engine-ha-1.3.0-0.3.beta.git183a4ff.el7ev.noarch.

I ran the scenario above but do not see any explanatory log message. Just to clarify: from the code, the flow will never enter the except branch, because the VM start action goes through vdsm.
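To illustrate what I mean, a minimal sketch (simplified, not the actual agent code; start_vm_via_vdsm is a hypothetical stand-in): when the start request is handed to vdsm, an ordinary start failure comes back in the response payload rather than as a raised exception, so an except branch around the call never fires and any explanation logged there is never seen.

  import logging

  log = logging.getLogger(__name__)

  def start_vm_via_vdsm():
      # hypothetical stand-in for the vdsm call; a failed start is reported
      # in the response dict instead of raising an exception in the agent
      return {"status": {"code": 1, "message": "VM failed to start"}}

  def start_engine_vm():
      try:
          response = start_vm_via_vdsm()
      except Exception:
          # an explanatory message logged here is effectively unreachable
          # for an ordinary "VM failed to start"
          log.error("Engine VM start raised an exception")
          raise
      if response["status"]["code"] != 0:
          # the failure actually surfaces here, in the vdsm response
          log.error("vdsm reported start failure: %s",
                    response["status"]["message"])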
There are two pieces to the fix: the start command might fail (the message there is already fixed), and two agents might try to start the VM at the same time. The second case is a duplicate of bug 1207634, but I will close the other one to keep the flags.
*** Bug 1207634 has been marked as a duplicate of this bug. ***
*** Bug 1164572 has been marked as a duplicate of this bug. ***
Verified on ovirt-hosted-engine-ha-1.3.0-0.5.beta.git9a2bd43.el7ev.noarch

MainThread::INFO::2015-09-16 15:59:20,360::states::746::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(consume) Another host already took over..
MainThread::INFO::2015-09-16 15:59:20,368::state_decorators::88::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(check) Timeout cleared while transitioning <class 'ovirt_hosted_engine_ha.agent.states.EngineStarting'> -> <class 'ovirt_hosted_engine_ha.agent.states.EngineForceStop'>
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHEA-2016-0422.html