Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 1313922

Summary:	If moving engine vm due to big score diff, should migrate instead of shutdown/startup
Product:	[oVirt] ovirt-hosted-engine-ha	Reporter:	Yedidyah Bar David <didi>
Component:	RFEs	Assignee:	Martin Sivák <msivak>
Status:	CLOSED WONTFIX	QA Contact:	Ilanit Stein <istein>
Severity:	unspecified	Docs Contact:
Priority:	unspecified
Version:	2.0.0	CC:	bugs, didi, msivak, stirabos
Target Milestone:	---	Flags:	rule-engine: planning_ack? rule-engine: devel_ack? rule-engine: testing_ack?
Target Release:	---
Hardware:	Unspecified
OS:	Unspecified
Whiteboard:
Fixed In Version:		Doc Type:	Bug Fix
Doc Text:		Story Points:	---
Clone Of:		Environment:
Last Closed:	2016-03-02 15:52:45 UTC	Type:	Bug
Regression:	---	Mount Type:	---
Documentation:	---	CRM:
Verified Versions:		Category:	---
oVirt Team:	Infra	RHEL 7.3 requirements from Atomic Host:
Cloudforms Team:	---	Target Upstream Version:
Embargoed:

Description Yedidyah Bar David 2016-03-02 15:43:06 UTC

Description of problem:

As described in the summary. I see this in agent.log:

MainThread::ERROR::2016-02-23 15:12:17,511::states::385::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(consume) Host didi-box1.home.local (id 3) score is significantly better than local score, shutting down VM on this host

I briefly looked at the code and can't see any attempt to migrate.

Comment 1 Martin Sivák 2016-03-02 15:52:45 UTC

This behavior is intentional. A big score difference might mean that the network is slow or the host is too loaded, and it would take too long to migrate the VM safely. So we shut it down and restart it on a different node.
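A minimal sketch of the decision described above, for readers following the thread. It models the behavior only; it is not the ovirt-hosted-engine-ha code. `HostScore`, `decide_action`, and `SCORE_DIFF_THRESHOLD` are hypothetical names. The value 800 and the use of `>=` are assumptions, because this bug does not state the real cutoff for "significantly better".

```python
from dataclasses import dataclass
from typing import Optional

# ASSUMPTION: placeholder cutoff; the agent's real threshold is not stated here.
SCORE_DIFF_THRESHOLD = 800


@dataclass
class HostScore:
    name: str
    score: int


def decide_action(local: HostScore, remote_best: Optional[HostScore],
                  threshold: int = SCORE_DIFF_THRESHOLD) -> str:
    """Return the action the local agent would take for the engine VM.

    Illustrative model of the behavior described in this bug only.
    """
    if remote_best is None:
        # No other host reported a score, so there is nothing to compare.
        return "none"
    if remote_best.score - local.score >= threshold:
        # Deliberately no live migration: a large gap may mean a slow
        # network or an overloaded host, so stop the VM here and let the
        # better-scoring host start it.
        return "shutdown"
    return "none"


print(decide_action(HostScore("host-a", 2400), HostScore("host-b", 3400)))
```

The design choice mirrors the comment: once the gap is large, a cold restart on the better host is treated as more predictable than a live migration that might not finish.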

Comment 2 Yedidyah Bar David 2016-03-03 06:29:22 UTC

Thanks for the explanation.

Is there an official way to prevent that?

In particular, when upgrading the first host in a 3.5 cluster to 3.6, this will always happen, as the new host will have a score of 3400 and the 3.5 ones 2400. I'd find it pretty harsh for a VM to go down if the only reason is that a host was upgraded...
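Worked out, the gap in that upgrade scenario is:

$$
\Delta = s_{3.6} - s_{3.5} = 3400 - 2400 = 1000
$$

Whether 1000 crosses the agent's "significantly better" cutoff depends on a threshold this bug does not state, but the comment implies it does. Because the gap comes from the version difference rather than from load or network conditions, it would persist until the remaining hosts are upgraded as well.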

Comment 3 Martin Sivák 2016-03-03 10:49:29 UTC

Sure, put the cluster into global maintenance. That will disable all automation during the upgrade.
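A minimal sketch of wrapping an upgrade in global maintenance, assuming it runs on a hosted-engine host where the `hosted-engine` CLI is available. The commands are the documented `hosted-engine --set-maintenance --mode=global` (enter) and `--mode=none` (leave). The helper names `maintenance_argv` and `global_maintenance` are hypothetical. The runner is injectable so the sequencing can be checked without touching a real cluster.

```python
import subprocess
from contextlib import contextmanager
from typing import Callable, Iterator, List

Runner = Callable[[List[str]], None]


def _run(argv: List[str]) -> None:
    # Raises subprocess.CalledProcessError if the command exits non-zero.
    subprocess.run(argv, check=True)


def maintenance_argv(mode: str) -> List[str]:
    # "global" disables HA automation cluster-wide; "none" re-enables it.
    return ["hosted-engine", "--set-maintenance", f"--mode={mode}"]


@contextmanager
def global_maintenance(run: Runner = _run) -> Iterator[None]:
    """Keep the cluster in global maintenance for the body of the block."""
    run(maintenance_argv("global"))
    try:
        yield
    finally:
        # Leave maintenance even if the upgrade step raised.
        run(maintenance_argv("none"))
```

Usage would look like `with global_maintenance(): upgrade_first_host()`, where `upgrade_first_host` is a hypothetical stand-in for the actual upgrade steps. Leaving maintenance in `finally` is a judgment call: after a failed upgrade you may prefer to stay in maintenance and investigate, in which case drop the `finally` and exit maintenance manually.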