Bug 1313922

Summary: If moving engine vm due to big score diff, should migrate instead of shutdown/startup
Product: [oVirt] ovirt-hosted-engine-ha
Component: RFEs
Version: 2.0.0
Status: CLOSED WONTFIX
Severity: unspecified
Priority: unspecified
Target Milestone: ---
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Reporter: Yedidyah Bar David <didi>
Assignee: Martin Sivák <msivak>
QA Contact: Ilanit Stein <istein>
CC: bugs, didi, msivak, stirabos
Flags: rule-engine: planning_ack?, rule-engine: devel_ack?, rule-engine: testing_ack?
Doc Type: Bug Fix
Type: Bug
Story Points: ---
Last Closed: 2016-03-02 15:52:45 UTC
Regression: ---
Mount Type: ---
Documentation: ---
Category: ---
oVirt Team: Infra
Cloudforms Team: ---

Description Yedidyah Bar David 2016-03-02 15:43:06 UTC
Description of problem:

$subject. I see this in agent.log:

MainThread::ERROR::2016-02-23 15:12:17,511::states::385::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(consume) Host didi-box1.home.local (id 3) score is significantly better than local score, shutting down VM on this host

I briefly looked at the code and can't see that we try to migrate.

Comment 1 Martin Sivák 2016-03-02 15:52:45 UTC
This behavior is intentional. A big score difference might mean that the network is slow or the host is too loaded, in which case it would take too long to migrate the VM safely. So we shut it down and restart it on a different node.

Comment 2 Yedidyah Bar David 2016-03-03 06:29:22 UTC
Thanks for the explanation.

Is there an official way to prevent that?

In particular, when upgrading the first host in a 3.5 cluster to 3.6, this will always happen, as the new host will have a score of 3400 and the 3.5 ones 2400. I'd find it pretty harsh for a VM to go down if the only reason is that a host was upgraded...
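
To make the upgrade scenario concrete, here is a minimal Python sketch of the rule described in comment 1: the local VM is shut down once another host's score beats the local score by some margin. The function name and the threshold value of 800 are assumptions made for illustration and are not taken from the agent's code; only the scores 3400 and 2400 come from this report.

```python
def choose_action(local_score, best_remote_score, threshold):
    """Decide what the host currently running the engine VM should do.

    `threshold` is a hypothetical margin; the real agent may use a
    different value or a different comparison.
    """
    if best_remote_score - local_score >= threshold:
        # Remote host is "significantly better": stop the VM here so it
        # can be started on the better host (no live migration).
        return "shutdown"
    return "keep"


# Upgrade scenario from this report: 3.5 host (2400) vs. upgraded 3.6 host (3400).
ASSUMED_THRESHOLD = 800  # illustrative value only
print(choose_action(2400, 3400, ASSUMED_THRESHOLD))  # shutdown
print(choose_action(3400, 3400, ASSUMED_THRESHOLD))  # keep
```

With any threshold up to 1000, the gap between a 3.5 host and an upgraded 3.6 host is enough to trigger the shutdown path.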

Comment 3 Martin Sivák 2016-03-03 10:49:29 UTC
Sure, put the cluster into global maintenance. That will disable all automation during the upgrade.
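
As a sketch of that workflow, the following Python wraps an upgrade step between entering and leaving global maintenance with the `hosted-engine --set-maintenance` command. The helper names and the `upgrade` callable are hypothetical; the sketch also assumes it runs on a hosted-engine host with sufficient privileges.

```python
import subprocess


def maintenance_command(mode):
    """Build the hosted-engine CLI call for a maintenance mode.

    `mode` is expected to be "global" or "none" in this sketch.
    """
    return ["hosted-engine", "--set-maintenance", "--mode=" + mode]


def upgrade_under_global_maintenance(upgrade, run=subprocess.check_call):
    """Run `upgrade()` while HA automation is disabled cluster-wide.

    `upgrade` is a hypothetical callable performing the host upgrade.
    `run` executes a command list; it is injectable so the flow can be
    exercised without a real hosted-engine installation.
    """
    run(maintenance_command("global"))
    # If upgrade() raises, the cluster deliberately stays in global
    # maintenance so an operator can inspect the state first.
    upgrade()
    run(maintenance_command("none"))
```

Leaving maintenance only after a successful upgrade is a design choice of this sketch: a failed upgrade keeps automation disabled until someone runs `hosted-engine --set-maintenance --mode=none` by hand.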