Bug 1313922 - If moving engine vm due to big score diff, should migrate instead of shutdown/startup
Product: ovirt-hosted-engine-ha
Classification: oVirt
Component: RFEs
Assigned To: Martin Sivák
Ilanit Stein
Depends On:
Reported: 2016-03-02 10:43 EST by Yedidyah Bar David
Modified: 2016-03-03 05:49 EST

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Last Closed: 2016-03-02 10:52:45 EST
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
Verified Versions:
Category: ---
oVirt Team: Infra
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
rule-engine: planning_ack?
rule-engine: devel_ack?
rule-engine: testing_ack?

Description Yedidyah Bar David 2016-03-02 10:43:06 EST
Description of problem:

$subject. I see this in agent.log:

MainThread::ERROR::2016-02-23 15:12:17,511::states::385::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(consume) Host didi-box1.home.local (id 3) score is significantly better than local score, shutting down VM on this host

I briefly looked at the code and can't see that we try to migrate.
Comment 1 Martin Sivák 2016-03-02 10:52:45 EST
This behavior is intentional. A big score difference might mean that the network is slow or the host is too loaded, so it would take too long to safely migrate the VM. So we shut it down and restart it on a different node.
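
As a rough illustration of the behavior described in this comment, the decision could be sketched as below. This is not the actual agent code: the function name, the return values, and the threshold of 800 are assumptions chosen for the example, since the bug does not state the real value.

```python
# Illustrative sketch only; not taken from ovirt_hosted_engine_ha.
# SCORE_DIFF_THRESHOLD is a hypothetical value, not the agent's real constant.
SCORE_DIFF_THRESHOLD = 800


def decide_action(local_score, best_remote_score,
                  threshold=SCORE_DIFF_THRESHOLD):
    """Return the action for the host currently running the engine VM.

    A large gap in favor of a remote host is treated as a sign that the
    local host is unhealthy, so the VM is shut down (to be restarted
    elsewhere) rather than live-migrated.
    """
    if best_remote_score - local_score >= threshold:
        return "shutdown"
    return "keep"


# Example: a local score of 2400 against a remote score of 3400
# exceeds the assumed threshold, so the VM would be shut down.
action = decide_action(2400, 3400)
```

Under these assumptions, a gap smaller than the threshold leaves the VM where it is.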
Comment 2 Yedidyah Bar David 2016-03-03 01:29:22 EST
Thanks for the explanation.

Is there an official way to prevent that?

In particular, when upgrading the first host in a 3.5 cluster to 3.6, this will always happen, as the new host will have score 3400 and the 3.5 ones 2400. I'd find it pretty harsh for a VM to go down if the only reason is that a host was upgraded...
Comment 3 Martin Sivák 2016-03-03 05:49:29 EST
Sure, put the cluster into global maintenance. That will disable all automation during the upgrade.
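
For reference, global maintenance is normally toggled with the `hosted-engine --set-maintenance --mode=...` command. A minimal sketch of wrapping that call from Python might look like the following; the helper names `maintenance_command` and `set_maintenance` are made up for this example, and it assumes the `hosted-engine` tool is installed on the host where it runs.

```python
import subprocess

# Maintenance modes accepted by "hosted-engine --set-maintenance".
VALID_MODES = ("global", "local", "none")


def maintenance_command(mode):
    """Build the argv list for setting a hosted-engine maintenance mode."""
    if mode not in VALID_MODES:
        raise ValueError("unknown maintenance mode: %r" % (mode,))
    return ["hosted-engine", "--set-maintenance", "--mode=" + mode]


def set_maintenance(mode):
    """Run the command; raises CalledProcessError if the tool fails."""
    subprocess.check_call(maintenance_command(mode))


# Typical upgrade flow (hypothetical helper usage, not executed here):
#   set_maintenance("global")   # before upgrading the first host
#   ... upgrade the hosts ...
#   set_maintenance("none")     # re-enable automation afterwards
```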
