Bug 1411319
Summary: After a successful migration of the HE VM from host A to host B, immediately migrating back to host A fails with "Operation Canceled" from the RHEV Manager portal.

Product: Red Hat Enterprise Virtualization Manager
Reporter: Ameya Charekar <achareka>
Component: ovirt-hosted-engine-ha
Assignee: Andrej Krejcir <akrejcir>
Status: CLOSED ERRATA
QA Contact: Nikolai Sednev <nsednev>
Severity: low
Docs Contact:
Priority: unspecified
Version: 4.0.5
CC: achareka, akrejcir, bmcclain, dchaplyg, dfediuck, lsurette, mavital, melewis, mkalinin, msivak, nsednev, ykaul
Target Milestone: ovirt-4.1.2
Keywords: Triaged
Target Release: ---
Flags: nsednev: testing_plan_complete+
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Previously, the results of a successful self-hosted engine migration were lost and the self-hosted engine agent reported that the migration had failed. This was due to a race condition in the interface between VDSM and the self-hosted engine agent. Now, the absence of the virtual machine is treated as a sign of a successful migration, so the self-hosted engine agent correctly reports 'EngineDown' on migration completion.
Story Points: ---
Clone Of:
: 1436002 (view as bug list)
Environment:
Last Closed: 2017-05-25 06:46:28 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: SLA
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On: 1264085, 1419326
Bug Blocks: 1436002
Attachments:
Description: Ameya Charekar, 2017-01-09 13:06:57 UTC
Please attach the output of hosted-engine --vm-status and check what the host information in webadmin says (the hosted engine score, for example).

OK, I see something weird in the logs: the state goes from EngineMigratingAway directly to ReinitializeFSM. That means the migration returned something unexpected, and it goes downhill from there (EngineStarting, then EngineUnexpectedlyDown when the VM finally migrates away). Can you please apply https://gerrit.ovirt.org/#/c/67540/ (it adds better logging) and reproduce the issue? It is possible something changed slightly in the VDSM protocol for migrations.

Works for me on these components on host:
rhvm-appliance-4.1.20170119.1-1.el7ev.noarch
ovirt-hosted-engine-ha-2.1.0-1.el7ev.noarch
ovirt-hosted-engine-setup-2.1.0-2.el7ev.noarch
ovirt-host-deploy-1.6.0-1.el7ev.noarch
ovirt-imageio-common-0.5.0-0.el7ev.noarch
ovirt-vmconsole-host-1.0.4-1.el7ev.noarch
qemu-kvm-rhev-2.6.0-28.el7_3.3.x86_64
libvirt-client-2.0.0-10.el7_3.4.x86_64
mom-0.5.8-1.el7ev.noarch
vdsm-4.19.2-2.el7ev.x86_64
ovirt-setup-lib-1.1.0-1.el7ev.noarch
ovirt-engine-sdk-python-3.6.9.1-1.el7ev.noarch
ovirt-imageio-daemon-0.5.0-0.el7ev.noarch
ovirt-vmconsole-1.0.4-1.el7ev.noarch
sanlock-3.4.0-1.el7.x86_64
Linux version 3.10.0-514.6.1.el7.x86_64 (mockbuild.eng.bos.redhat.com) (gcc version 4.8.5 20150623 (Red Hat 4.8.5-11) (GCC) ) #1 SMP Sat Dec 10 11:15:38 EST 2016
Linux 3.10.0-514.6.1.el7.x86_64 #1 SMP Sat Dec 10 11:15:38 EST 2016 x86_64 x86_64 x86_64 GNU/Linux
Red Hat Enterprise Linux Server release 7.3 (Maipo)

On engine:
rhev-guest-tools-iso-4.1-3.el7ev.noarch
rhevm-doc-4.1.0-1.el7ev.noarch
rhevm-dependencies-4.1.0-1.el7ev.noarch
rhevm-setup-plugins-4.1.0-1.el7ev.noarch
rhevm-4.1.0.1-0.1.el7.noarch
rhevm-guest-agent-common-1.0.12-3.el7ev.noarch
rhevm-branding-rhev-4.1.0-0.el7ev.noarch
Linux version 3.10.0-514.6.1.el7.x86_64 (mockbuild.eng.bos.redhat.com) (gcc version 4.8.5 20150623 (Red Hat 4.8.5-11) (GCC) ) #1 SMP Sat Dec 10 11:15:38 EST 2016
Linux 3.10.0-514.6.1.el7.x86_64 #1 SMP Sat Dec 10 11:15:38 EST 2016 x86_64 x86_64 x86_64 GNU/Linux
Red Hat Enterprise Linux Server release 7.3 (Maipo)

During migrations from A to B and then B to A, host A drops its HA score to zero without any reason. After several migration attempts, I got an error message about a failed migration:

"Operation Canceled
Error while executing action: HostedEngine: Cannot migrate VM. There is no host that satisfies current scheduling constraints. See below for details:
The host puma19.scl.lab.tlv.redhat.com did not satisfy internal filter HA because it is not a Hosted Engine host.
The host puma18.scl.lab.tlv.redhat.com did not satisfy internal filter Memory because its available memory is too low (15167.000000 MB) to run the VM."

Components on hosts:
libvirt-client-2.0.0-10.el7_3.4.x86_64
qemu-kvm-rhev-2.6.0-28.el7_3.6.x86_64
rhevm-appliance-20160721.0-2.el7ev.noarch
mom-0.5.9-1.el7ev.noarch
ovirt-hosted-engine-setup-2.1.0.3-1.el7ev.noarch
ovirt-engine-sdk-python-3.6.9.1-1.el7ev.noarch
sanlock-3.4.0-1.el7.x86_64
ovirt-vmconsole-host-1.0.4-1.el7ev.noarch
vdsm-4.19.6-1.el7ev.x86_64
ovirt-host-deploy-1.6.0-1.el7ev.noarch
ovirt-vmconsole-1.0.4-1.el7ev.noarch
ovirt-imageio-common-1.0.0-0.el7ev.noarch
ovirt-imageio-daemon-1.0.0-0.el7ev.noarch
ovirt-setup-lib-1.1.0-1.el7ev.noarch
ovirt-hosted-engine-ha-2.1.0.3-1.el7ev.noarch
Linux version 3.10.0-514.6.1.el7.x86_64 (mockbuild.eng.bos.redhat.com) (gcc version 4.8.5 20150623 (Red Hat 4.8.5-11) (GCC) ) #1 SMP Sat Dec 10 11:15:38 EST 2016
Linux 3.10.0-514.6.1.el7.x86_64 #1 SMP Sat Dec 10 11:15:38 EST 2016 x86_64 x86_64 x86_64 GNU/Linux
Red Hat Enterprise Linux Server release 7.3 (Maipo)

On engine:
rhev-guest-tools-iso-4.1-3.el7ev.noarch
rhevm-dependencies-4.1.0-1.el7ev.noarch
rhevm-doc-4.1.0-2.el7ev.noarch
rhevm-branding-rhev-4.1.0-1.el7ev.noarch
rhevm-setup-plugins-4.1.0-1.el7ev.noarch
rhevm-4.1.1.2-0.1.el7.noarch
Linux version 3.10.0-514.6.2.el7.x86_64 (mockbuild.eng.bos.redhat.com) (gcc version 4.8.5 20150623 (Red Hat 4.8.5-11) (GCC) ) #1 SMP Fri Feb 17 19:21:31 EST 2017
Linux 3.10.0-514.6.2.el7.x86_64 #1 SMP Fri Feb 17 19:21:31 EST 2017 x86_64 x86_64 x86_64 GNU/Linux
Red Hat Enterprise Linux Server release 7.3 (Maipo)

(In reply to Nikolai Sednev from comment #17)
> During migrations, from a to b, and then b to a, host a drops its HA score to zero, without any reason.

There must be a reason; will you attach the logs, please?

Mar 6, 2017 5:44:13 PM: Migration failed (VM: HostedEngine, Source: alma04.qa.lab.tlv.redhat.com). Logs are being attached.

Created attachment 1260445 [details]
sosreport-alma03.qa.lab.tlv.redhat.com-20170306174542.tar.xz
Created attachment 1260447 [details]
sosreport-alma04.qa.lab.tlv.redhat.com-20170306174536.tar.xz
Created attachment 1260448 [details]
sosreport-nsednev-he-1.qa.lab.tlv.redhat.com-20170306174523.tar.xz
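The Doc Text field above describes the fix: the agent now treats the absence of the VM on the source host as a sign of a successful migration instead of an unexpected error. A minimal, hypothetical Python sketch of that decision follows; the state names come from the logs in this bug, but the function and the status dictionary layout are invented for illustration and are not the actual ovirt-hosted-engine-ha code.

```python
# Hypothetical sketch of the fixed migration-monitor decision: when the local
# VDSM no longer reports the HostedEngine VM, treat that as migration success.

def next_state(vm_status):
    """Return the agent's next FSM state given VDSM's view of the local VM.

    vm_status is None when the VM is no longer known to the local VDSM,
    which is the race that previously pushed the FSM into ReinitializeFSM.
    """
    if vm_status is None:
        # VM vanished locally: assume the migration completed on the peer.
        return "EngineDown"
    if vm_status.get("status") == "Migration Source":
        return "EngineMigratingAway"   # still migrating, keep monitoring
    if vm_status.get("status") == "Up":
        return "EngineUp"              # migration aborted or never started
    return "ReinitializeFSM"           # genuinely unexpected answer

# The buggy behavior mapped the None case to ReinitializeFSM as well, which
# cascaded into EngineStarting/EngineUnexpectedlyDown and a score of 0.
```

With this change, the successful-migration path matches the log line "VM not found, assuming that migration is complete" followed by the EngineMigratingAway-EngineDown transition.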
Your FailedQA report talks about pumaX hosts, but the logs are from almaX hosts. Anyway:

alma03 was the source and seems to be OK:

MainThread::INFO::2017-03-06 17:44:21,964 HostedEngine::(_monitor_migration) VM not found, assuming that migration is complete
MainThread::INFO::2017-03-06 17:44:21,966 HostedEngine::(consume) Migration to None complete, no longer monitoring vm
MainThread::INFO::2017-03-06 17:44:21,970 BrokerLink::(notify) Trying: notify time=1488815061.97 type=state_transition detail=EngineMigratingAway-EngineDown hostname='alma03.qa.lab.tlv.redhat.com'
MainThread::INFO::2017-03-06 17:44:22,073 BrokerLink::(notify) Success, was notification of state_transition (EngineMigratingAway-EngineDown) sent? sent
MainThread::INFO::2017-03-06 17:44:39,423 HostedEngine::(start_monitoring) Current state EngineDown (score: 3400)
MainThread::INFO::2017-03-06 17:44:39,423 HostedEngine::(start_monitoring) Best remote host alma04.qa.lab.tlv.redhat.com (id: 2, score: 3400)

alma04 received the VM and is running happily too:

MainThread::INFO::2017-03-06 17:44:05,071 HostedEngine::(consume) Engine vm is running on host alma03.qa.lab.tlv.redhat.com (id 1)
MainThread::INFO::2017-03-06 17:44:35,227 HostedEngine::(start_monitoring) Current state EngineStarting (score: 3400)
MainThread::INFO::2017-03-06 17:45:05,331 HostedEngine::(start_monitoring) Current state EngineUp (score: 3400)

The hosted engine logs do not correspond to what you are describing. I see the failure in the engine, but it would mean the time is not in sync (the engine is about 20 seconds behind the nodes) and you were trying another migration from alma04 back to alma03. It did not allow you to do that because the monitoring still had not registered alma03 as being ready (with a not-enough-memory reason):

2017-03-06 17:44:13,164+02 INFO [org.ovirt.engine.core.bll.scheduling.SchedulingManager] (org.ovirt.thread.pool-6-thread-14) [4ebcedba-fa0f-47c5-9519-b4b48a5e78fb] Candidate host 'alma03.qa.lab.tlv.redhat.com' ('8c70eb98-b950-4d73-9782-38b1bdbc9afc') was filtered out by 'VAR__FILTERTYPE__INTERNAL' filter 'Memory' (correlation id: 4ebcedba-fa0f-47c5-9519-b4b48a5e78fb)
2017-03-06 17:44:13,209+02 WARN [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (org.ovirt.thread.pool-6-thread-14) [4ebcedba-fa0f-47c5-9519-b4b48a5e78fb] EVENT_ID: VM_MIGRATION_NO_VDS_TO_MIGRATE_TO(166), Correlation ID: 4ebcedba-fa0f-47c5-9519-b4b48a5e78fb, Job ID: dced6cbd-e253-4889-8be7-75b9bedc66ad, Call Stack: null, Custom Event ID: -1, Message: No available host was found to migrate VM HostedEngine to.

That is definitely by design and not a bug in this case. But please check where the logs are coming from, because you used different hostnames in comment #17.

(In reply to Martin Sivák from comment #23)

The hosts are different now, as I've reproduced on another pair of hosts; the almas are my new clean deployment, and the pumas are not available now. The HE VM should have a memory reservation; we verified the RFE for it here: https://bugzilla.redhat.com/show_bug.cgi?id=1264085. I've reproduced the exact steps from https://bugzilla.redhat.com/show_bug.cgi?id=1411319#c0, and the expected result is to be able to migrate the HE VM from A to B and then, without any delay, back to A without any issues.

Hosted engine memory reservation needs to be enabled (HeSparesCount needs to be set to something other than 0). But it won't help in this case, because the engine only refreshes stats once per 15 seconds, and the old-source/new-destination host looks occupied until the used memory is refreshed. Since I do not see any score reduction to 0 in your reproducer, I consider this bug to be still VERIFIED. The root cause of the original report (see Additional info from #c0 and the Doc Text field) is therefore still fixed. What you want (immediate update of available memory after VM migration/shutdown) is something different and can be tracked in a new bug (it will also belong to the virt team).
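The 15-second stats refresh discussed in this thread can be illustrated with a small sketch. All names and the dictionary layout below are invented for illustration and are not the engine's real classes; only the roughly 15-second polling interval and the 15167 MB figure come from this report.

```python
# Illustrative sketch of why an immediate migrate-back fails: the scheduler's
# Memory filter works from cached host statistics that are only refreshed
# roughly every 15 seconds, not at the instant a VM leaves a host.

STATS_REFRESH_SECONDS = 15  # assumed polling interval, per the comments above

def memory_filter(host, vm_mem_mb):
    """Keep the host only if its last-reported free memory fits the VM."""
    # host["free_mb"] is whatever the last stats poll reported; it is not
    # updated immediately when a VM migrates away from the host.
    return host["free_mb"] >= vm_mem_mb

# Right after the HE VM left host A, A's cached stats still count the VM's
# memory as used, so A is filtered out and the engine reports that no host
# satisfies the current scheduling constraints.
host_a = {"name": "host-a", "free_mb": 15167}  # stale value from the last poll
assert not memory_filter(host_a, vm_mem_mb=16384)

# After the next stats poll (up to ~15 s later), the same filter accepts A.
host_a["free_mb"] = 31000
assert memory_filter(host_a, vm_mem_mb=16384)
```

This is why retrying the migration after the next stats refresh succeeds even though nothing about the host actually changed in between.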
Host alma04 dropped its score to zero when I migrated the HE VM, as appears below:

[root@alma03 ~]# hosted-engine --vm-status

--== Host 1 status ==--

conf_on_shared_storage : True
Status up-to-date      : True
Hostname               : alma04
Host ID                : 1
Engine status          : {"health": "good", "vm": "up", "detail": "up"}
Score                  : 3400
stopped                : False
Local maintenance      : False
crc32                  : 9f3ccded
local_conf_timestamp   : 19051
Host timestamp         : 19035
Extra metadata (valid at timestamp):
    metadata_parse_version=1
    metadata_feature_version=1
    timestamp=19035 (Tue Mar 7 18:12:34 2017)
    host-id=1
    score=3400
    vm_conf_refresh_time=19051 (Tue Mar 7 18:12:49 2017)
    conf_on_shared_storage=True
    maintenance=False
    state=EngineUp
    stopped=False

--== Host 2 status ==--

conf_on_shared_storage : True
Status up-to-date      : True
Hostname               : alma03
Host ID                : 2
Engine status          : {"reason": "vm not running on this host", "health": "bad", "vm": "down", "detail": "unknown"}
Score                  : 3400
stopped                : False
Local maintenance      : False
crc32                  : 8b0ccf9f
local_conf_timestamp   : 3721
Host timestamp         : 3706
Extra metadata (valid at timestamp):
    metadata_parse_version=1
    metadata_feature_version=1
    timestamp=3706 (Tue Mar 7 18:12:27 2017)
    host-id=2
    score=3400
    vm_conf_refresh_time=3721 (Tue Mar 7 18:12:43 2017)
    conf_on_shared_storage=True
    maintenance=False
    state=EngineDown
    stopped=False

You have new mail in /var/spool/mail/root
[root@alma03 ~]# hosted-engine --vm-status

--== Host 1 status ==--

conf_on_shared_storage : True
Status up-to-date      : True
Hostname               : alma04
Host ID                : 1
Engine status          : {"health": "good", "vm": "up", "detail": "migration source"}
Score                  : 3400
stopped                : False
Local maintenance      : False
crc32                  : 597bd555
local_conf_timestamp   : 19204
Host timestamp         : 19187
Extra metadata (valid at timestamp):
    metadata_parse_version=1
    metadata_feature_version=1
    timestamp=19187 (Tue Mar 7 18:15:06 2017)
    host-id=1
    score=3400
    vm_conf_refresh_time=19204 (Tue Mar 7 18:15:23 2017)
    conf_on_shared_storage=True
    maintenance=False
    state=EngineMigratingAway
    stopped=False

--== Host 2 status ==--

conf_on_shared_storage : True
Status up-to-date      : True
Hostname               : alma03
Host ID                : 2
Engine status          : {"reason": "bad vm status", "health": "bad", "vm": "down", "detail": "migration destination"}
Score                  : 3400
stopped                : False
Local maintenance      : False
crc32                  : bfb734b4
local_conf_timestamp   : 3877
Host timestamp         : 3858
Extra metadata (valid at timestamp):
    metadata_parse_version=1
    metadata_feature_version=1
    timestamp=3858 (Tue Mar 7 18:14:59 2017)
    host-id=2
    score=3400
    vm_conf_refresh_time=3877 (Tue Mar 7 18:15:18 2017)
    conf_on_shared_storage=True
    maintenance=False
    state=EngineDown
    stopped=False

[root@alma03 ~]# hosted-engine --vm-status

--== Host 1 status ==--

conf_on_shared_storage : True
Status up-to-date      : True
Hostname               : alma04
Host ID                : 1
Engine status          : {"health": "good", "vm": "up", "detail": "migration source"}
Score                  : 3400
stopped                : False
Local maintenance      : False
crc32                  : 597bd555
local_conf_timestamp   : 19204
Host timestamp         : 19187
Extra metadata (valid at timestamp):
    metadata_parse_version=1
    metadata_feature_version=1
    timestamp=19187 (Tue Mar 7 18:15:06 2017)
    host-id=1
    score=3400
    vm_conf_refresh_time=19204 (Tue Mar 7 18:15:23 2017)
    conf_on_shared_storage=True
    maintenance=False
    state=EngineMigratingAway
    stopped=False

--== Host 2 status ==--

conf_on_shared_storage : True
Status up-to-date      : True
Hostname               : alma03
Host ID                : 2
Engine status          : {"reason": "bad vm status", "health": "bad", "vm": "down", "detail": "migration destination"}
Score                  : 3400
stopped                : False
Local maintenance      : False
crc32                  : 7baa7fec
local_conf_timestamp   : 3907
Host timestamp         : 3892
Extra metadata (valid at timestamp):
    metadata_parse_version=1
    metadata_feature_version=1
    timestamp=3892 (Tue Mar 7 18:15:33 2017)
    host-id=2
    score=3400
    vm_conf_refresh_time=3907 (Tue Mar 7 18:15:49 2017)
    conf_on_shared_storage=True
    maintenance=False
    state=EngineDown
    stopped=False

[root@alma03 ~]# hosted-engine --vm-status

--== Host 1 status ==--

conf_on_shared_storage : True
Status up-to-date      : True
Hostname               : alma04
Host ID                : 1
Engine status          : {"reason": "vm not running on this host", "health": "bad", "vm": "down", "detail": "unknown"}
Score                  : 3400
stopped                : False
Local maintenance      : False
crc32                  : bc5c26aa
local_conf_timestamp   : 19240
Host timestamp         : 19219
Extra metadata (valid at timestamp):
    metadata_parse_version=1
    metadata_feature_version=1
    timestamp=19219 (Tue Mar 7 18:15:38 2017)
    host-id=1
    score=3400
    vm_conf_refresh_time=19240 (Tue Mar 7 18:15:58 2017)
    conf_on_shared_storage=True
    maintenance=False
    state=EngineDown
    stopped=False

--== Host 2 status ==--

conf_on_shared_storage : True
Status up-to-date      : True
Hostname               : alma03
Host ID                : 2
Engine status          : {"reason": "bad vm status", "health": "bad", "vm": "down", "detail": "migration destination"}
Score                  : 3400
stopped                : False
Local maintenance      : False
crc32                  : 7baa7fec
local_conf_timestamp   : 3907
Host timestamp         : 3892
Extra metadata (valid at timestamp):
    metadata_parse_version=1
    metadata_feature_version=1
    timestamp=3892 (Tue Mar 7 18:15:33 2017)
    host-id=2
    score=3400
    vm_conf_refresh_time=3907 (Tue Mar 7 18:15:49 2017)
    conf_on_shared_storage=True
    maintenance=False
    state=EngineDown
    stopped=False

[root@alma03 ~]# hosted-engine --vm-status

--== Host 1 status ==--

conf_on_shared_storage : True
Status up-to-date      : True
Hostname               : alma04
Host ID                : 1
Engine status          : {"reason": "vm not running on this host", "health": "bad", "vm": "down", "detail": "unknown"}
Score                  : 3400
stopped                : False
Local maintenance      : False
crc32                  : bc5c26aa
local_conf_timestamp   : 19240
Host timestamp         : 19219
Extra metadata (valid at timestamp):
    metadata_parse_version=1
    metadata_feature_version=1
    timestamp=19219 (Tue Mar 7 18:15:38 2017)
    host-id=1
    score=3400
    vm_conf_refresh_time=19240 (Tue Mar 7 18:15:58 2017)
    conf_on_shared_storage=True
    maintenance=False
    state=EngineDown
    stopped=False

--== Host 2 status ==--

conf_on_shared_storage : True
Status up-to-date      : True
Hostname               : alma03
Host ID                : 2
Engine status          : {"health": "good", "vm": "up", "detail": "up"}
Score                  : 3400
stopped                : False
Local maintenance      : False
crc32                  : 189f5936
local_conf_timestamp   : 3938
Host timestamp         : 3922
Extra metadata (valid at timestamp):
    metadata_parse_version=1
    metadata_feature_version=1
    timestamp=3922 (Tue Mar 7 18:16:04 2017)
    host-id=2
    score=3400
    vm_conf_refresh_time=3938 (Tue Mar 7 18:16:19 2017)
    conf_on_shared_storage=True
    maintenance=False
    state=EngineStarting
    stopped=False

[root@alma03 ~]# hosted-engine --vm-status

--== Host 1 status ==--

conf_on_shared_storage : True
Status up-to-date      : True
Hostname               : alma04
Host ID                : 1
Engine status          : {"reason": "vm not running on this host", "health": "bad", "vm": "down", "detail": "unknown"}
Score                  : 0
stopped                : False
Local maintenance      : False
crc32                  : 80efaf72
local_conf_timestamp   : 19695
Host timestamp         : 19680
Extra metadata (valid at timestamp):
    metadata_parse_version=1
    metadata_feature_version=1
    timestamp=19680 (Tue Mar 7 18:23:18 2017)
    host-id=1
    score=0
    vm_conf_refresh_time=19695 (Tue Mar 7 18:23:33 2017)
    conf_on_shared_storage=True
    maintenance=False
    state=EngineUnexpectedlyDown
    stopped=False
    timeout=Thu Jan 1 07:32:26 1970

--== Host 2 status ==--

conf_on_shared_storage : True
Status up-to-date      : True
Hostname               : alma03
Host ID                : 2
Engine status          : {"health": "good", "vm": "up", "detail": "up"}
Score                  : 3400
stopped                : False
Local maintenance      : False
crc32                  : a0469f44
local_conf_timestamp   : 4331
Host timestamp         : 4316
Extra metadata (valid at timestamp):
    metadata_parse_version=1
    metadata_feature_version=1
    timestamp=4316 (Tue Mar 7 18:22:37 2017)
    host-id=2
    score=3400
    vm_conf_refresh_time=4331 (Tue Mar 7 18:22:53 2017)
    conf_on_shared_storage=True
    maintenance=False
    state=EngineUp
    stopped=False

You have new mail in /var/spool/mail/root
[root@alma03 ~]#

Mar 7, 2017 6:15:44 PM: Migration failed (VM: HostedEngine, Source: alma03.qa.lab.tlv.redhat.com).

Attaching sosreports from both hosts:
alma03: https://drive.google.com/open?id=0B85BEaDBcF88bXEyTDhiM1VRcXc
alma04: https://drive.google.com/open?id=0B85BEaDBcF88UklpZS05elZaRFE

The target host got its score zeroed and the migration failed.

That is still something different: the original report says the source host reduced its score to 0 without any reason (the VM was still up).
Your test failed because the VM really crashed on the destination (vdsm.log):

2017-03-07 18:16:50,110+0200 ERROR (vm/a1abbc04) [virt.vm] (vmId='a1abbc04-00b1-4184-af5e-dc6ad3f184d1') The vm start process failed (vm:617)
Traceback (most recent call last):
  File "/usr/share/vdsm/virt/vm.py", line 553, in _startUnderlyingVm
    self._run()
  File "/usr/share/vdsm/virt/vm.py", line 2001, in _run
    self._connection.createXML(domxml, flags),
  File "/usr/lib/python2.7/site-packages/vdsm/libvirtconnection.py", line 123, in wrapper
    ret = f(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/vdsm/utils.py", line 941, in wrapper
    return func(inst, *args, **kwargs)
  File "/usr/lib64/python2.7/site-packages/libvirt.py", line 3777, in createXML
    if ret is None:raise libvirtError('virDomainCreateXML() failed', conn=self)
libvirtError: resource busy: Failed to acquire lock: error -243
2017-03-07 18:16:50,116+0200 INFO (vm/a1abbc04) [virt.vm] (vmId='a1abbc04-00b1-4184-af5e-dc6ad3f184d1') Changed state to Down: resource busy: Failed to acquire lock: error -243 (code=1) (vm:1202)

And hosted engine did exactly what it was supposed to do: it blacklisted the host. This is again exactly what should have happened in this case (a real VM failure).

Moreover, your setup is not behaving properly. Did you notice all the sanlock and vmstat errors in the log? Is your storage OK?

2017-03-07 17:01:07,490+0200 ERROR (tasks/7) [storage.TaskManager.Task] (Task='77b6247b-f5a3-434a-93c5-b5a77f426fcc') Unexpected error AcquireHostIdFailure: Cannot acquire host id: (u'5313e5df-d775-4d04-b153-6bc4e0b15e07', SanlockException(115, 'Sanlock lockspace add failure', 'Operation now in progress'))
2017-03-07 18:15:33,643+0200 INFO (jsonrpc/0) [jsonrpc.JsonRpcServer] RPC call VM.getStats failed (error 1) in 0.00 seconds (__init__:515)

If we're after "source host reduced its score to 0 without any reason", then it is here: https://bugzilla.redhat.com/show_bug.cgi?id=1419326, and then this bug is a duplicate of it.
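The blacklisting described here (a host's score dropping to 0 after a real VM failure, with a recovery timeout, as seen in the EngineUnexpectedlyDown status output) can be sketched as follows. The penalty constant, function names, and dictionary layout are hypothetical; only the EngineUnexpectedlyDown state name, the score value of 3400, and the score-to-zero behavior come from this bug.

```python
# Hedged, simplified sketch of the score penalty: when the VM dies unexpectedly
# on a host, the agent enters EngineUnexpectedlyDown and zeroes its score for a
# penalty window, so the scheduler avoids that host until the timeout expires.

PENALTY_SECONDS = 600  # illustrative; the real timeout is agent configuration

def on_vm_unexpectedly_down(host, now):
    """Record the failure and start the penalty window."""
    host["state"] = "EngineUnexpectedlyDown"
    host["score"] = 0
    host["penalty_until"] = now + PENALTY_SECONDS
    return host

def effective_score(host, now):
    # Score stays 0 until the penalty expires; then normal scoring resumes.
    return 0 if now < host.get("penalty_until", 0) else 3400

host = {"name": "alma04", "score": 3400}
on_vm_unexpectedly_down(host, now=100)
assert effective_score(host, now=200) == 0                    # still blacklisted
assert effective_score(host, now=100 + PENALTY_SECONDS) == 3400  # penalty expired
```

This is why a host that genuinely lost the VM shows "Score : 0" and a timeout field in hosted-engine --vm-status, while a host that merely completed a migration keeps its full score.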
This very bug is about: 1. Migrate HE vm from host a to host b. 2. After successful migration again immediately migrate to host a. Actual results: immediate migration fails to host a. Expected results: Migration should be successful as we are manually migrating HE vm and there is no issue with HE host a. I've followed the exact reproduction steps and failed, e.g. I've migrated from a to b and then immediately back to host a and failed. What you're talking about is "Additional info" "After migration Host a's Score is 0 and hence engine does not allow migration with error message: The host a did not satisfy internal filter HA because it is not a Hosted Engine host.." In my flow also host's "a" (alma04) score is zero as reported in https://bugzilla.redhat.com/show_bug.cgi?id=1411319#c26 e.g. [root@alma03 ~]# hosted-engine --vm-status --== Host 1 status ==-- conf_on_shared_storage : True Status up-to-date : True Hostname : alma04 Host ID : 1 Engine status : {"reason": "vm not running on this host", "health": "bad", "vm": "down", "detail": "unknown"} Score : 0 stopped : False Local maintenance : False crc32 : 80efaf72 local_conf_timestamp : 19695 Host timestamp : 19680 Extra metadata (valid at timestamp): metadata_parse_version=1 metadata_feature_version=1 timestamp=19680 (Tue Mar 7 18:23:18 2017) host-id=1 score=0 vm_conf_refresh_time=19695 (Tue Mar 7 18:23:33 2017) conf_on_shared_storage=True maintenance=False state=EngineUnexpectedlyDown stopped=False timeout=Thu Jan 1 07:32:26 1970 --== Host 2 status ==-- conf_on_shared_storage : True Status up-to-date : True Hostname : alma03 Host ID : 2 Engine status : {"health": "good", "vm": "up", "detail": "up"} Score : 3400 stopped : False Local maintenance : False crc32 : a0469f44 local_conf_timestamp : 4331 Host timestamp : 4316 Extra metadata (valid at timestamp): metadata_parse_version=1 metadata_feature_version=1 timestamp=4316 (Tue Mar 7 18:22:37 2017) host-id=2 score=3400 vm_conf_refresh_time=4331 (Tue Mar 7 18:22:53 
2017) conf_on_shared_storage=True maintenance=False state=EngineUp stopped=False You have new mail in /var/spool/mail/root [root@alma03 ~]# Sanlock logs irrelevant as these were fixed and all hosts were fine with storage. This environment initially got restored from backup and met with https://bugzilla.redhat.com/show_bug.cgi?id=1235200#c21 and also with https://bugzilla.redhat.com/show_bug.cgi?id=1417518#c9 So at the time I've tried to verify this bug, all sanlock issues were already fixed. No, the original report is about source host (host A or alma03 in your case from #26) > Migration failed (VM: HostedEngine, Source: alma03.qa.lab.tlv.redhat.com). reducing its score to 0, because it didn't properly detect the finished migration (the VM disappeared, but was happily running on the destination). In your case from comment #26 the VM really crashed (was no longer running anywhere) on the destination (host _B_) and alma04 reduced its score to 0 as it was supposed to. That is something completely different. Also your system was reporting weird errors. Just because something looks similar does not mean it is the same bug. Reproduce the exact situation or move this back to VERIFIED (and possibly open a new bug). This is original: " 1. Migrate HE vm from host a to host b. 2. After successful migration again immediately migrate to host a." First source is "a", then HE-VM was migrated successfully to its destination "b". Second source is "b" and destination "a". This is my case: First source was alma04, destination was alma03 and HE-VM successfully migrated it's first migration. Second immediate migration was from alma03 and it's destination was alma04 and migration had met with failure and alma04 had zero score. My case is equal to original in HE-VM migration end result - failure. My environment was restored from backup as I already explained and also that issue was fixed, so I don't see how errors in logs influence migration flow. 
The final result is the same: the HE VM can't be immediately migrated between a pair of hosted-engine hosts right after the first migration. Further to your input, I'll try to reproduce this bug a fourth time and will open another bug to fix the new root cause.

Created attachment 1261471 [details]
screencast-2017-03-09_10.58.20.mkv
Created attachment 1261473 [details]
sosreport-puma18.scl.lab.tlv.redhat.com-20170309110039.tar.xz
Created attachment 1261474 [details]
sosreport-puma19.scl.lab.tlv.redhat.com-20170309110051.tar.xz
Created attachment 1261476 [details]
sosreport-nsednev-he-4.scl.lab.tlv.redhat.com-20170309110107.tar.xz
Fully reproduced on a clean and fresh environment with these components on the hosts:
rhvm-appliance-4.1.20170221.0-1.el7ev.noarch
libvirt-client-2.0.0-10.el7_3.4.x86_64
mom-0.5.9-1.el7ev.noarch
ovirt-hosted-engine-ha-2.1.0.4-1.el7ev.noarch
sanlock-3.4.0-1.el7.x86_64
ovirt-host-deploy-1.6.2-1.el7ev.noarch
ovirt-hosted-engine-setup-2.1.0.4-1.el7ev.noarch
vdsm-4.19.7-1.el7ev.x86_64
ovirt-engine-sdk-python-3.6.9.1-1.el7ev.noarch
ovirt-vmconsole-host-1.0.4-1.el7ev.noarch
qemu-kvm-rhev-2.6.0-28.el7_3.6.x86_64
ovirt-imageio-daemon-1.0.0-0.el7ev.noarch
ovirt-setup-lib-1.1.0-1.el7ev.noarch
ovirt-imageio-common-1.0.0-0.el7ev.noarch
ovirt-vmconsole-1.0.4-1.el7ev.noarch
Linux version 3.10.0-514.6.1.el7.x86_64 (mockbuild.eng.bos.redhat.com) (gcc version 4.8.5 20150623 (Red Hat 4.8.5-11) (GCC) ) #1 SMP Sat Dec 10 11:15:38 EST 2016
Linux 3.10.0-514.6.1.el7.x86_64 #1 SMP Sat Dec 10 11:15:38 EST 2016 x86_64 x86_64 x86_64 GNU/Linux
Red Hat Enterprise Linux Server release 7.3 (Maipo)
Engine:
rhevm-doc-4.1.0-2.el7ev.noarch
rhevm-setup-plugins-4.1.1-1.el7ev.noarch
rhev-guest-tools-iso-4.1-4.el7ev.noarch
rhevm-branding-rhev-4.1.0-1.el7ev.noarch
rhevm-4.1.1.3-0.1.el7.noarch
rhevm-dependencies-4.1.1-1.el7ev.noarch

Please see the attached logs from both hosts and the engine, and the screencast from the reproduction. I'm also reopening https://bugzilla.redhat.com/show_bug.cgi?id=1419326; please consider which of these two bugs should be closed.

> Second immediate migration was from alma03 and it's destination was alma04
> and migration had met with failure and alma04 had zero score.

Exactly, the destination failed in that case. Not the source host.
That means it was a different behaviour, and a correct one from the HE perspective (there might be an issue with sanlock or vdsm, but the score of 0 was correct in this case).

> Further to your input I'll try to reproduce this bug for a fourth time and
> will open another bug to fix the new root cause.

The last reproducer finally seems to be the proper one (the _source_ host reducing its score to 0). We can't solve 100% of it until we have event support in vdscli, but I guess we can improve the workarounds. This only happens when the migration + destroy is faster than the current HE refresh period (because we can't detect the migration properly).

> I'm also reopening https://bugzilla.redhat.com/show_bug.cgi?id=1419326; please
> consider which of these two bugs should be closed.

The product is different on those: this one is RHV, the other one is oVirt.

Andrej improved the detection and added a workaround for the case when the migration is really fast (less than 10 seconds).

Moving this bug to verified, as there is no more zero-score issue and HA is working better now, on these host components:
ovirt-imageio-daemon-1.0.0-0.el7ev.noarch
ovirt-setup-lib-1.1.0-1.el7ev.noarch
ovirt-imageio-common-1.0.0-0.el7ev.noarch
sanlock-3.4.0-1.el7.x86_64
ovirt-vmconsole-1.0.4-1.el7ev.noarch
vdsm-4.19.10-1.el7ev.x86_64
ovirt-hosted-engine-ha-2.1.0.5-1.el7ev.noarch
ovirt-host-deploy-1.6.3-1.el7ev.noarch
qemu-kvm-rhev-2.6.0-28.el7_3.8.x86_64
ovirt-vmconsole-host-1.0.4-1.el7ev.noarch
ovirt-hosted-engine-setup-2.1.0.5-1.el7ev.noarch
libvirt-client-2.0.0-10.el7_3.5.x86_64
mom-0.5.9-1.el7ev.noarch
ovirt-engine-sdk-python-3.6.9.1-1.el7ev.noarch
Linux version 3.10.0-514.16.1.el7.x86_64 (mockbuild.eng.bos.redhat.com) (gcc version 4.8.5 20150623 (Red Hat 4.8.5-11) (GCC) ) #1 SMP Fri Mar 10 13:12:32 EST 2017
Linux 3.10.0-514.16.1.el7.x86_64 #1 SMP Fri Mar 10 13:12:32 EST 2017 x86_64 x86_64 x86_64 GNU/Linux
Red Hat Enterprise Linux Server release 7.3 (Maipo)
Engine:
rhevm-doc-4.1.0-2.el7ev.noarch
rhev-guest-tools-iso-4.1-4.el7ev.noarch
rhevm-4.1.1.6-0.1.el7.noarch
rhevm-branding-rhev-4.1.0-1.el7ev.noarch
rhevm-setup-plugins-4.1.1-1.el7ev.noarch
rhevm-dependencies-4.1.1-1.el7ev.noarch
Linux version 3.10.0-514.6.1.el7.x86_64 (mockbuild.eng.bos.redhat.com) (gcc version 4.8.5 20150623 (Red Hat 4.8.5-11) (GCC) ) #1 SMP Sat Dec 10 11:15:38 EST 2016
Linux 3.10.0-514.6.1.el7.x86_64 #1 SMP Sat Dec 10 11:15:38 EST 2016 x86_64 x86_64 x86_64 GNU/Linux
Red Hat Enterprise Linux Server release 7.3 (Maipo)

I've tested this bug with the HE deployed on an NFS storage domain and a pair of NFS data storage domains. Works for me on rhevm-4.1.1.6-0.1.el7.noarch.
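For readers following the thread: the race condition discussed above (the source host's HA agent missing a fast migration because VDSM destroys the source VM before the agent's next status poll) and the fix can be sketched roughly as follows. This is an illustrative model only, not the actual ovirt-hosted-engine-ha code; the function names and signatures are invented here, while the state names (EngineMigratingAway, ReinitializeFSM, EngineDown, EngineUp) are the ones visible in the agent logs.

```python
# Illustrative sketch of the behaviour change, NOT real ovirt-hosted-engine-ha
# code. Each function models one agent refresh cycle deciding the next FSM
# state from what VDSM reports about the HE VM on the source host.

def next_state_old(vm_found, migration_in_progress):
    """Old behaviour: if the VM vanished while the agent believed it was
    migrating away, the agent saw an unexpected result and reinitialized
    its state machine (eventually EngineUnexpectedlyDown and score 0)."""
    if migration_in_progress:
        return "EngineMigratingAway"
    if not vm_found:
        # Fast migration: VDSM destroyed the source VM between two agent
        # polls, so the "migration finished" status was never observed.
        return "ReinitializeFSM"
    return "EngineUp"


def next_state_fixed(vm_found, migration_in_progress):
    """Fixed behaviour: the absence of the VM while migrating away is
    treated as a sign of a successful migration, so the agent moves to
    EngineDown and keeps its score."""
    if migration_in_progress:
        return "EngineMigratingAway"
    if not vm_found:
        return "EngineDown"
    return "EngineUp"
```

With this change, a migration plus destroy that completes within a single refresh period (under roughly 10 seconds, per the comment above) no longer drops the source host's score to 0, so the immediate migration back to it is not filtered out.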