Bug 1362618

Summary: [HE] moving an HE host to maintenance sometimes gets stuck in "Preparing For Maintenance" status.
Product: [oVirt] ovirt-hosted-engine-ha
Component: Agent
Reporter: Kobi Hakimi <khakimi>
Assignee: Martin Sivák <msivak>
Status: CLOSED DUPLICATE
QA Contact: Nikolai Sednev <nsednev>
Severity: high
Priority: high
Docs Contact:
Version: 2.0.1
CC: ahadas, akrejcir, alukiano, bugs, dfediuck, jbelka, khakimi, mavital, michal.skrivanek, msivak, mwest, ncredi, nsednev, ylavi
Target Milestone: ovirt-4.0.6
Target Release: ---
Keywords: Automation, TestOnly, Triaged
Flags: rule-engine: ovirt-4.0.z+
       ylavi: planning_ack+
       msivak: devel_ack+
       mavital: testing_ack+
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
: 1391933 (view as bug list)
Environment:
Last Closed: 2016-12-14 10:17:21 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: SLA
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On: 1380822, 1391933, 1392903
Bug Blocks:
Attachments:
Description    Flags
agent.log    none
Hosts tab    none
Virtual Machines tab    none
engine and vdsm logs    none
puma_src_dst_logs.tar.gz    none
logs(you can start watching from the date 2016-11-01 11:11:39)    none
sosreport from engine    none
sosreport from alma03    none
sosreport from alma04    none
Screen shot of hosts tab alma04 stuck in preparing to maintenance.    none
Virtual machines tab screen shot    none

Description Kobi Hakimi 2016-08-02 16:28:00 UTC
Created attachment 1186868 [details]
agent.log

Description of problem:
[HE] moving an HE host to maintenance sometimes gets stuck in "Preparing For Maintenance" status.

Version-Release number of selected component (if applicable):
Red Hat Virtualization Manager Version: 4.0.2.2-0.1.el7ev 
rhevm-appliance-20160731.0-1.el7ev.noarch

How reproducible:
75%

Steps to Reproduce:
1. Deploy HE on more than one host (in our case, GE-he with 3 HE-capable hosts)
2. Open the Hosts tab and click on the HE host.
3. Click on maintenance.


Actual results:
The HE host is stuck in "Preparing For Maintenance" status.
The Virtual Machines counter of this host drops to 0 (zero).
The Virtual Machines counter of the other host increases by one.
In the Virtual Machines tab the HostedEngine VM is still tied to the first host.
You can see this in the attached screenshots.

Expected results:
The host moves to maintenance and the HE VM migrates to another host.

Additional info:
see attached log file: agent.log

Comment 1 Kobi Hakimi 2016-08-02 16:29:18 UTC
Created attachment 1186869 [details]
Hosts tab

Comment 2 Kobi Hakimi 2016-08-02 16:29:52 UTC
Created attachment 1186870 [details]
Virtual Machines tab

Comment 3 Roy Golan 2016-08-10 08:18:46 UTC
In the Hosts tab the machine is still migrating. I'll need the engine log and the VDSM logs from both hosts. All in all it doesn't look wrong atm. How did your system look after, say, a day?

Comment 4 Michal Skrivanek 2016-08-10 11:19:18 UTC
In 4.0 the agent code should take advantage of the migration improvements and trigger one of the new migration policies instead of the default 3.6 behavior, which often fails to converge for busy VMs like the HE VM.

Comment 5 Kobi Hakimi 2016-08-10 12:02:56 UTC
Created attachment 1189591 [details]
engine and vdsm logs

ha_logs/
ha_logs/vm-23_engine.log - HE engine log
ha_logs/puma23_vdsm.log - at first, moving host_mixed_1 (with the HE VM) to maintenance - succeeded
ha_logs/puma26_vdsm.log - then moving host_mixed_2 (with the HE VM) to maintenance
ha_logs/puma27_vdsm.log - the VM moved to host_mixed_3, but host_mixed_2 is still stuck in preparing for maintenance

Comment 6 Kobi Hakimi 2016-08-10 12:10:11 UTC
I have this bug reproduced on my HE environment now.
If someone wants to investigate it ASAP, please let me know.

Comment 7 Jiri Belka 2016-08-10 12:37:58 UTC
Is this one related or... - BZ1365470 ?

Comment 9 Roy Golan 2016-08-17 09:23:19 UTC
(In reply to Jiri Belka from comment #7)
> Is this one related or... - BZ1365470 ?

No, it's an engine 4.0 with 3.6 hosts, causing other problems.

Comment 10 Kobi Hakimi 2016-08-17 12:27:10 UTC
More info:
Status:
HE VM up and running on puma23
puma26 and puma27 (with HE capabilities) up and running with HA score (3400) - after restart

I moved puma23 to maintenance and it worked as expected 

status: 
puma23 in maintenance
HE vm up and running on puma26 
puma27 up and running with HA score(3400)

I moved puma26 to maintenance and it got stuck in "Preparing For Maintenance" status

status: 
puma23 in maintenance
puma26 in preparing for maintenance status - with the HE vm on it
puma27 up and running with HA score(3400)

You can see the attached logs from puma26 and puma27 (puma_src_dst_logs.tar.gz).

Comment 11 Kobi Hakimi 2016-08-17 12:28:14 UTC
Created attachment 1191620 [details]
puma_src_dst_logs.tar.gz

Comment 13 Martin Sivák 2016-09-07 11:12:33 UTC
So I see the issue here:

The VM was up and running..

MainThread::INFO::2016-08-17 14:48:19,594::hosted_engine::461::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring) Current state EngineUp (score: 3400)

Detected local maintenance

MainThread::INFO::2016-08-17 14:48:29,669::state_decorators::124::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(check) Local maintenance detected

Performed all the right steps

MainThread::INFO::2016-08-17 14:48:29,673::brokerlink::111::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(notify) Trying: notify time=1471434509.67 type=state_transition detail=EngineUp-LocalMaintenanceMigrateVm hostname='puma26.scl.lab.tlv.redhat.com'
MainThread::INFO::2016-08-17 14:48:52,789::hosted_engine::461::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring) Current state LocalMaintenanceMigrateVm (score: 0)
MainThread::INFO::2016-08-17 14:48:52,879::brokerlink::111::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(notify) Trying: notify time=1471434532.88 type=state_transition detail=LocalMaintenanceMigrateVm-EngineMigratingAway hostname='puma26.scl.lab.tlv.redhat.com'

Then it stayed in the migration status for a while

MainThread::INFO::2016-08-17 14:49:21,236::hosted_engine::461::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring) Current state EngineMigratingAway (score: 3400)

And then it choked..

MainThread::INFO::2016-08-17 14:49:21,236::hosted_engine::461::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring) Current state EngineMigratingAway (score: 3400)
MainThread::INFO::2016-08-17 14:49:21,237::hosted_engine::466::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring) Best remote host puma23.scl.lab.tlv.redhat.com (id: 1, score: 0)
MainThread::INFO::2016-08-17 14:49:31,362::brokerlink::111::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(notify) Trying: notify time=1471434571.36 type=state_transition detail=EngineMigratingAway-ReinitializeFSM hostname='puma26.scl.lab.tlv.redhat.com'
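
To make the sequence easier to follow, here is a minimal illustrative sketch (not the actual ovirt-hosted-engine-ha FSM code, and the expected path is an assumption based on the successful runs and the --vm-status output later in this bug) comparing the flow we expect with what the log above shows:

# Illustrative sketch only -- not the real ovirt-hosted-engine-ha FSM.
# It mirrors the state transitions visible in the agent.log excerpt above.

EXPECTED_FLOW = [
    "EngineUp",                    # HE VM healthy on the local host
    "LocalMaintenanceMigrateVm",   # local maintenance requested, migration started
    "EngineMigratingAway",         # waiting for the migration to finish
    "LocalMaintenance",            # expected end state (possibly via intermediate states)
]

OBSERVED_FLOW = [
    "EngineUp",
    "LocalMaintenanceMigrateVm",
    "EngineMigratingAway",
    "ReinitializeFSM",             # agent lost track of the migration and reset
]

def first_divergence(expected, observed):
    """Return the index where the observed transitions stop matching."""
    for i, (exp, obs) in enumerate(zip(expected, observed)):
        if exp != obs:
            return i
    return None

if __name__ == "__main__":
    print(first_divergence(EXPECTED_FLOW, OBSERVED_FLOW))  # -> 3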

Comment 14 Roy Golan 2016-09-11 08:42:27 UTC
Was the VDSM migration verb invoked and did it start the migration?

Comment 15 Andrej Krejcir 2016-10-18 12:43:19 UTC
This bug was possibly fixed in bug 1380822.

According to the VDSM logs from the source, the migration could fail when accessing a nonexistent key in a dictionary.
This patch probably fixes it: https://gerrit.ovirt.org/#/c/64499
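
As a hedged illustration of this class of failure (the key names below are made up; the actual fix is in the linked VDSM patch), a direct dictionary lookup aborts the flow with a KeyError when the key was never populated, while a defensive lookup with a default does not:

# Hypothetical illustration of the failure class described above,
# with invented key names; not the actual VDSM code.

migration_params = {"dst": "puma23.scl.lab.tlv.redhat.com"}  # optional key missing

def lookup_fragile(params):
    # Direct indexing raises KeyError when the key is absent,
    # which aborts the migration flow with a traceback.
    return params["dst_qemu_uri"]

def lookup_defensive(params):
    # Defensive lookup: fall back to a sensible default instead of crashing.
    return params.get("dst_qemu_uri", params["dst"])

try:
    lookup_fragile(migration_params)
except KeyError as exc:
    print("migration aborted, missing key:", exc)

print("destination:", lookup_defensive(migration_params))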

Comment 16 Martin Sivák 2016-10-26 10:32:11 UTC
Please test as part of https://bugzilla.redhat.com/show_bug.cgi?id=1380822

Comment 17 Artyom 2016-11-01 10:22:43 UTC
Created attachment 1216062 [details]
logs(you can start watching from the date 2016-11-01 11:11:39)

Checked on:
ovirt-hosted-engine-setup-2.0.3-2.el7ev.noarch
ovirt-hosted-engine-ha-2.0.4-1.el7ev.noarch
vdsm-xmlrpc-4.18.15.2-1.el7ev.noarch
vdsm-hook-vmfex-dev-4.18.15.2-1.el7ev.noarch
vdsm-api-4.18.15.2-1.el7ev.noarch
vdsm-cli-4.18.15.2-1.el7ev.noarch
vdsm-yajsonrpc-4.18.15.2-1.el7ev.noarch
vdsm-4.18.15.2-1.el7ev.x86_64
vdsm-infra-4.18.15.2-1.el7ev.noarch
vdsm-jsonrpc-4.18.15.2-1.el7ev.noarch
vdsm-python-4.18.15.2-1.el7ev.noarch

Steps:
1) Put host_1 (with the HE VM) into maintenance - PASS, the HE VM successfully migrates to host_2
2) Activate host_1
3) Put host_2 (with the HE VM) into maintenance - PASS, the HE VM successfully migrates to host_1
4) Activate host_2
5) Put host_1 (with the HE VM) into maintenance - FAILS
From the `hosted-engine --vm-status` command I can see that the action succeeds without any trouble (host_1 is in local maintenance and the HE VM runs on host_2), but the UI shows host_1 stuck in the "Preparing For Maintenance" state

Check the logs for more information; you can also find screenshots in the archive

Comment 18 Martin Sivák 2016-11-01 16:19:37 UTC
Artyom, where is the engine running after step 5 according to the webadmin? Also on host_2?

Comment 19 Artyom 2016-11-02 08:41:57 UTC
According to the engine, the HE VM is placed on host_1 in the VM menu and on host_2 in the host menu. You can check the screenshots in the log archive for more information.

Comment 20 Martin Sivák 2016-11-02 14:55:21 UTC
Ok so the status in the failing case is this:

Engine thinks this:

Host 1 - Preparing for maintenance, 0 VMs
Host 3 - Up, 1 VM

Hosted engine VM - migrating from Host 1

The reality:

The VM is physically running on Host 3 already and hosted engine tooling knows that.


So the engine somehow missed the migration finished event. I only saw one suspicious entry in the log:

2016-11-01 11:53:34,536 INFO  [org.ovirt.engine.core.vdsbroker.monitoring.VmAnalyzer] (ForkJoinPool-1-worker-0) [] VM 'a841f18e-dbad-47e9-abbd-bb97cdc86500'(HostedEngine) moved from 'MigratingFrom' --> 'Down'
2016-11-01 11:53:34,537 ERROR [org.ovirt.engine.core.vdsbroker.monitoring.VmsMonitoring] (ForkJoinPool-1-worker-0) [] Failed during vms monitoring on host host_mixed_1 error is: java.lang.NullPointerException
2016-11-01 11:53:34,537 ERROR [org.ovirt.engine.core.vdsbroker.monitoring.VmsMonitoring] (ForkJoinPool-1-worker-0) [] Exception:: java.lang.NullPointerException
        at org.ovirt.engine.core.vdsbroker.monitoring.VmAnalyzer.isVdsNonResponsive(VmAnalyzer.java:760) [vdsbroker.jar:]
        at org.ovirt.engine.core.vdsbroker.monitoring.VmAnalyzer.handOverVm(VmAnalyzer.java:739) [vdsbroker.jar:]
        at org.ovirt.engine.core.vdsbroker.monitoring.VmAnalyzer.proceedDownVm(VmAnalyzer.java:267) [vdsbroker.jar:]
        at org.ovirt.engine.core.vdsbroker.monitoring.VmAnalyzer.analyze(VmAnalyzer.java:154) [vdsbroker.jar:]
        at org.ovirt.engine.core.vdsbroker.monitoring.VmsMonitoring.lambda$analyzeVms$1(VmsMonitoring.java:161) [vdsbroker.jar:]
        at java.util.ArrayList.forEach(ArrayList.java:1249) [rt.jar:1.8.0_102]
        at org.ovirt.engine.core.vdsbroker.monitoring.VmsMonitoring.analyzeVms(VmsMonitoring.java:156) [vdsbroker.jar:]
        at org.ovirt.engine.core.vdsbroker.monitoring.VmsMonitoring.perform(VmsMonitoring.java:119) [vdsbroker.jar:]
        at org.ovirt.engine.core.vdsbroker.monitoring.EventVmStatsRefresher$1.onNext(EventVmStatsRefresher.java:61) [vdsbroker.jar:]
        at org.ovirt.engine.core.vdsbroker.monitoring.EventVmStatsRefresher$1.onNext(EventVmStatsRefresher.java:46) [vdsbroker.jar:]
        at org.ovirt.vdsm.jsonrpc.client.events.EventPublisher$EventCallable.call(EventPublisher.java:120) [vdsm-jsonrpc-java-client.jar:]
        at org.ovirt.vdsm.jsonrpc.client.events.EventPublisher$EventCallable.call(EventPublisher.java:95) [vdsm-jsonrpc-java-client.jar:]
        at java.util.concurrent.ForkJoinTask$AdaptedCallable.exec(ForkJoinTask.java:1424) [rt.jar:1.8.0_102]
        at java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:289) [rt.jar:1.8.0_102]
        at java.util.concurrent.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1056) [rt.jar:1.8.0_102]
        at java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1692) [rt.jar:1.8.0_102]
        at java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:157) [rt.jar:1.8.0_102]

Comment 21 Arik 2016-11-02 16:58:27 UTC
Some findings:

1. There is a problematic video device ('device' is empty; see the illustrative sketch after these findings):
{
    "address": {
        "bus": "0x00",
        "domain": "0x0000",
        "function": "0x0",
        "slot": "0x02",
        "type": "pci"
    },
    "alias": "video0",
    "device": "",
    "type": "video"
}

Kobi:
1.1 please file a bug for SLA to see if it always happens with hosted-engine VMs
1.2 please file a bug for virt to handle such devices properly and prevent the NPE.

2. There's the known exception during the hand-over for unmanaged migrations - it would be great to finally think about a way to get the destination host from VDSM and prevent the NPE.

3. The VM is now locked by the monitoring because of the exception at 11:53:33. On master there's a fix for this (https://gerrit.ovirt.org/#/c/61614/) - it should be backported, at least some of it.
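
As a purely illustrative sketch of what "handle such devices properly" could mean (a hypothetical helper, not the engine's or VDSM's actual code), filtering out device entries with an empty 'device' field before they reach the monitoring code would avoid this NPE path:

# Hypothetical sketch: skip device entries whose 'device' field is empty,
# like the video device shown in finding 1, instead of letting them
# propagate and trigger a NullPointerException later.

def sane_devices(devices):
    """Yield only device dicts that carry a non-empty 'device' value."""
    for dev in devices:
        if not dev.get("device"):
            print("skipping malformed %s device: alias=%s"
                  % (dev.get("type", "unknown"), dev.get("alias", "?")))
            continue
        yield dev

reported = [
    {"alias": "video0", "device": "", "type": "video"},        # malformed, from finding 1
    {"alias": "virtio-disk0", "device": "disk", "type": "disk"},
]

print(list(sane_devices(reported)))  # only the disk survives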

Comment 22 Michal Skrivanek 2016-11-02 17:03:17 UTC
(In reply to Roy Golan from comment #14)
> Was the VDSM migration verb invoked and did it start the migration?

<rant>

There's your problem right there. It should use the proper API, which is the engine REST API, and nothing else.
You would actually get comment #4 for free, but since you're invoking the internal API you don't get it, and the worst migration policy is used.

</rant>
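
For illustration, requesting the migration through the engine REST API would look roughly like the sketch below (a hedged example, not the agent's actual code; the URL, credentials and CA path are placeholders, and the VM id is taken from the log excerpt above). Going through the engine lets it pick the destination host and apply the cluster's migration policy, which is the point of the rant above.

# Rough sketch, not the agent's implementation: ask the engine (REST API)
# to migrate the VM instead of calling the VDSM migration verb directly.
# Engine URL, credentials and CA path below are placeholders.
import requests

ENGINE = "https://engine.example.com/ovirt-engine/api"
AUTH = ("admin@internal", "password")           # placeholder credentials
VM_ID = "a841f18e-dbad-47e9-abbd-bb97cdc86500"  # HostedEngine VM id from the log

resp = requests.post(
    ENGINE + "/vms/" + VM_ID + "/migrate",
    auth=AUTH,
    headers={"Content-Type": "application/xml", "Accept": "application/xml"},
    data="<action/>",                      # let the engine choose the destination host
    verify="/etc/pki/ovirt-engine/ca.pem",  # placeholder CA bundle path
)
resp.raise_for_status()
print(resp.status_code, resp.reason)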

Comment 23 Michal Skrivanek 2016-11-02 17:06:24 UTC
(In reply to Arik from comment #21)
 
> 3. The VM is now locked by the monitoring because of the exception at
> 11:53:33. On master there's a fix for this
> (https://gerrit.ovirt.org/#/c/61614/) - it should be backported, at least
> some of it.

This patch would supposedly "fix" the problem of the stuck monitoring, as it would allow the destination host to update the VM status (Up) properly, and the maintenance procedure would likely conclude. So it is definitely worth a backport.

Comment 24 Michal Skrivanek 2016-11-10 14:10:15 UTC
Alternatively, this bug is likely solved by the fix for bug 1392903.

Comment 25 Doron Fediuck 2016-11-20 09:07:25 UTC
Based on comment 24, moving to test only once bug 1392903 is verified.

Comment 26 Doron Fediuck 2016-11-23 11:51:42 UTC
*** Bug 1393686 has been marked as a duplicate of this bug. ***

Comment 29 Nikolai Sednev 2016-11-29 14:48:15 UTC
Moving this bug to verified as it no longer depends on other bugs and works as designed.

Works for me on these components on hosts:
libvirt-client-2.0.0-10.el7.x86_64
qemu-kvm-rhev-2.6.0-27.el7.x86_64
ovirt-hosted-engine-ha-2.0.4-1.el7ev.noarch
ovirt-setup-lib-1.0.2-1.el7ev.noarch
ovirt-imageio-daemon-0.4.0-0.el7ev.noarch
ovirt-vmconsole-1.0.4-1.el7ev.noarch
mom-0.5.8-1.el7ev.noarch
vdsm-4.18.15.3-1.el7ev.x86_64
ovirt-hosted-engine-setup-2.0.3-2.el7ev.noarch
sanlock-3.4.0-1.el7.x86_64
ovirt-host-deploy-1.5.3-1.el7ev.noarch
ovirt-engine-sdk-python-3.6.9.1-1.el7ev.noarch
ovirt-imageio-common-0.3.0-0.el7ev.noarch
ovirt-vmconsole-host-1.0.4-1.el7ev.noarch
Linux version 3.10.0-514.el7.x86_64 (mockbuild.eng.bos.redhat.com) (gcc version 4.8.5 20150623 (Red Hat 4.8.5-11) (GCC) ) #1 SMP Wed Oct 19 11:24:13 EDT 2016
Linux 3.10.0-514.el7.x86_64 #1 SMP Wed Oct 19 11:24:13 EDT 2016 x86_64 x86_64 x86_64 GNU/Linux
Red Hat Enterprise Linux Server release 7.3 (Maipo)

One engine:
rhevm-setup-plugins-4.0.0.3-1.el7ev.noarch
rhevm-4.0.6.1-0.1.el7ev.noarch
rhevm-spice-client-x86-msi-4.0-3.el7ev.noarch
rhev-release-4.0.6-3-001.noarch
rhevm-dependencies-4.0.0-1.el7ev.noarch
rhev-guest-tools-iso-4.0-6.el7ev.noarch
rhevm-spice-client-x64-msi-4.0-3.el7ev.noarch
rhevm-branding-rhev-4.0.0-5.el7ev.noarch
rhevm-guest-agent-common-1.0.12-3.el7ev.noarch
rhevm-doc-4.0.6-1.el7ev.noarch
Linux version 3.10.0-514.el7.x86_64 (mockbuild.eng.bos.redhat.com) (gcc version 4.8.5 20150623 (Red Hat 4.8.5-11) (GCC) ) #1 SMP Wed Oct 19 11:24:13 EDT 2016
Linux 3.10.0-514.el7.x86_64 #1 SMP Wed Oct 19 11:24:13 EDT 2016 x86_64 x86_64 x86_64 GNU/Linux
Red Hat Enterprise Linux Server release 7.3 (Maipo)

I've deployed a clean deployment of rhevm-appliance-20161116.0-1.el7ev.noarch (el7.3 based) over NFS, then upgraded the engine to the latest components as listed above, then added an NFS data storage domain to get hosted_storage auto-imported into the engine's WebUI. Then I added an additional hosted engine host via the WebUI.

Made at least 8 iterations of steps 1-10:
1. HE-VM running on alma03 with HA score 3400.
2. Set alma03 into maintenance via the WebUI.
3. HE-VM migrated to alma04 successfully; alma03 went to maintenance with an HA score of 0 and was seen in local maintenance from both CLI and WebUI.
4. Activated alma03 back without any issues and the host went back to active.
5. Waited a few minutes for alma03 to get an HA score of 3400.
6. HE-VM running on alma04 with HA score 3400.
7. Set alma04 into maintenance via the WebUI.
8. HE-VM migrated to alma03 successfully; alma04 went to maintenance with an HA score of 0 and was seen in local maintenance from both CLI and WebUI.
9. Activated alma04 back without any issues and the host went back to active.
10. Waited a few minutes for alma04 to get an HA score of 3400.

I did not run into the initially reported issue, hence moving this bug to verified.

Comment 30 Nikolai Sednev 2016-11-29 15:06:29 UTC
Moving back to assigned as 1391933 was eventually reproduced.

Comment 31 Red Hat Bugzilla Rules Engine 2016-11-29 15:06:37 UTC
Target release should be placed once a package build is known to fix an issue. Since this bug is not modified, the target version has been reset. Please use the target milestone to plan a fix for an oVirt release.

Comment 32 Nikolai Sednev 2016-11-29 15:26:34 UTC
Changing back to assigned, as the bug was eventually reproduced after the ~10th iteration, when I did not wait for the target host to become active with a positive score: I tried to set the host with the HE-VM on it (alma04) into maintenance while alma03 was active but not yet at a positive HA score.
Attaching sosreports from my environment.

Comment 33 Nikolai Sednev 2016-11-29 15:27:25 UTC
Created attachment 1225895 [details]
sosreport from engine

Comment 34 Nikolai Sednev 2016-11-29 15:28:43 UTC
Created attachment 1225896 [details]
sosreport from alma03

Comment 35 Nikolai Sednev 2016-11-29 15:30:04 UTC
Created attachment 1225897 [details]
sosreport from alma04

Comment 36 Nikolai Sednev 2016-11-29 15:31:16 UTC
Created attachment 1225900 [details]
Screen shot of hosts tab alma04 stuck in preparing to maintenance.

Comment 37 Nikolai Sednev 2016-11-29 15:32:31 UTC
Created attachment 1225902 [details]
Virtual machines tab screen shot

Comment 38 Nikolai Sednev 2016-11-29 15:34:19 UTC
Also adding the CLI `hosted-engine --vm-status` output from both hosts here:
[root@alma04 ~]# hosted-engine --vm-status


--== Host 1 status ==--

Status up-to-date                  : True
Hostname                           : alma03.qa.lab.tlv.redhat.com
Host ID                            : 1
Engine status                      : {"reason": "vm not running on this host", "health": "bad", "vm": "down", "detail": "unknown"}
Score                              : 3400
stopped                            : False
Local maintenance                  : False
crc32                              : ac02a377
Host timestamp                     : 92207
Extra metadata (valid at timestamp):
        metadata_parse_version=1
        metadata_feature_version=1
        timestamp=92207 (Tue Nov 29 17:32:44 2016)
        host-id=1
        score=3400
        maintenance=False
        state=EngineDown
        stopped=False


--== Host 2 status ==--

Status up-to-date                  : True
Hostname                           : alma04.qa.lab.tlv.redhat.com
Host ID                            : 2
Engine status                      : {"health": "good", "vm": "up", "detail": "up"}
Score                              : 0
stopped                            : False
Local maintenance                  : True
crc32                              : a368838b
Host timestamp                     : 85789
Extra metadata (valid at timestamp):
        metadata_parse_version=1
        metadata_feature_version=1
        timestamp=85789 (Tue Nov 29 17:32:40 2016)
        host-id=2
        score=0
        maintenance=True
        state=LocalMaintenance
        stopped=False


[root@alma03 ~]# hosted-engine --vm-status


--== Host 1 status ==--

Status up-to-date                  : True
Hostname                           : alma03.qa.lab.tlv.redhat.com
Host ID                            : 1
Engine status                      : {"reason": "vm not running on this host", "health": "bad", "vm": "down", "detail": "unknown"}
Score                              : 3400
stopped                            : False
Local maintenance                  : False
crc32                              : 4e06b25f
Host timestamp                     : 92257
Extra metadata (valid at timestamp):
        metadata_parse_version=1
        metadata_feature_version=1
        timestamp=92257 (Tue Nov 29 17:33:35 2016)
        host-id=1
        score=3400
        maintenance=False
        state=EngineDown
        stopped=False


--== Host 2 status ==--

Status up-to-date                  : True
Hostname                           : alma04.qa.lab.tlv.redhat.com
Host ID                            : 2
Engine status                      : {"health": "good", "vm": "up", "detail": "up"}
Score                              : 0
stopped                            : False
Local maintenance                  : True
crc32                              : 6e96fd3e
Host timestamp                     : 85835
Extra metadata (valid at timestamp):
        metadata_parse_version=1
        metadata_feature_version=1
        timestamp=85835 (Tue Nov 29 17:33:26 2016)
        host-id=2
        score=0
        maintenance=True
        state=LocalMaintenance
        stopped=False

Comment 39 Yaniv Kaul 2016-12-01 13:35:21 UTC
Martin - can it be deferred to 4.0.7?

Comment 40 Yaniv Kaul 2016-12-08 13:51:00 UTC
(In reply to Yaniv Kaul from comment #39)
> Martin - can it be deferred to 4.0.7?

Deferring.

Comment 41 Martin Sivák 2016-12-12 10:53:12 UTC
I think we decided to leave it as it is for 4.0.6 and track 1399766 separately for 4.0.7. Moving back to ON_QA.

Comment 42 Nikolai Sednev 2016-12-12 12:01:20 UTC
(In reply to Martin Sivák from comment #41)
> I think we decided to leave it as it is for 4.0.6 and track 1399766
> separately for 4.0.7. Moving back to ON_QA.

If so, then I have nothing more to do with this bug, as except for 1399766 I have not seen this issue reproduced on my environments, neither upstream nor downstream.

For upstream environment I've used these components on hosts:
ovirt-vmconsole-host-1.0.4-1.el7ev.noarch
ovirt-release41-pre-4.1.0-0.0.beta.20161201085255.git731841c.el7.centos.noarch
ovirt-hosted-engine-setup-2.1.0-0.0.master.20161130101611.gitb3ad261.el7.centos.noarch
libvirt-client-2.0.0-10.el7_3.2.x86_64
ovirt-engine-appliance-4.1-20161202.1.el7.centos.noarch
ovirt-host-deploy-1.6.0-0.0.master.20161107121647.gitfd7ddcd.el7.centos.noarch
ovirt-engine-sdk-python-3.6.9.1-1.el7ev.noarch
ovirt-imageio-daemon-0.5.0-0.201611201242.gitb02532b.el7.centos.noarch
ovirt-hosted-engine-ha-2.1.0-0.0.master.20161130135331.20161130135328.git3541725.el7.centos.noarch
ovirt-setup-lib-1.1.0-0.0.master.20161107100014.gitb73abeb.el7.centos.noarch
ovirt-imageio-common-0.5.0-0.201611201242.gitb02532b.el7.centos.noarch
vdsm-4.18.999-1020.git1ff41b1.el7.centos.x86_64
sanlock-3.4.0-1.el7.x86_64
qemu-kvm-rhev-2.6.0-28.el7_3.2.x86_64
ovirt-vmconsole-1.0.4-1.el7ev.noarch
mom-0.5.8-1.el7ev.noarch
Linux version 3.10.0-514.2.2.el7.x86_64 (mockbuild.eng.bos.redhat.com) (gcc version 4.8.5 20150623 (Red Hat 4.8.5-11) (GCC) ) #1 SMP Wed Nov 16 13:15:13 EST 2016
Linux 3.10.0-514.2.2.el7.x86_64 #1 SMP Wed Nov 16 13:15:13 EST 2016 x86_64 x86_64 x86_64 GNU/Linux
Red Hat Enterprise Linux Server release 7.3 (Maipo)

On engine:
ovirt-engine-4.1.0-0.2.master.20161210231201.git26a385e.el7.centos.noarch
vdsm-jsonrpc-java-1.3.5-1.20161209104906.gitabdea80.el7.centos.noarch
Linux version 3.10.0-327.36.3.el7.x86_64 (builder.centos.org) (gcc version 4.8.5 20150623 (Red Hat 4.8.5-4) (GCC) ) #1 SMP Mon Oct 24 16:09:20 UTC 2016
Linux 3.10.0-327.36.3.el7.x86_64 #1 SMP Mon Oct 24 16:09:20 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
CentOS Linux release 7.2.1511 (Core) 

For downstream I've used these components on hosts:
vdsm-4.18.18-4.git198e48d.el7ev.x86_64
ovirt-engine-sdk-python-3.6.9.1-1.el7ev.noarch
ovirt-imageio-daemon-0.4.0-0.el7ev.noarch
rhev-release-4.0.6-5-001.noarch
sanlock-3.4.0-1.el7.x86_64
ovirt-vmconsole-host-1.0.4-1.el7ev.noarch
qemu-kvm-rhev-2.6.0-28.el7_3.2.x86_64
ovirt-hosted-engine-ha-2.0.6-1.el7ev.noarch
rhevm-appliance-20161130.0-1.el7ev.noarch
ovirt-setup-lib-1.0.2-1.el7ev.noarch
ovirt-imageio-common-0.3.0-0.el7ev.noarch
libvirt-client-2.0.0-10.el7_3.2.x86_64
ovirt-vmconsole-1.0.4-1.el7ev.noarch
mom-0.5.8-1.el7ev.noarch
ovirt-hosted-engine-setup-2.0.4.1-2.el7ev.noarch
ovirt-host-deploy-1.5.3-1.el7ev.noarch
Linux version 3.10.0-514.2.2.el7.x86_64 (mockbuild.eng.bos.redhat.com) (gcc version 4.8.5 20150623 (Red Hat 4.8.5-11) (GCC) ) #1 SMP Wed Nov 16 13:15:13 EST 2016
Linux 3.10.0-514.2.2.el7.x86_64 #1 SMP Wed Nov 16 13:15:13 EST 2016 x86_64 x86_64 x86_64 GNU/Linux
Red Hat Enterprise Linux Server release 7.3 (Maipo)

On engine:
rhevm-dependencies-4.0.0-1.el7ev.noarch
rhevm-spice-client-x86-msi-4.0-3.el7ev.noarch
rhev-release-4.0.6-5-001.noarch
rhevm-spice-client-x64-msi-4.0-3.el7ev.noarch
rhevm-doc-4.0.6-1.el7ev.noarch
rhevm-4.0.6.3-0.1.el7ev.noarch
rhev-guest-tools-iso-4.0-6.el7ev.noarch
rhevm-branding-rhev-4.0.0-6.el7ev.noarch
rhevm-setup-plugins-4.0.0.3-1.el7ev.noarch
rhevm-guest-agent-common-1.0.12-3.el7ev.noarch
Linux version 3.10.0-514.el7.x86_64 (mockbuild.eng.bos.redhat.com) (gcc version 4.8.5 20150623 (Red Hat 4.8.5-11) (GCC) ) #1 SMP Wed Oct 19 11:24:13 EDT 2016
Linux 3.10.0-514.el7.x86_64 #1 SMP Wed Oct 19 11:24:13 EDT 2016 x86_64 x86_64 x86_64 GNU/Linux
Red Hat Enterprise Linux Server release 7.3 (Maipo)

Moving to verified, as no dependent bugs remain.

Comment 43 Kobi Hakimi 2016-12-14 10:17:21 UTC
Because this bug is too general and not 100% reproducible, it could be the same as:
https://bugzilla.redhat.com/show_bug.cgi?id=1399766

which has a specific scenario and is NOT FIXED in 4.0.6

*** This bug has been marked as a duplicate of bug 1399766 ***