Bug 1362618
Summary: [HE] sometimes move HE host to maintenance stuck in "Preparing For Maintenance" status.
Product: [oVirt] ovirt-hosted-engine-ha
Reporter: Kobi Hakimi <khakimi>
Component: Agent
Assignee: Martin Sivák <msivak>
Status: CLOSED DUPLICATE
QA Contact: Nikolai Sednev <nsednev>
Severity: high
Docs Contact:
Priority: high
Version: 2.0.1
CC: ahadas, akrejcir, alukiano, bugs, dfediuck, jbelka, khakimi, mavital, michal.skrivanek, msivak, mwest, ncredi, nsednev, ylavi
Target Milestone: ovirt-4.0.6
Keywords: Automation, TestOnly, Triaged
Target Release: ---
Flags: rule-engine: ovirt-4.0.z+, ylavi: planning_ack+, msivak: devel_ack+, mavital: testing_ack+
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
: 1391933 (view as bug list)
Environment:
Last Closed: 2016-12-14 10:17:21 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: SLA
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On: 1380822, 1391933, 1392903
Bug Blocks:
Attachments:
Created attachment 1186869 [details]
Hosts tab
Created attachment 1186870 [details]
Virtual Machines tab
In the Hosts tab the machine is still migrating. I'll need the engine and the vdsm logs from both hosts. All in all it doesn't look wrong at the moment. How did your system look after, say, a day?

In 4.0 the agent code should take advantage of the migration improvements and trigger one of the new migration policies instead of the default 3.6 behavior, which often fails to converge for busy VMs like the HE VM.

Created attachment 1189591 [details]
engine and vdsm logs
ha_logs/
ha_logs/vm-23_engine.log - the HE engine log
ha_logs/puma23_vdsm.log - host_mixed_1 (with the HE VM) was moved to maintenance first - succeeded
ha_logs/puma26_vdsm.log - host_mixed_2 (with the HE VM) was then moved to maintenance
ha_logs/puma27_vdsm.log - the HE VM moved to host_mixed_3, but host_mixed_2 is still stuck in "Preparing For Maintenance"
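Regarding the earlier note that the default 3.6 migration behavior often fails to converge for a busy VM like the hosted engine: a minimal back-of-the-envelope illustration (Python with made-up numbers, not oVirt code) of the convergence condition - pre-copy migration only finishes if pages are copied faster than the guest dirties them:

    # Rough illustration (not oVirt code) of why a pre-copy live migration
    # may never converge for a busy VM such as the hosted engine:
    # the remaining dirty memory shrinks only if the link transfers pages
    # faster than the guest dirties them.
    def remaining_after_iterations(ram_mb, dirty_rate_mb_s, bandwidth_mb_s, iterations=30):
        remaining = ram_mb
        for _ in range(iterations):
            seconds = remaining / float(bandwidth_mb_s)            # time to copy what is left
            remaining = min(ram_mb, dirty_rate_mb_s * seconds)     # pages dirtied meanwhile
        return remaining

    # Hypothetical numbers: a 4 GiB engine VM dirtying 150 MB/s over a 120 MB/s link.
    print(remaining_after_iterations(4096, 150, 120))   # stays large -> never converges
    print(remaining_after_iterations(4096, 50, 120))    # shrinks towards 0 -> converges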
I have this bug reproduced on my HE environment now. If someone wants to investigate it ASAP, please let me know.

(In reply to Jiri Belka from comment #7)
> Is this one related or... - BZ1365470 ?

No, that one is an engine 4.0 with 3.6 hosts causing other problems.

More info:

Status: HE VM up and running on puma23; puma26 and puma27 (with HE capabilities) up and running with HA score 3400 (after restart).
I moved puma23 to maintenance and it worked as expected.

Status: puma23 in maintenance; HE VM up and running on puma26; puma27 up and running with HA score 3400.
I moved puma26 to maintenance and it got stuck in "Preparing For Maintenance".

Status: puma23 in maintenance; puma26 in "Preparing For Maintenance" with the HE VM still on it; puma27 up and running with HA score 3400.

You can see the attached logs from puma26 and puma27 (puma_src_dst_logs.tar.gz).

Created attachment 1191620 [details]
puma_src_dst_logs.tar.gz
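The analysis in the next comment traces the agent through EngineUp -> LocalMaintenanceMigrateVm -> EngineMigratingAway -> ReinitializeFSM. As background, a heavily simplified sketch of this kind of maintenance transition logic (illustrative only, not the actual ovirt-hosted-engine-ha state machine):

    # Simplified illustration of the agent's maintenance flow; the real
    # ovirt-hosted-engine-ha FSM has many more states and inputs.
    def next_state(state, local_maintenance, migration_result):
        if state == "EngineUp" and local_maintenance:
            return "LocalMaintenanceMigrateVm"      # start moving the HE VM away
        if state == "LocalMaintenanceMigrateVm":
            return "EngineMigratingAway"            # migration was submitted
        if state == "EngineMigratingAway":
            if migration_result == "done":
                return "LocalMaintenance"           # host can finish maintenance
            if migration_result == "failed":
                return "ReinitializeFSM"            # the transition seen in the logs below
            return "EngineMigratingAway"            # still migrating
        return state

    state = "EngineUp"
    for result in (None, None, "failed"):
        state = next_state(state, local_maintenance=True, migration_result=result)
        print(state)
    # -> LocalMaintenanceMigrateVm, EngineMigratingAway, ReinitializeFSM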
So I see the issue here:

The VM was up and running..

MainThread::INFO::2016-08-17 14:48:19,594::hosted_engine::461::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring) Current state EngineUp (score: 3400)

Detected local maintenance

MainThread::INFO::2016-08-17 14:48:29,669::state_decorators::124::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(check) Local maintenance detected

Performed all the right steps

MainThread::INFO::2016-08-17 14:48:29,673::brokerlink::111::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(notify) Trying: notify time=1471434509.67 type=state_transition detail=EngineUp-LocalMaintenanceMigrateVm hostname='puma26.scl.lab.tlv.redhat.com'
MainThread::INFO::2016-08-17 14:48:52,789::hosted_engine::461::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring) Current state LocalMaintenanceMigrateVm (score: 0)
MainThread::INFO::2016-08-17 14:48:52,879::brokerlink::111::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(notify) Trying: notify time=1471434532.88 type=state_transition detail=LocalMaintenanceMigrateVm-EngineMigratingAway hostname='puma26.scl.lab.tlv.redhat.com'

Then stayed in the migration status for a while

MainThread::INFO::2016-08-17 14:49:21,236::hosted_engine::461::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring) Current state EngineMigratingAway (score: 3400)

And then it choked..

MainThread::INFO::2016-08-17 14:49:21,236::hosted_engine::461::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring) Current state EngineMigratingAway (score: 3400)
MainThread::INFO::2016-08-17 14:49:21,237::hosted_engine::466::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring) Best remote host puma23.scl.lab.tlv.redhat.com (id: 1, score: 0)
MainThread::INFO::2016-08-17 14:49:31,362::brokerlink::111::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(notify) Trying: notify time=1471434571.36 type=state_transition detail=EngineMigratingAway-ReinitializeFSM hostname='puma26.scl.lab.tlv.redhat.com'

Was the VDSM migration verb invoked and did it start the migration?

This bug was possibly fixed in bug 1380822. According to the vdsm logs from the source, the migration could fail when accessing a nonexisting key in a dictionary. This patch probably fixes it: https://gerrit.ovirt.org/#/c/64499

Please test as part of https://bugzilla.redhat.com/show_bug.cgi?id=1380822

Created attachment 1216062 [details]
logs(you can start watching from the date 2016-11-01 11:11:39)
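On the failure mode mentioned above (vdsm migration code hitting a nonexisting dictionary key): a minimal Python illustration of that class of bug and the usual defensive fix. The key name below is a placeholder; this is neither the actual vdsm code nor the gerrit patch.

    # Hypothetical illustration only -- not vdsm code. Accessing a missing key
    # raises KeyError and aborts the whole migration flow:
    def destination_uri_fragile(migration_params):
        return migration_params["dstqemu"]          # KeyError if the key is absent

    # Defensive variant: tolerate the missing key and report cleanly instead.
    def destination_uri_safe(migration_params):
        dst = migration_params.get("dstqemu")
        if dst is None:
            raise ValueError("migration parameters miss the destination address")
        return dst

    params = {"vmId": "a841f18e-dbad-47e9-abbd-bb97cdc86500"}   # no 'dstqemu'
    try:
        destination_uri_fragile(params)
    except KeyError as e:
        print("unhandled KeyError aborts the migration: %s" % e)
    try:
        destination_uri_safe(params)
    except ValueError as e:
        print("handled cleanly instead: %s" % e)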
Checked on:
ovirt-hosted-engine-setup-2.0.3-2.el7ev.noarch
ovirt-hosted-engine-ha-2.0.4-1.el7ev.noarch
vdsm-xmlrpc-4.18.15.2-1.el7ev.noarch
vdsm-hook-vmfex-dev-4.18.15.2-1.el7ev.noarch
vdsm-api-4.18.15.2-1.el7ev.noarch
vdsm-cli-4.18.15.2-1.el7ev.noarch
vdsm-yajsonrpc-4.18.15.2-1.el7ev.noarch
vdsm-4.18.15.2-1.el7ev.x86_64
vdsm-infra-4.18.15.2-1.el7ev.noarch
vdsm-jsonrpc-4.18.15.2-1.el7ev.noarch
vdsm-python-4.18.15.2-1.el7ev.noarch
Steps:
1) Put host_1 with the HE VM into maintenance - PASS, the HE VM migrates to host_2
2) Activate host_1
3) Put host_2 with the HE VM into maintenance - PASS, the HE VM migrates to host_1
4) Activate host_2
5) Put host_1 with the HE VM into maintenance - FAILS
From the command `hosted-engine --vm-status` I can see that the action succeeds without any trouble (host_1 is in local maintenance and the HE VM runs on host_2), but the UI shows host_1 stuck in the "Preparing For Maintenance" state (a quick host-side check is sketched below).
Check the logs for more information; you can also find screenshots in the archive.
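The host-side check mentioned in step 5 above can be scripted roughly like this, assuming only that `hosted-engine --vm-status` prints the 'Local maintenance' and 'Engine status' fields per host as shown later in this bug (illustrative Python, not part of the automation suite):

    import re
    import subprocess

    # Run the HA CLI and pull out the per-host fields we care about.
    out = subprocess.check_output(["hosted-engine", "--vm-status"]).decode()

    maintenance = re.findall(r"Local maintenance\s+:\s+(\S+)", out)
    engine_status = re.findall(r"Engine status\s+:\s+(.+)", out)

    # The agent's view: which hosts report local maintenance and where the VM is up.
    for i, (maint, status) in enumerate(zip(maintenance, engine_status), start=1):
        print("host %d: local maintenance=%s, engine status=%s" % (i, maint, status))

If the agent reports maintenance=True for a host while the webadmin still shows it in "Preparing For Maintenance", that matches the discrepancy described in this comment.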
Artyom, where is the engine running after step 5 according to the webadmin? Also on host_2?

According to the engine, the HE VM is placed on host_1 in the Virtual Machines tab and on host_2 in the Hosts tab. You can check the screenshots from the log archive for more information.

Ok, so the status in the failing case is this:

Engine thinks this:
Host 1 - Preparing for maintenance, 0 VMs
Host 3 - Up, 1 VM
Hosted engine VM - migrating from Host 1

The reality: the VM is physically running on Host 3 already and the hosted engine tooling knows that. So the engine somehow missed the "migration finished" event.

I only saw one suspicious entry in the log:

2016-11-01 11:53:34,536 INFO [org.ovirt.engine.core.vdsbroker.monitoring.VmAnalyzer] (ForkJoinPool-1-worker-0) [] VM 'a841f18e-dbad-47e9-abbd-bb97cdc86500'(HostedEngine) moved from 'MigratingFrom' --> 'Down'
2016-11-01 11:53:34,537 ERROR [org.ovirt.engine.core.vdsbroker.monitoring.VmsMonitoring] (ForkJoinPool-1-worker-0) [] Failed during vms monitoring on host host_mixed_1 error is: java.lang.NullPointerException
2016-11-01 11:53:34,537 ERROR [org.ovirt.engine.core.vdsbroker.monitoring.VmsMonitoring] (ForkJoinPool-1-worker-0) [] Exception:: java.lang.NullPointerException
    at org.ovirt.engine.core.vdsbroker.monitoring.VmAnalyzer.isVdsNonResponsive(VmAnalyzer.java:760) [vdsbroker.jar:]
    at org.ovirt.engine.core.vdsbroker.monitoring.VmAnalyzer.handOverVm(VmAnalyzer.java:739) [vdsbroker.jar:]
    at org.ovirt.engine.core.vdsbroker.monitoring.VmAnalyzer.proceedDownVm(VmAnalyzer.java:267) [vdsbroker.jar:]
    at org.ovirt.engine.core.vdsbroker.monitoring.VmAnalyzer.analyze(VmAnalyzer.java:154) [vdsbroker.jar:]
    at org.ovirt.engine.core.vdsbroker.monitoring.VmsMonitoring.lambda$analyzeVms$1(VmsMonitoring.java:161) [vdsbroker.jar:]
    at java.util.ArrayList.forEach(ArrayList.java:1249) [rt.jar:1.8.0_102]
    at org.ovirt.engine.core.vdsbroker.monitoring.VmsMonitoring.analyzeVms(VmsMonitoring.java:156) [vdsbroker.jar:]
    at org.ovirt.engine.core.vdsbroker.monitoring.VmsMonitoring.perform(VmsMonitoring.java:119) [vdsbroker.jar:]
    at org.ovirt.engine.core.vdsbroker.monitoring.EventVmStatsRefresher$1.onNext(EventVmStatsRefresher.java:61) [vdsbroker.jar:]
    at org.ovirt.engine.core.vdsbroker.monitoring.EventVmStatsRefresher$1.onNext(EventVmStatsRefresher.java:46) [vdsbroker.jar:]
    at org.ovirt.vdsm.jsonrpc.client.events.EventPublisher$EventCallable.call(EventPublisher.java:120) [vdsm-jsonrpc-java-client.jar:]
    at org.ovirt.vdsm.jsonrpc.client.events.EventPublisher$EventCallable.call(EventPublisher.java:95) [vdsm-jsonrpc-java-client.jar:]
    at java.util.concurrent.ForkJoinTask$AdaptedCallable.exec(ForkJoinTask.java:1424) [rt.jar:1.8.0_102]
    at java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:289) [rt.jar:1.8.0_102]
    at java.util.concurrent.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1056) [rt.jar:1.8.0_102]
    at java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1692) [rt.jar:1.8.0_102]
    at java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:157) [rt.jar:1.8.0_102]

Some findings:

1. There is a problematic video device ('device' is empty):
{
    "address": {
        "bus": "0x00",
        "domain": "0x0000",
        "function": "0x0",
        "slot": "0x02",
        "type": "pci"
    },
    "alias": "video0",
    "device": "",
    "type": "video"
}
Kobi:
1.1 please file a bug for SLA to see if it always happens with hosted-engine VMs
1.2 please file a bug for virt to handle such devices properly and prevent the NPE.
2. There's the known exception during the hand-over for unmanaged migrations - it would be great to finally think about a way to get the destination host from VDSM and prevent the NPE.

3. The VM is now locked by the monitoring because of the exception at 11:53:33. On master there's a fix for this (https://gerrit.ovirt.org/#/c/61614/) - it should be backported, at least some of it.

(In reply to Roy Golan from comment #14)
> Was the VDSM migration verb invoked and did it start the migration?

<rant>
There's your problem right there. It should use the proper API, which is the engine REST API and nothing else. You would actually get comment #4 for free, but since you're invoking an internal API you don't get it for free and the worst migration policy is used.
</rant>

(In reply to Arik from comment #21)
> 3. The VM is now locked by the monitoring because of the exception at
> 11:53:33. On master there's a fix for this
> (https://gerrit.ovirt.org/#/c/61614/) - it should be backported, at least
> some of it.

This patch would supposedly "fix" the problem of stuck monitoring, as it would allow the destination host to update the VM status (Up) properly, and the maintenance procedure would likely conclude. So definitely worth a backport.

Alternatively, this bug is likely solved after a fix of bug 1392903.

Based on comment 24, moving to test only once bug 1392903 is verified.

*** Bug 1393686 has been marked as a duplicate of this bug. ***

Moving this bug to verified as it no longer depends on other bugs and works as designed.

Works for me on these components on hosts:
libvirt-client-2.0.0-10.el7.x86_64
qemu-kvm-rhev-2.6.0-27.el7.x86_64
ovirt-hosted-engine-ha-2.0.4-1.el7ev.noarch
ovirt-setup-lib-1.0.2-1.el7ev.noarch
ovirt-imageio-daemon-0.4.0-0.el7ev.noarch
ovirt-vmconsole-1.0.4-1.el7ev.noarch
mom-0.5.8-1.el7ev.noarch
vdsm-4.18.15.3-1.el7ev.x86_64
ovirt-hosted-engine-setup-2.0.3-2.el7ev.noarch
sanlock-3.4.0-1.el7.x86_64
ovirt-host-deploy-1.5.3-1.el7ev.noarch
ovirt-engine-sdk-python-3.6.9.1-1.el7ev.noarch
ovirt-imageio-common-0.3.0-0.el7ev.noarch
ovirt-vmconsole-host-1.0.4-1.el7ev.noarch
Linux version 3.10.0-514.el7.x86_64 (mockbuild.eng.bos.redhat.com) (gcc version 4.8.5 20150623 (Red Hat 4.8.5-11) (GCC) ) #1 SMP Wed Oct 19 11:24:13 EDT 2016
Linux 3.10.0-514.el7.x86_64 #1 SMP Wed Oct 19 11:24:13 EDT 2016 x86_64 x86_64 x86_64 GNU/Linux
Red Hat Enterprise Linux Server release 7.3 (Maipo)

On engine:
rhevm-setup-plugins-4.0.0.3-1.el7ev.noarch
rhevm-4.0.6.1-0.1.el7ev.noarch
rhevm-spice-client-x86-msi-4.0-3.el7ev.noarch
rhev-release-4.0.6-3-001.noarch
rhevm-dependencies-4.0.0-1.el7ev.noarch
rhev-guest-tools-iso-4.0-6.el7ev.noarch
rhevm-spice-client-x64-msi-4.0-3.el7ev.noarch
rhevm-branding-rhev-4.0.0-5.el7ev.noarch
rhevm-guest-agent-common-1.0.12-3.el7ev.noarch
rhevm-doc-4.0.6-1.el7ev.noarch
Linux version 3.10.0-514.el7.x86_64 (mockbuild.eng.bos.redhat.com) (gcc version 4.8.5 20150623 (Red Hat 4.8.5-11) (GCC) ) #1 SMP Wed Oct 19 11:24:13 EDT 2016
Linux 3.10.0-514.el7.x86_64 #1 SMP Wed Oct 19 11:24:13 EDT 2016 x86_64 x86_64 x86_64 GNU/Linux
Red Hat Enterprise Linux Server release 7.3 (Maipo)

I've deployed a clean deployment of rhevm-appliance-20161116.0-1.el7ev.noarch (el7.3 based) over NFS, then upgraded the engine to the latest components as listed above, then added an NFS data storage domain to get hosted_storage auto-imported into the engine's WEBUI. Then added an additional hosted engine host via the WEBUI.

Made at least 8 iterations of steps 1-10:
1. HE-VM running on alma03 with HA score 3400.
2. Setting alma03 into maintenance via the WEBUI.
3. HE-VM migrated to alma04 successfully; alma03 went to maintenance with an HA score of 0 and was seen in local maintenance from the CLI and WEBUI.
4. Activated alma03 back without any issues and the host went back to active.
5. Waited a few minutes for alma03 to get an HA score of 3400.
6. HE-VM running on alma04 with HA score 3400.
7. Setting alma04 into maintenance via the WEBUI.
8. HE-VM migrated to alma03 successfully; alma04 went to maintenance with an HA score of 0 and was seen in local maintenance from the CLI and WEBUI.
9. Activated alma04 back without any issues and the host went back to active.
10. Waited a few minutes for alma04 to get an HA score of 3400.

I did not run into the initially reported issue, hence moving this bug to verified.

Moving back to assigned as 1391933 was eventually reproduced.

Target release should be placed once a package build is known to fix the issue. Since this bug is not modified, the target version has been reset. Please use the target milestone to plan a fix for an oVirt release.

Changing back to assigned, as the bug was eventually reproduced after ~10 iterations, when I did not wait for the target host to become active with a positive score: I tried to set the host with the HE-VM on it (alma04) into maintenance while alma03 was active but not yet at a positive HA score.

Attaching sosreports from my environment.

Created attachment 1225895 [details]
sosreport from engine
Created attachment 1225896 [details]
sosreport from alma03
Created attachment 1225897 [details]
sosreport from alma04
Created attachment 1225900 [details]
Screenshot of the Hosts tab: alma04 stuck in "Preparing For Maintenance".
Created attachment 1225902 [details]
Virtual machines tab screen shot
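The CLI output in the next comment gives the HA agent's view. To compare it with what the engine believes (the webadmin kept showing the host in "Preparing For Maintenance"), a query through the installed ovirt-engine-sdk-python could look roughly like this - a sketch only, where the URL, credentials and exact getter names are assumptions rather than values from this environment:

    # Sketch using the oVirt Python SDK v3 (ovirt-engine-sdk-python, listed in
    # the installed packages above). Connection details are placeholders.
    from ovirtsdk.api import API

    api = API(url="https://engine.example.com/ovirt-engine/api",
              username="admin@internal", password="password", insecure=True)

    vm = api.vms.get(name="HostedEngine")
    host_ref = vm.get_host()                      # set only while the VM is running
    host_name = api.hosts.get(id=host_ref.get_id()).get_name() if host_ref else None
    print("engine view: HostedEngine is %s on %s" % (vm.get_status().get_state(), host_name))

    # Engine-side host states, to compare with `hosted-engine --vm-status` below.
    for h in api.hosts.list():
        print("host %s status: %s" % (h.get_name(), h.get_status().get_state()))

    api.disconnect()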
Adding also the CLI `hosted-engine --vm-status` output from both hosts here:

[root@alma04 ~]# hosted-engine --vm-status

--== Host 1 status ==--

Status up-to-date            : True
Hostname                     : alma03.qa.lab.tlv.redhat.com
Host ID                      : 1
Engine status                : {"reason": "vm not running on this host", "health": "bad", "vm": "down", "detail": "unknown"}
Score                        : 3400
stopped                      : False
Local maintenance            : False
crc32                        : ac02a377
Host timestamp               : 92207
Extra metadata (valid at timestamp):
    metadata_parse_version=1
    metadata_feature_version=1
    timestamp=92207 (Tue Nov 29 17:32:44 2016)
    host-id=1
    score=3400
    maintenance=False
    state=EngineDown
    stopped=False

--== Host 2 status ==--

Status up-to-date            : True
Hostname                     : alma04.qa.lab.tlv.redhat.com
Host ID                      : 2
Engine status                : {"health": "good", "vm": "up", "detail": "up"}
Score                        : 0
stopped                      : False
Local maintenance            : True
crc32                        : a368838b
Host timestamp               : 85789
Extra metadata (valid at timestamp):
    metadata_parse_version=1
    metadata_feature_version=1
    timestamp=85789 (Tue Nov 29 17:32:40 2016)
    host-id=2
    score=0
    maintenance=True
    state=LocalMaintenance
    stopped=False

[root@alma03 ~]# hosted-engine --vm-status

--== Host 1 status ==--

Status up-to-date            : True
Hostname                     : alma03.qa.lab.tlv.redhat.com
Host ID                      : 1
Engine status                : {"reason": "vm not running on this host", "health": "bad", "vm": "down", "detail": "unknown"}
Score                        : 3400
stopped                      : False
Local maintenance            : False
crc32                        : 4e06b25f
Host timestamp               : 92257
Extra metadata (valid at timestamp):
    metadata_parse_version=1
    metadata_feature_version=1
    timestamp=92257 (Tue Nov 29 17:33:35 2016)
    host-id=1
    score=3400
    maintenance=False
    state=EngineDown
    stopped=False

--== Host 2 status ==--

Status up-to-date            : True
Hostname                     : alma04.qa.lab.tlv.redhat.com
Host ID                      : 2
Engine status                : {"health": "good", "vm": "up", "detail": "up"}
Score                        : 0
stopped                      : False
Local maintenance            : True
crc32                        : 6e96fd3e
Host timestamp               : 85835
Extra metadata (valid at timestamp):
    metadata_parse_version=1
    metadata_feature_version=1
    timestamp=85835 (Tue Nov 29 17:33:26 2016)
    host-id=2
    score=0
    maintenance=True
    state=LocalMaintenance
    stopped=False

Martin - can it be deferred to 4.0.7?

(In reply to Yaniv Kaul from comment #39)
> Martin - can it be deferred to 4.0.7?

Deferring. I think we decided to leave it as it is for 4.0.6 and track 1399766 separately for 4.0.7. Moving back to ON_QA.

(In reply to Martin Sivák from comment #41)
> I think we decided to leave it as it is for 4.0.6 and track 1399766
> separately for 4.0.7. Moving back to ON_QA.

If so, then I have nothing to do with this bug, as except for 1399766 I have not seen this issue reproduced on my environments, neither upstream nor downstream.
For the upstream environment I've used these components on hosts:
ovirt-vmconsole-host-1.0.4-1.el7ev.noarch
ovirt-release41-pre-4.1.0-0.0.beta.20161201085255.git731841c.el7.centos.noarch
ovirt-hosted-engine-setup-2.1.0-0.0.master.20161130101611.gitb3ad261.el7.centos.noarch
libvirt-client-2.0.0-10.el7_3.2.x86_64
ovirt-engine-appliance-4.1-20161202.1.el7.centos.noarch
ovirt-host-deploy-1.6.0-0.0.master.20161107121647.gitfd7ddcd.el7.centos.noarch
ovirt-engine-sdk-python-3.6.9.1-1.el7ev.noarch
ovirt-imageio-daemon-0.5.0-0.201611201242.gitb02532b.el7.centos.noarch
ovirt-hosted-engine-ha-2.1.0-0.0.master.20161130135331.20161130135328.git3541725.el7.centos.noarch
ovirt-setup-lib-1.1.0-0.0.master.20161107100014.gitb73abeb.el7.centos.noarch
ovirt-imageio-common-0.5.0-0.201611201242.gitb02532b.el7.centos.noarch
vdsm-4.18.999-1020.git1ff41b1.el7.centos.x86_64
sanlock-3.4.0-1.el7.x86_64
qemu-kvm-rhev-2.6.0-28.el7_3.2.x86_64
ovirt-vmconsole-1.0.4-1.el7ev.noarch
mom-0.5.8-1.el7ev.noarch
Linux version 3.10.0-514.2.2.el7.x86_64 (mockbuild.eng.bos.redhat.com) (gcc version 4.8.5 20150623 (Red Hat 4.8.5-11) (GCC) ) #1 SMP Wed Nov 16 13:15:13 EST 2016
Linux 3.10.0-514.2.2.el7.x86_64 #1 SMP Wed Nov 16 13:15:13 EST 2016 x86_64 x86_64 x86_64 GNU/Linux
Red Hat Enterprise Linux Server release 7.3 (Maipo)

On engine:
ovirt-engine-4.1.0-0.2.master.20161210231201.git26a385e.el7.centos.noarch
vdsm-jsonrpc-java-1.3.5-1.20161209104906.gitabdea80.el7.centos.noarch
Linux version 3.10.0-327.36.3.el7.x86_64 (builder.centos.org) (gcc version 4.8.5 20150623 (Red Hat 4.8.5-4) (GCC) ) #1 SMP Mon Oct 24 16:09:20 UTC 2016
Linux 3.10.0-327.36.3.el7.x86_64 #1 SMP Mon Oct 24 16:09:20 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
CentOS Linux release 7.2.1511 (Core)

For downstream I've used these components on hosts:
vdsm-4.18.18-4.git198e48d.el7ev.x86_64
ovirt-engine-sdk-python-3.6.9.1-1.el7ev.noarch
ovirt-imageio-daemon-0.4.0-0.el7ev.noarch
rhev-release-4.0.6-5-001.noarch
sanlock-3.4.0-1.el7.x86_64
ovirt-vmconsole-host-1.0.4-1.el7ev.noarch
qemu-kvm-rhev-2.6.0-28.el7_3.2.x86_64
ovirt-hosted-engine-ha-2.0.6-1.el7ev.noarch
rhevm-appliance-20161130.0-1.el7ev.noarch
ovirt-setup-lib-1.0.2-1.el7ev.noarch
ovirt-imageio-common-0.3.0-0.el7ev.noarch
libvirt-client-2.0.0-10.el7_3.2.x86_64
ovirt-vmconsole-1.0.4-1.el7ev.noarch
mom-0.5.8-1.el7ev.noarch
ovirt-hosted-engine-setup-2.0.4.1-2.el7ev.noarch
ovirt-host-deploy-1.5.3-1.el7ev.noarch
Linux version 3.10.0-514.2.2.el7.x86_64 (mockbuild.eng.bos.redhat.com) (gcc version 4.8.5 20150623 (Red Hat 4.8.5-11) (GCC) ) #1 SMP Wed Nov 16 13:15:13 EST 2016
Linux 3.10.0-514.2.2.el7.x86_64 #1 SMP Wed Nov 16 13:15:13 EST 2016 x86_64 x86_64 x86_64 GNU/Linux
Red Hat Enterprise Linux Server release 7.3 (Maipo)

On engine:
rhevm-dependencies-4.0.0-1.el7ev.noarch
rhevm-spice-client-x86-msi-4.0-3.el7ev.noarch
rhev-release-4.0.6-5-001.noarch
rhevm-spice-client-x64-msi-4.0-3.el7ev.noarch
rhevm-doc-4.0.6-1.el7ev.noarch
rhevm-4.0.6.3-0.1.el7ev.noarch
rhev-guest-tools-iso-4.0-6.el7ev.noarch
rhevm-branding-rhev-4.0.0-6.el7ev.noarch
rhevm-setup-plugins-4.0.0.3-1.el7ev.noarch
rhevm-guest-agent-common-1.0.12-3.el7ev.noarch
Linux version 3.10.0-514.el7.x86_64 (mockbuild.eng.bos.redhat.com) (gcc version 4.8.5 20150623 (Red Hat 4.8.5-11) (GCC) ) #1 SMP Wed Oct 19 11:24:13 EDT 2016
Linux 3.10.0-514.el7.x86_64 #1 SMP Wed Oct 19 11:24:13 EDT 2016 x86_64 x86_64 x86_64 GNU/Linux
Red Hat Enterprise Linux Server release 7.3 (Maipo)

Moving to verified, as no dependent bugs remain.
Because this bug is too general and not 100% reproducible, it could be the same as https://bugzilla.redhat.com/show_bug.cgi?id=1399766, which has a specific scenario and is NOT FIXED in 4.0.6.

*** This bug has been marked as a duplicate of bug 1399766 ***
Created attachment 1186868 [details]
agent.log

Description of problem:
[HE] sometimes move HE host to maintenance stuck in "Preparing For Maintenance" status.

Version-Release number of selected component (if applicable):
Red Hat Virtualization Manager Version: 4.0.2.2-0.1.el7ev
rhevm-appliance-20160731.0-1.el7ev.noarch

How reproducible:
75%

Steps to Reproduce:
1. Deploy HE on more than one host (in our case a GE-he setup with 3 hosts with HE capabilities).
2. Open the Hosts tab and click on the HE host.
3. Click on Maintenance (an automated equivalent is sketched after this description).

Actual results:
The HE host is stuck in the "Preparing For Maintenance" status.
The Virtual Machines counter of this host drops to 0 (zero).
The Virtual Machines counter of the other host increases by one.
In the Virtual Machines tab the HostedEngine VM is still tied to the first host.
You can see this in the attached screenshots.

Expected results:
The host moves to maintenance and the HE VM migrates to another host.

Additional info:
See the attached log file: agent.log
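As referenced in step 3 above, a rough sketch of driving the reproduction through the installed ovirt-engine-sdk-python instead of the UI - putting the HE host into maintenance and waiting for it to leave "Preparing For Maintenance". The URL, credentials, host name and timeout are placeholders, not values from this setup:

    import time
    from ovirtsdk.api import API

    # Placeholders -- adjust to the environment under test.
    api = API(url="https://engine.example.com/ovirt-engine/api",
              username="admin@internal", password="password", insecure=True)

    host = api.hosts.get(name="host_mixed_1")
    host.deactivate()                                   # same as clicking Maintenance in the UI

    # Poll until the host reaches maintenance or we give up; a host stuck in
    # "preparing_for_maintenance" past the timeout is the symptom of this bug.
    deadline = time.time() + 600
    while time.time() < deadline:
        state = api.hosts.get(name="host_mixed_1").get_status().get_state()
        print("host state: %s" % state)
        if state == "maintenance":
            break
        time.sleep(10)
    else:
        raise AssertionError("host never left 'preparing_for_maintenance'")

    api.disconnect()

A run that ends in the AssertionError corresponds to the "stuck in Preparing For Maintenance" behavior reported here.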