Description of problem:
[HE] When running "hosted-engine --vm-start", the message "VM exists and is down, destroying it" is shown.

Version-Release number of selected component (if applicable):
Red Hat Virtualization Manager Version: 4.1.2.2-0.1.el7
rhvm-appliance-4.1.20170221.0-1.el7ev.noarch

How reproducible:
100%

Steps to Reproduce:
On the host which hosts the engine VM, run the following commands:
1. "hosted-engine --set-maintenance --mode=global"
2. "hosted-engine --vm-shutdown"
3. Wait a few minutes for the VM to shut down
4. "hosted-engine --vm-start"

Actual results:
The message "VM exists and is down, destroying it" is shown.

Expected results:
The message "VM exists and is down, starting it" is shown.

Additional info:
# hosted-engine --vm-status (after shutdown and before the start)

!! Cluster is in GLOBAL MAINTENANCE mode !!

--== Host 1 status ==--

conf_on_shared_storage : True
Status up-to-date      : True
Hostname               : puma23.scl.lab.tlv.redhat.com
Host ID                : 1
Engine status          : {"reason": "vm not running on this host", "health": "bad", "vm": "down", "detail": "unknown"}
Score                  : 3400
stopped                : False
Local maintenance      : False
crc32                  : 1f0c0bf2
local_conf_timestamp   : 25138
Host timestamp         : 25122
Extra metadata (valid at timestamp):
    metadata_parse_version=1
    metadata_feature_version=1
    timestamp=25122 (Wed May 24 23:31:13 2017)
    host-id=1
    score=3400
    vm_conf_refresh_time=25138 (Wed May 24 23:31:29 2017)
    conf_on_shared_storage=True
    maintenance=False
    state=GlobalMaintenance
    stopped=False

--== Host 2 status ==--

conf_on_shared_storage : True
Status up-to-date      : True
Hostname               : puma26.scl.lab.tlv.redhat.com
Host ID                : 2
Engine status          : {"reason": "vm not running on this host", "health": "bad", "vm": "down", "detail": "unknown"}
Score                  : 3400
stopped                : False
Local maintenance      : False
crc32                  : 53beb59e
local_conf_timestamp   : 128917
Host timestamp         : 128901
Extra metadata (valid at timestamp):
    metadata_parse_version=1
    metadata_feature_version=1
    timestamp=128901 (Wed May 24 23:31:32 2017)
    host-id=2
    score=3400
    vm_conf_refresh_time=128917 (Wed May 24 23:31:48 2017)
    conf_on_shared_storage=True
    maintenance=False
    state=GlobalMaintenance
    stopped=False

--== Host 3 status ==--

conf_on_shared_storage : True
Status up-to-date      : True
Hostname               : puma27.scl.lab.tlv.redhat.com
Host ID                : 3
Engine status          : {"reason": "bad vm status", "health": "bad", "vm": "down", "detail": "down"}
Score                  : 3400
stopped                : False
Local maintenance      : False
crc32                  : 1f6fdc8c
local_conf_timestamp   : 128898
Host timestamp         : 128883
Extra metadata (valid at timestamp):
    metadata_parse_version=1
    metadata_feature_version=1
    timestamp=128883 (Wed May 24 23:31:12 2017)
    host-id=3
    score=3400
    vm_conf_refresh_time=128898 (Wed May 24 23:31:27 2017)
    conf_on_shared_storage=True
    maintenance=False
    state=GlobalMaintenance
    stopped=False
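The "Engine status" field in the output above is a JSON object, which makes it easy to consume when scripting around "hosted-engine --vm-status". A minimal sketch (the helper name is illustrative, not part of the tool):

```python
import json

def engine_vm_is_down(engine_status_field):
    """Parse the JSON 'Engine status' field from hosted-engine --vm-status
    and report whether the engine VM is down on that host."""
    status = json.loads(engine_status_field)
    return status.get("vm") == "down"

# Field copied verbatim from the Host 1 status above:
field = ('{"reason": "vm not running on this host", '
         '"health": "bad", "vm": "down", "detail": "unknown"}')
print(engine_vm_is_down(field))  # True
```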
The fix for master was done in Bug 1356425, and the cherry-pick to the 2.1 branch was done in Bug 1460982, even though the latter is not related to this bug.
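The corrected behavior can be sketched as follows. This is a hypothetical illustration of the decision the fix implements (an existing but down VM is cleaned up and restarted rather than reported as "destroying it"), not the actual ovirt-hosted-engine-setup code:

```python
def vm_start_action(vm_exists, vm_status):
    """Hypothetical sketch of the fixed --vm-start decision logic.

    vm_exists:  whether a VM object is already defined on the host.
    vm_status:  the reported VM state, e.g. "down" or "up".
    """
    if not vm_exists:
        return "starting VM"
    if vm_status == "down":
        # After the fix, the leftover down VM is cleaned up and restarted.
        return "VM exists and is down, cleaning up and restarting"
    return "VM exists and is up"

print(vm_start_action(True, "down"))
```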
I'm getting the following:

alma03 ~]# hosted-engine --vm-start
VM exists and is down, cleaning up and restarting
Exception in thread Client localhost:54321 (most likely raised during interpreter shutdown):
[root@alma03 ~]#

A screencast is attached together with sosreports from the engine and the host.
What is that exception? Why is it being shown?
Created attachment 1296149 [details] screencast
Created attachment 1296150 [details] sosreport from host
Created attachment 1296151 [details] engine's sosreport
Components on host:
ovirt-imageio-common-1.0.0-0.el7ev.noarch
mom-0.5.9-1.el7ev.noarch
ovirt-imageio-daemon-1.0.0-0.el7ev.noarch
sanlock-3.5.0-1.el7.x86_64
ovirt-setup-lib-1.1.3-1.el7ev.noarch
ovirt-engine-sdk-python-3.6.9.1-1.el7ev.noarch
qemu-kvm-rhev-2.9.0-16.el7.x86_64
ovirt-vmconsole-1.0.4-1.el7ev.noarch
vdsm-4.19.21-1.el7ev.x86_64
ovirt-hosted-engine-ha-2.1.4-1.el7ev.noarch
libvirt-client-3.2.0-14.el7.x86_64
ovirt-vmconsole-host-1.0.4-1.el7ev.noarch
ovirt-hosted-engine-setup-2.1.3.4-1.el7ev.noarch
ovirt-host-deploy-1.6.6-1.el7ev.noarch
Linux version 3.10.0-691.el7.x86_64 (mockbuild.eng.bos.redhat.com) (gcc version 4.8.5 20150623 (Red Hat 4.8.5-16) (GCC) ) #1 SMP Thu Jun 29 10:30:04 EDT 2017
Linux 3.10.0-691.el7.x86_64 #1 SMP Thu Jun 29 10:30:04 EDT 2017 x86_64 x86_64 x86_64 GNU/Linux
Red Hat Enterprise Linux Server release 7.4 (Maipo)

On engine:
rhev-guest-tools-iso-4.1-5.el7ev.noarch
rhevm-doc-4.1.4-1.el7ev.noarch
rhevm-dependencies-4.1.1-1.el7ev.noarch
rhevm-4.1.4-0.2.el7.noarch
rhevm-branding-rhev-4.1.0-2.el7ev.noarch
rhevm-setup-plugins-4.1.2-1.el7ev.noarch
Linux version 3.10.0-693.el7.x86_64 (mockbuild.eng.bos.redhat.com) (gcc version 4.8.5 20150623 (Red Hat 4.8.5-16) (GCC) ) #1 SMP Thu Jul 6 19:56:57 EDT 2017
Linux 3.10.0-693.el7.x86_64 #1 SMP Thu Jul 6 19:56:57 EDT 2017 x86_64 x86_64 x86_64 GNU/Linux
Red Hat Enterprise Linux Server release 7.4 (Maipo)
As the original issue is no longer reproduced, moving this bug to VERIFIED per https://bugzilla.redhat.com/show_bug.cgi?id=1455341#c3. Works for me with the correct message now: "VM exists and is down, cleaning up and restarting".
https://bugzilla.redhat.com/show_bug.cgi?id=1438678 explains the exception from comment #3. I've added my findings directly to 1438678.